Finishing the revisions to the Subversion chapter

[swc-version-control-svn.git] / svn.html
diff --git a/svn.html b/svn.html

index 21f82f33ef6f038eb45a7f9a3b0c6decc1c924d0..1dbfb02f6485f55dc815d0e04e987224038a25ab 100644
--- a/svn.html
+++ b/svn.html
@@ -58,40 +58,118 @@
  
  </ol>
  
-<div class="box">
-  <h3>Nothing's Perfekt</h3>
-
-  <p>
-    Version control systems do have one important shortcoming.
-    While it is easy for them to find, display, and merge differences in text files,
-    images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text&mdash;they
-    use specialized binary data formats.
-    Most version control systems don't know how to deal with these formats,
-    so all they can say is, "These files differ."
-    Reconciling those differences will probably require use of an auxiliary tool,
-    such as an audio editor
-    or Microsoft Word's "Compare and Merge" utility.
-  </p>
-</div>
-
  <p>
    The rest of this chapter will explore how to use
    a popular open source version control system called Subversion.
+  It does not have all the features of some newer systems,
+  such as <a href="git.html">Git</a>,
+  but it is still widely used,
+  and is simpler to pick up than those more advanced alternatives.
+  No matter which system you use,
+  the most important thing to learn is not the details of its more obscure commands,
+  but the workflow that it encourages.
  </p>
  
  <div class="guide">
    <h2>For Instructors</h2>
  
-  <p class="fixme">explain</p>
+  <p>
+    Version control is the most important practical skill we introduce.
+    As the last paragraph of the introduction above says,
+    the workflow matters more than the ins and outs of any particular tool.
+    By the end of 90 minutes,
+    the instructor should be able to get learners to chant,
+    "Update, edit, merge, commit," in unison,
+    and have them understand what those terms mean
+    and why that's a good way to structure their working day.
+  </p>
+
+  <p>
+    Provided there aren't network problems,
+    this entire lesson can be covered in <span class="duration">90 minutes</span>.
+    The example at the end
+    showing how to use Subversion keywords to track provenance
+    is the "aha!" moment for many learners.
+    If time is short,
+    skip the material on recovering old versions of files
+    in order to get to the provenance example instead.
+    (The fact that provenance is harder in Git,
+    both mechanically and conceptually,
+    is one reason to keep teaching Subversion.)
+  </p>
  
    <div class="prereq">
      <h3>Prerequisites</h3>
-    <p class="fixme">prereq</p>
+    <p>
+      Basic shell concepts and skills
+      (<code>ls</code>, <code>cd</code>, <code>mkdir</code>,
+      editing files);
+      basic shell scripting
+      (for the discussion of <a href="#s:provenance">provenance</a>).
+    </p>
    </div>
  
    <div class="notes">
      <h3>Teaching Notes</h3>
      <ul>
+      <li>
+        Make sure the network is working <em>before</em> starting this lesson.
+      </li>
+      <li>
+        Give learners a ten-minute overview of what version control does for them
+        before diving into the watch-and-do practicals.
+        Most of them will have tried to co-author papers by emailing files back and forth,
+        or will have biked into the office
+        only to realize that the USB key with last night's work
+        is still on the kitchen table.
+        Instructors can also make jokes about directories with names like
+        "final version",
+        "final version revised",
+        "final version with reviewer three's corrections",
+        "really final version",
+        and
+        "come on, this really has to be the last version"
+        to motivate version control as a better way to collaborate
+        and as a better way to back work up.
+      </li>
+      <li>
+        Version control is typically taught after the shell,
+        so collect learners' names during that session
+        and create a repository for them to share
+        with their names as both their IDs and their passwords.
+        The easiest way to create the repository is to use
+        a server managed by a hosting provider such as DreamHost,
+        or SourceForge, Google Code, or some other "forge" site,
+        all of which provide web interfaces for repository creation and management.
+        If your learners are advanced enough to be using SSH,
+        you can create it on any server they can access
+        and have them connect with the <code>svn+ssh</code> protocol rather than HTTPS.
+      </li>
+      <li>
+        When giving instructions,
+        be very clear about which files learners are to edit
+        and which user IDs they are to use.
+        It is common for them to edit the instructor's biography,
+        or to use the instructor's user ID and password when committing.
+        Be equally clear <em>when</em> they are to edit things:
+        it's also common for someone to edit the file the instructor is editing
+        and commit changes while the instructor is explaining what's going on,
+        so that a conflict occurs when the instructor comes to commit the file.
+      </li>
+      <li>
+        Learners could do most exercises with repositories on their own machines,
+        but it's hard for them to see how version control helps collaboration
+        unless they're sharing a repository with other learners.
+        In particular,
+        showing learners who changed what using <code>svn blame</code>
+        is only compelling if a file has been edited by at least two people.
+      </li>
+      <li>
+        If some learners are using Windows,
+        there will inevitably be issues merging files with different line endings.
+        <code>svn diff -x -w</code> is supposed to suppress differences in whitespace,
+        but we have found that it doesn't always work as advertised.
+      </li>
      </ul>
    </div>
  
@@ -311,7 +389,7 @@ cydonia.txt  mons-olympus.txt</span>
  
  <pre>
  $ <span class="in">pwd</span>
-<span class="out">/home/vlad/explore</span>
+<span class="out">/home/dracula/explore</span>
  $ <span class="in">ls -a</span>
  <span class="out">.    ..    .svn    earth    jupiter    mars</span>
  $ <span class="in">ls -F .svn</span>
@@ -478,6 +556,30 @@ Committed revision 7.</span>
      <figcaption>Figure 8: Updated Repository</figcaption>
    </figure>
  
+  <div class="box">
+    <h3>When <em>Not</em> to Use Version Control</h3>
+
+    <p>
+      Despite the rapidly decreasing cost of storage,
+      it is still possible to run out of disk space.
+      In some labs,
+      people can easily go through 2 TB/month if they're not careful.
+      Since version control tools usually store revisions
+      as line-by-line differences,
+      they end up storing essentially every revision of a binary data file in full.
+      This isn't that bad
+      (it's what we'd be doing anyway),
+      but it means version control isn't doing what it does best,
+      and the repository can get very large very quickly.
+      Another concern is that
+      it can be useful to archive or delete old data files once they are no longer used.
+      This is not possible if our data is version controlled:
+      information can only be added to a repository,
+      so it can only ever increase in size.
+    </p>
+
+  </div>
+
    <p id="a:define-head">
      Back in his cubicle,
      Wolfman uses <code>svn update</code> to update his working copy.
@@ -683,6 +785,22 @@ $ <span class="in">svn diff -r HEAD</span>
  
    </div>
  
+  <div class="box">
+    <h3>Nothing's Perfekt</h3>
+
+    <p>
+      Version control systems do have one important shortcoming.
+      While it is easy for them to find, display, and merge differences in text files,
+      images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text&mdash;they
+      use specialized binary data formats.
+      Most version control systems don't know how to deal with these formats,
+      so all they can say is, "These files differ."
+      Reconciling those differences will probably require use of an auxiliary tool,
+      such as an audio editor
+      or Microsoft Word's "Compare and Merge" utility.
+    </p>
+  </div>
+
    <div class="box">
      <h3>Diffing Other Files</h3>
  
@@ -777,6 +895,35 @@ $ <span class="in">diff left.txt right.txt</span>
      and if necessary undo later on.
    </p>
  
+  <div class="box">
+    <h3>Who Did What?</h3>
+
+    <p>
+      One other very useful command is <code>svn blame</code>,
+      which shows the revision in which each line of a file was last changed
+      and who changed it:
+    </p>
+
+<pre>
+$ <span class="in">svn blame moons.txt</span>
+<span class="out">    14    dracula Name            Orbital Radius  Orbital Period  Mass            Radius
+    14    dracula                 (10**3 km)      (days)          (10**20 kg)     (km)
+    14    dracula Amalthea        181.4           0.498179        0.075           131 x 73 x 67
+     9    mummy   Io              421.6           1.769138        893.2           1821.6
+     9    mummy   Europa          670.9           3.551181        480.0           1560.8
+     9    mummy   Ganymede        1070.4          7.154553        1481.9          2631.2
+    14    dracula Callisto        1882.7          16.689018       1075.9          2410.3
+    14    dracula Himalia         11460           250.5662        0.095           85.0
+    14    dracula Elara           11740           259.6528        0.008           40.0</span>
+</pre>
+
+    <p>
+      If you are ever wondering who to talk to about a change,
+      or why it was made,
+      <code>svn blame</code> is a good place to start.
+    </p>
+  </div>
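If a file has been edited by many people, a per-author summary of <code>svn blame</code> output can be easier to scan than the full listing. The shell function below is a sketch of one way to produce it. The name <code>blame_authors</code> is our own, and the function assumes the default output format shown above, in which the revision number comes first and the author's user ID second.

```shell
# A sketch, not a Subversion feature: tally how many lines of a file
# each author last changed, reading `svn blame` output on stdin.
# Assumes the default format "REVISION AUTHOR TEXT...", so the author
# is always the second whitespace-separated field.
blame_authors() {
    # Count lines per author, print "COUNT AUTHOR",
    # and list the author responsible for the most lines first.
    awk '{ count[$2]++ } END { for (a in count) print count[a], a }' | sort -rn
}
```

Piping the listing above through it, as in <code>svn blame moons.txt | blame_authors</code>, should print <code>6 dracula</code> and then <code>3 mummy</code>. The counting is done in <code>awk</code> so that the result doesn't depend on the column padding that <code>uniq -c</code> adds to its output.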
+
    <div class="keypoints">
      <h3>Summary</h3>
      <ul>
@@ -1369,6 +1516,33 @@ U  moons.txt</span>
      if he can ever convince the Mummy that numbers should have commas.
    </p>
  
+  <div class="box">
+    <h3>Another Way to Do It</h3>
+
+    <p>
+      Another way to recover a particular version of a particular file
+      is to use the <code>svn copy</code> command.
+      If the URL of our repository is
+      <code>https://universal.software-carpentry.org/explore</code>,
+      then the command:
+    </p>
+
+<pre>
+$ <span class="in">svn copy https://universal.software-carpentry.org/explore/mission.txt@120 ./mission.txt</span>
+</pre>
+
+    <p class="continue">
+      copies the file <code>mission.txt</code> as it was in revision 120
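+      into our working directory,
+      where it is scheduled for addition and must then be committed.
+      (This only works if the working copy doesn't already contain a <code>mission.txt</code>,
+      for example because it was deleted in a later revision:
+      <code>svn copy</code> will not overwrite an existing file.)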
+      What's more,
+      using <code>svn copy</code> brings along the file's history as well,
+      so that future <code>svn log</code> operations will show
+      how <code>mission.txt</code> was resurrected.
+    </p>
+  </div>
+
    <p>
      Merging can be used to recover older revisions of files,
      not just the most recent,
@@ -1457,433 +1631,574 @@ U  moons.txt</span>
        <li>Old versions of files can be recovered by merging their old state with their current state.</li>
        <li>Recovering an old version of a file does not erase the intervening changes.</li>
        <li>Use branches to support parallel independent development.</li>
-      <li><code>svn merge</code> merges two revisions of a file.</li>
        <li><code>svn revert</code> undoes local changes to files.</li>
+      <li><code>svn merge</code> merges two revisions of a file.</li>
      </ul>
    </div>
  
    <div class="challenges">
      <h3>Challenges</h3>
  
-    <p class="fixme">write some</p>
+    <ol>
+      <li>
+        Explain what the command:
+<pre>
+svn diff -r 240:261 fish.dat
+</pre>
+        does, and when you might want to run it.
+      </li>
+
+      <li>
+        Suppose that a file called <code>mission.txt</code>
+        existed in revision 90 of a repository,
+        but had been deleted in revision 91.
+        What two commands could we use to recover it?
+      </li>
+
+    </ol>
    </div>
  
  </section>
  
-    <section id="s:setup">
+<section id="s:setup">
+  <h2>Setting up a Repository</h2>
  
-      <h2>Setting up a Repository</h2>
+  <div class="understand">
+    <h3>Learning Objectives</h3>
+    <ul>
+      <li>How to create a repository.</li>
+    </ul>
+  </div>
  
-      <div class="understand" id="u:setup">
-        <h3>Understand:</h3>
-        <ul>
-          <li>How to create a repository.</li>
-        </ul>
-      </div>
+  <p>
+    It is finally time to see how to create a repository.
+    As a quick recap,
+    we will keep the master copy of our work in a repository
+    on a server that we can access from other machines on the internet.
+    That master copy consists of files and directories that no-one ever edits directly.
+    Instead, a copy of Subversion running on that machine
+    manages updates for us and watches for conflicts.
+    Our working copy, which sits on our own computer, is a mirror image of the master.
+    When our Subversion client needs to communicate with the master,
+    it exchanges data with the copy of Subversion running on the server.
+  </p>
+
+  <figure id="f:repo_four_things">
+    <img src="svn/repo_four_things.png" alt="What's Needed for a Repository" />
+    <figcaption>Figure 15: What's Needed for a Repository</figcaption>
+  </figure>
  
-      <p>
-        It is finally time to see how to create a repository.
-        As a quick recap,
-        we will keep the master copy of our work in a repository
-        on a server that we can access from other machines on the internet.
-        That master copy consists of files and directories that no-one ever edits directly.
-        Instead, a copy of Subversion running on that machine
-        manages updates for us and watches for conflicts.
-        Our working copy is a mirror image of the master sitting on our computer.
-        When our Subversion client needs to communicate with the master,
-        it exchanges data with the copy of Subversion running on the server.
-      </p>
+  <p>
+    To make this work, we need four things
+    (<a href="#f:repo_four_things">Figure 15</a>):
+  </p>
  
-      <figure id="f:repo_four_things">
-        <img src="svn/repo_four_things.png" alt="What's Needed for a Repository" />
-      </figure>
+  <ol>
  
-      <p>
-        To make this to work, we need four things
-        (<a href="#f:repo_four_things">Figure XXX</a>):
-      </p>
+    <li>
+      The repository itself.
+      It's not enough to create an empty directory and start filling it with files:
+      Subversion needs to create a lot of other structure
+      in order to keep track of old revisions, who made what changes, and so on.
+    </li>
  
-      <ol>
-
-        <li>
-          The repository itself.
-          It's not enough to create an empty directory and start filling it with files:
-          Subversion needs to create a lot of other structure
-          in order to keep track of old revisions, who made what changes, and so on.
-        </li>
-
-        <li>
-          The full URL of the repository.
-          This includes the URL of the server
-          and the path to the repository on that machine.
-          (The second part is needed because a single server can,
-          and usually will,
-          host many repositories.)
-        </li>
-
-        <li>
-          Permission to read or write the master copy.
-          Many open source projects give the whole world permission to read from their repository,
-          but very few allow strangers to write to it:
-          there are just too many possibilities for abuse.
-          Somehow, we have to set up a password or something like it
-          so that users can prove who they are.
-        </li>
-
-        <li>
-          A working copy of the repository on our computer.
-          Once the first three things are in place,
-          this just means running the <code>checkout</code> command.
-        </li>
-
-      </ol>
+    <li>
+      The full URL of the repository.
+      This includes the URL of the server
+      and the path to the repository on that machine.
+      (The second part is needed because a single server can,
+      and usually will,
+      host many repositories.)
+    </li>
  
-      <p>
-        To keep things simple,
-        we will start by creating a repository on the machine that we're working on.
-        This won't let us share our work with other people,
-        but it <em>will</em> allow us to save the history of our work as we go along.
-      </p>
+    <li>
+      Permission to read or write the master copy.
+      Many open source projects give the whole world permission to read from their repository,
+      but very few allow strangers to write to it:
+      there are just too many possibilities for abuse.
+      Somehow, we have to set up a password or something like it
+      so that users can prove who they are.
+    </li>
  
-      <p>
-        The command to create a repository is <code>svnadmin create</code>,
-        followed by the path to the repository.
-        If we want to create a repository called <code>lair_repo</code>
-        directly under our home directory,
-        we just <code>cd</code> to get home
-        and run <code>svnadmin create lair_repo</code>.
-        This command creates a directory called <code>lair_repo</code> to hold our repository,
-        and fills it with various files that Subversion uses
-        to keep track of the project's history:
-      </p>
+    <li>
+      A working copy of the repository on our computer.
+      Once the first three things are in place,
+      this just means running the <code>checkout</code> command.
+    </li>
+
+  </ol>
+
+  <p>
+    To keep things simple,
+    we will start by creating a repository on the machine that we're working on.
+    This won't let us share our work with other people,
+    but it <em>will</em> allow us to save the history of our work as we go along.
+  </p>
+
+  <p>
+    The command to create a repository is <code>svnadmin create</code>,
+    followed by the path to the repository.
+    If we want to create a repository called <code>missions_repo</code>
+    directly under our home directory,
+    we just <code>cd</code> to get home
+    and run <code>svnadmin create missions_repo</code>.
+    This command creates a directory called <code>missions_repo</code> to hold our repository,
+    and fills it with various files that Subversion uses
+    to keep track of the project's history:
+  </p>
  
  <pre>
  $ <span class="in">cd</span>
-$ <span class="in">svnadmin create lair_repo</span>
-$ <span class="in">ls -F lair_repo</span>
+$ <span class="in">svnadmin create missions_repo</span>
+$ <span class="in">ls -F missions_repo</span>
  <span class="out">README.txt    conf/    db/    format    hooks/    locks/</span>
  </pre>
  
-      <p class="continue">
-        We should <em>never</em> edit anything in this repository directly.
-        Doing so probably won't shred our sanity and leave us gibbering in mindless horror,
-        but it will almost certainly make the repository unusable.
-      </p>
+  <p class="continue">
+    We should <em>never</em> edit any of this directly,
+    since it will almost certainly make the repository unusable.
+    Instead,
+    we should use <code>svn checkout</code>
+    to get a working copy of this repository.
+    If our home directory is <code>/users/mummy</code>,
+    then the full path to the repository we just created is <code>/users/mummy/missions_repo</code>,
+    so we run <code>svn checkout file:///users/mummy/missions_repo missions_working</code>.
+  </p>
  
-      <p>
-        To get a working copy of this repository,
-        we use Subversion's <code>checkout</code> command.
-        If our home directory is <code>/users/mummy</code>,
-        then the full path to the repository we just created is <code>/users/mummy/lair_repo</code>,
-        so we run <code>svn checkout file:///users/mummy/lair lair_working</code>.
-      </p>
+  <p>
+    Working backward,
+    the second argument,
+    <code>missions_working</code>,
+    specifies where the working copy is to be put.
+    The first argument is the URL of our repository,
+    and it has two parts.
+    <code>/users/mummy/missions_repo</code> is the path to the repository directory.
+    <code>file://</code> specifies the <a href="glossary.html#protocol">protocol</a>
+    that Subversion will use to communicate with the repository&mdash;in this case,
+    it says that the repository is part of the local machine's filesystem.
+    (Notice that the protocol ends in two slashes,
+    while the absolute path to the repository starts with a slash,
+    making three in total.
+    A very common mistake is to type only two, since that's what web URLs normally have.)
+  </p>
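Since the number of slashes is so easy to get wrong, one option is to let the shell assemble the URL. The function below is a minimal sketch. Its name, <code>file_url</code>, is hypothetical, and it assumes the path contains no spaces or other characters that would need to be percent-encoded in a URL.

```shell
# A minimal sketch (file_url is our own name, not an svn command):
# turn an absolute path into a file:// URL for `svn checkout`.
# Assumes the path needs no percent-encoding (no spaces, etc.).
file_url() {
    case "$1" in
        /*)
            # "file://" plus a path that starts with "/" gives three slashes.
            printf 'file://%s\n' "$1"
            ;;
        *)
            # A relative path would give something like file://missions_repo,
            # where the part after "//" is read as a host name.
            echo "file_url: '$1' is not an absolute path" >&2
            return 1
            ;;
    esac
}
```

For example, assuming <code>$HOME</code> holds an absolute path (as it normally does), <code>svn checkout "$(file_url "$HOME/missions_repo")" missions_working</code> checks out the repository created above without anyone having to count slashes.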
  
-      <p>
-        Working backward,
-        the second argument,
-        <code>lair_working</code>,
-        specifies where the working copy is to be put.
-        The first argument is the URL of our repository,
-        and it has two parts.
-        <code>/users/mummy/lair_repo</code> is the path to repository directory.
-        <code>file://</code> specifies the <a href="glossary.html#protocol">protocol</a>
-        that Subversion will use to communicate with the repository&mdash;in this case,
-        it says that the repository is part of the local machine's filesystem.
-        Notice that the protocol ends in two slashes,
-        while the absolute path to the repository starts with a slash,
-        making three in total.
-        A very common mistake is to type only two, since that's what web URLs normally have.
-      </p>
+  <p>
+    When we're doing a checkout,
+    it is <em>very</em> important that we provide the second argument,
+    which specifies the name of the directory we want the working copy to be put in.
+    Without it,
+    Subversion will try to use the name of the repository,
+    <code>missions_repo</code>,
+    as the name of the working copy.
+    Since we're in the directory that contains the repository,
+    this means that Subversion will try to overwrite the repository with a working copy.
+    This probably won't tear our sanity to shreds,
+    but it could well ruin our repository.
+  </p>
  
-      <p>
-        When we're doing a checkout,
-        it is <em>very</em> important that we provide the second argument,
-        which specifies the name of the directory we want the working copy to be put in.
-        Without it,
-        Subversion will try to use the name of the repository,
-        <code>lair_repo</code>,
-        as the name of the working copy.
-        Since we're in the directory that contains the repository,
-        this means that Subversion will try to overwrite the repository with a working copy.
-        Again,
-        there isn't much risk of our sanity being torn to shreds,
-        but this could ruin our repository.
-      </p>
+  <p>
+    To avoid this problem,
+    most people create a sub-directory in their account called something like <code>repos</code>,
+    and then create their repositories in that.
+    For example,
+    we could create our repository in <code>/users/mummy/repos/missions</code>,
+    then check out a working copy as <code>/users/mummy/missions</code>.
+    This practice makes both names easier to read.
+  </p>
  
-      <p>
-        To avoid this problem,
-        most people create a sub-directory in their account called something like <code>repos</code>,
-        and then create their repositories in that.
-        For example,
-        we could create our repository in <code>/users/mummy/repos/lair</code>,
-        then check out a working copy as <code>/users/mummy/lair</code>.
-        This practice makes both names easier to read.
-      </p>
+  <p>
+    The obvious next step is to put our repository on a server,
+    rather than on our personal machine.
+    In fact,
+    we should <em>always</em> do this
+    so that we don't lose the history of our project
+    if our laptop is damaged or stolen.
+    A departmental server is also much more likely to be backed up regularly
+    than our personal machine&hellip;
+  </p>
  
-      <p>
-        The obvious next steps are
-        to put our repository on a server,
-        rather than on our personal machine,
-        and to give other people access to the repository we have just created
-        so that they can work with us.
-        We'll discuss the first in <a href="web.html#s:svn">a later chapter</a>,
-        but unfortunately,
-        the second really does require things that we are not going to cover in this course.
-        If you want to do this, you can:
-      </p>
+  <p>
+    Creating a repository on a server is simple:
+    just log in and go through the steps described above.
+    Accessing that repository from another machine
+    is also straightforward.
+    If the machine's address is <code>serv.euphoric.edu</code>,
+    and our user ID is <code>dracula</code>,
+    the URL of the repository will be something like:
+  </p>
+
+<pre>
+svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions
+</pre>
+
+  <p>
+    Reading from left to right:
+  </p>
  
-      <ul>
+  <ul>
+    <li>
+      <code>svn+ssh</code> is the protocol that Subversion uses to connect to the server
+      (in this case,
+      a combination of Subversion's own protocol
+      and <a href="shell.html#s:ssh">SSH</a>);
+    </li>
+    <li>
+      <code>dracula@serv.euphoric.edu</code> identifies the server and who we are
+      (just like an email address);
+      and
+    </li>
+    <li>
+      <code>/home/dracula/repos/missions</code> is the absolute path of the repository
+      on the server.
+    </li>
+  </ul>
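The same left-to-right reading can be done mechanically with the shell's parameter expansion. This is a sketch. <code>split_svn_url</code> is a hypothetical helper, not part of Subversion, and it assumes a URL of the form <code>PROTOCOL://USER@HOST/PATH</code> (or <code>PROTOCOL://HOST/PATH</code>) with no port number.

```shell
# A sketch, not an svn command: split a repository URL into the three
# parts described above. Assumes PROTOCOL://[USER@]HOST/PATH with no
# port number. The body is in ( ) so its variables stay local to it.
split_svn_url() (
    url=$1
    protocol=${url%%://*}   # everything before the first "://"
    rest=${url#*://}        # everything after it
    server=${rest%%/*}      # up to the first "/": [user@]host
    path=/${rest#*/}        # the remainder, with its leading "/" restored
    printf 'protocol: %s\n' "$protocol"
    printf 'server:   %s\n' "$server"
    printf 'path:     %s\n' "$path"
)
```

For <code>svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions</code> it reports the protocol <code>svn+ssh</code>, the server <code>dracula@serv.euphoric.edu</code>, and the path <code>/home/dracula/repos/missions</code>. For a local URL like <code>file:///users/mummy/missions_repo</code>, the server part comes out empty. That empty host name is exactly what the third slash means.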
  
-        <li>
-          ask your system administrator to set it up for you;
-        </li>
+  <p id="a:only_user">
+    That's fine if you are the only person using the repository,
+    but if you want to share it with others,
+    you need to worry about security.
+    As we discuss in the lesson on <a href="web.html">web programming</a>,
+    as soon as you provide a service on the internet,
+    there's the possibility that someone may try to attack your system through it.
+    Rather than trying to learn enough system administration skills
+    to set things up safely,
+    it is usually easier to:
+  </p>
  
-        <li>
-          use an open source hosting service like <a href="http://www.sf.net">SourceForge</a>,
-          <a href="http://code.google.com">Google Code</a>,
-          <a href="https://github.com/">GitHub</a>,
-          or <a href="https://bitbucket.org/">BitBucket</a>; or
-        </li>
+  <ul>
  
-        <li>
-          spend a few dollars a month on a commercial hosting service like <a href="http://dreamhost.com">DreamHost</a>
-          that provides web-based GUIs for creating and managing repositories.
-        </li>
+    <li>
+      ask your department's system administrator to set it up for you;
+    </li>
  
-      </ul>
+    <li>
+      use a hosting service like <a href="http://www.sf.net">SourceForge</a>,
+      <a href="http://code.google.com">Google Code</a>,
+      <a href="https://github.com/">GitHub</a>,
+      or <a href="https://bitbucket.org/">BitBucket</a>; or
+    </li>
  
-      <p>
-        If you choose the second or third option,
-        please check with whoever handles intellectual property at your institution
-        to make sure that putting your work on a commercially-operated machine
-        that is probably in some other legal jurisdiction
-        isn't going to cause trouble.
-        Many people assume that it's "just OK",
-        while others act as if not having asked will be an acceptable defence later on.
-        Unfortunately,
-        neither is true&hellip;
-      </p>
+    <li>
+      spend a few dollars a month on a commercial hosting service
+      that provides web-based GUIs for creating and managing repositories.
+    </li>
  
-      <div class="keypoints" id="k:setup">
-        <h3>Summary</h3>
-        <ul>
-          <li>Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.</li>
-          <li><code>svnadmin create <em>name</em></code> creates a new repository.</li>
-        </ul>
-      </div>
+  </ul>
  
-    </section>
+  <p>
+    If you choose the second or third option,
+    please check with whoever handles intellectual property at your institution
+    to make sure that putting your work on a commercially-operated machine
+    that is probably in some other legal jurisdiction
+    isn't going to cause trouble.
+    Many people assume that it's "just OK",
+    while others act as if not having asked will be an acceptable defence later on.
+    Unfortunately,
+    neither is true&hellip;
+  </p>
+
+  <div class="keypoints">
+    <h3>Summary</h3>
+    <ul>
+      <li><code>svnadmin create <em>name</em></code> creates a new repository.</li>
+      <li>Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.</li>
+    </ul>
+  </div>
+
+  <div class="challenges">
+    <h3>Challenges</h3>
  
-    <section id="s:provenance">
+    <ol>
  
-      <h2>Provenance</h2>
+      <li>
+        Create a Subversion repository called <code>trials_repo</code>
+        in your home directory.
+        Check out a working copy in a directory called <code>trials_working</code>
+        (also in your home directory).
+        Add a couple of text files,
+        commit the changes,
+        and then use <code>svn info trials_working</code>
+        to see what Subversion tells you about your working copy.
+      </li>
  
-      <div class="understand" id="u:provenance">
-        <h3>Understand:</h3>
+      <li>
+        We said <a href="#a:only_user">above</a> that
+        you might be the only person using a particular repository.
+        When and why is version control worth using
+        if no-one else is working on a project with you?
+      </li>
+
+      <li>
+        There are many ways to organize repositories.
+        Some of the most common are to create one repository for:
          <ul>
-          <li>What data provenance is.</li>
-          <li>How to embed version numbers and other information in files managed by version control.</li>
-          <li>How to record version information about a program in its output.</li>
+          <li>each person</li>
+          <li>each paper</li>
+          <li>all the work done on one grant</li>
+          <li>all the work done on one project</li>
+          <li>the entire lab (which is shared by everyone in the lab)</li>
+          <li>the entire department (typically with a top-level directory for each person or project in the department)</li>
          </ul>
-      </div>
+        What activities does each one make easy or hard?
+        Which of these would you prefer, and why?
+      </li>
  
-      <p>
-        In art,
-        the <a href="glossary.html#provenance">provenance</a> of a work
-        is the history of who owned it, when, and where.
-        In science,
-        it's the record of how a particular result came to be:
-        what raw data was processed by what version of what program to create which intermediate files,
-        what was used to turn those files into which figures of which papers,
-        and so on.
-      </p>
+    </ol>
+  </div>
  
-      <p>
-        One of the central ideas of this course is that
-        wen can automatically track the provenance of scientific data.
-        To start,
-        suppose we have a text file <code>combustion.dat</code> in a Subversion repository.
-        Run the following two commands:
-      </p>
+</section>
+
+<section id="s:provenance">
+  <h2>Provenance</h2>
+
+  <div class="understand">
+    <h3>Understand:</h3>
+    <ul>
+      <li>What data provenance is.</li>
+      <li>How to embed version numbers and other information in files managed by version control.</li>
+      <li>How to record version information about a program in its output.</li>
+    </ul>
+  </div>
+
+  <p>
+    In art,
+    the <a href="glossary.html#provenance">provenance</a> of a work
+    is the history of who owned it, when, and where.
+    In science,
+    it's the record of how a particular result came to be:
+    what raw data was processed by what version of what program to create which intermediate files,
+    what was used to turn those files into which figures of which papers,
+    and so on.
+  </p>
+
+  <p>
+    One of the big benefits of using version control is that
+    it lets us track the provenance of scientific data automatically.
+    To start,
+    suppose we have a text file <code>combustion.dat</code> in a Subversion repository.
+    Run the following two commands:
+  </p>
  
  <pre>
  $ svn propset svn:keywords Revision combustion.dat
  $ svn commit -m "Turning on the 'Revision' keyword" combustion.dat
  </pre>
  
-      <p>
-        Now open the file in an editor
-        and add the following line somewhere near the top:
-      </p>
+  <p class="continue">
+    Setting the property doesn't change the file's contents by itself;
+    to make use of it, open the file in an editor
+    and add the following line somewhere near the top:
+  </p>
  
  <pre>
-# $Revision:$
+$Revision:$
  </pre>
  
-      <p>
-        The '#' sign isn't important:
-        it's just what <code>.dat</code> files use to show comments.
-        The <code>$Revision:$</code> string,
-        on the other hand,
-        means something special to Subversion.
-        Save the file, and commit the change:
-      </p>
+  <p>
+    The <code>$Revision:$</code> string means something special to Subversion.
+    Save the file, and commit the change:
+  </p>
  
  <pre>
  $ svn commit -m "Inserting the 'Revision' keyword" combustion.dat
  </pre>
  
-      <p>
-        When we open the file again,
-        we'll see that Subversion has changed that line to something like:
-      </p>
+  <p>
+    When we open the file again,
+    we'll see that Subversion has changed that line to something like:
+  </p>
  
  <pre>
-# $Revision: 143$
+$Revision: 143 $
  </pre>
  
-      <p class="continue">
-        i.e., Subversion has inserted the version number
-        after the colon and before the closing <code>$</code>.
-      </p>
+  <p class="continue">
+    i.e., it has inserted the version number
+    after the colon and before the closing <code>$</code>.
+    If we edit the file again&mdash;e.g., add a couple of lines with random numbers&mdash;and
+    commit once more,
+    the line is updated again to:
+  </p>
  
-      <p>
-        Here's what just happened.
-        First, Subversion allows you to set
-        <a href="glossary.html#property-subversion">properties</a>
-        for files and and directories.
-        These properties aren't in the files or directories themselves,
-        but live in Subversion's database.
-        One of those properties,
-        <code>svn:keywords</code>,
-        tells Subversion to look in files that are being changed
-        for strings of the form <code>$propertyname: &hellip;$</code>,
-        where <code>propertyname</code> is a string like <code>Revision</code> or <code>Author</code>.
-        (About half a dozen such strings are supported.)
-      </p>
+<pre>
+$Revision: 144 $
+</pre>
  
-      <p>
-        If it sees such a string,
-        Subversion rewrites it as the commit is taking place to replace <code>&hellip;</code>
-        with the current version number,
-        the name of the person making the change,
-        or whatever else the property's name tells it to do.
-        You only have to add the string to the file once;
-        after that,
-        Subversion updates it for you every time the file changes.
-      </p>
+  <p>
+    Here's what just happened.
+    First, Subversion allows us to add
+    <a href="glossary.html#property-subversion">properties</a>
+    to files and directories.
+    These properties aren't stored in the files or directories themselves,
+    but in Subversion's database.
+    One of those properties,
+    <code>svn:keywords</code>,
+    tells Subversion to look in files that are being changed
+    for strings of the form <code>$keyword: &hellip;$</code>,
+    where <code>keyword</code> is a name like <code>Revision</code> or <code>Author</code>.
+    (About half a dozen such strings are supported.)
+  </p>
  
-      <p>
-        Putting the version number in the file this way can be pretty handy.
-        If you copy the file to another machine,
-        for example,
-        it carries its version number with it,
-        so you can tell which version you have even if it's outside version control.
-        We'll see some more useful things we can do with this information in
-        <a href="python.html">the next chapter</a>.
-      </p>
+  <p>
+    If it sees such a string,
+    Subversion rewrites it as the commit is taking place to replace <code>&hellip;</code>
+    with the current version number,
+    the name of the person making the change,
+    or whatever else the property's name tells it to do.
+    We only have to add the string to the file once;
+    after that,
+    Subversion updates it for you every time the file changes.
+  </p>
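<p>
Subversion performs this substitution inside its own code, so there is nothing for us to run.
Purely to illustrate the effect, the hypothetical shell function below
(<code>expand_revision</code> is our own name, not a Subversion command)
uses <code>sed</code> to rewrite <code>$Revision$</code>, <code>$Revision:$</code>,
or an already-expanded value into the <code>$Revision: N $</code> form Subversion writes,
assuming we already know the new revision number.
</p>

```shell
# expand_revision REV
# Illustration only: mimics the *effect* of Subversion's Revision keyword
# expansion on text read from standard input. This is a hypothetical helper,
# not how Subversion itself is implemented.
expand_revision() {
    rev="$1"
    # Match $Revision$ or $Revision:...$ (the optional group covers both the
    # empty ":" form and an old expanded value such as ": 143 ") and replace
    # the whole thing with "$Revision: REV $". The single quotes keep the
    # shell from treating the $ signs as variable references.
    sed -E 's/\$Revision(:[^$]*)?\$/$Revision: '"$rev"' $/g'
}
```

For example, <code>expand_revision 144 &lt; combustion.dat</code> prints the file with its keyword line rewritten;
running it again with a later number replaces the old value rather than adding a second one.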
  
-      <div class="box">
-
-        <h3>When <em>Not</em> to Use Version Control</h3>
-
-        <p>
-          Despite the rapidly decreasing cost of storage,
-          it is still possible to run out of disk space.
-          In some labs,
-          people can easy go through 2 TB/month if they're not careful.
-          Since version control tools usually store revisions in terms of lines,
-          with binary data files,
-          they end up essentially storing every revision separately.
-          This isn't that bad
-          (it's what we'd be doing anyway),
-          but it means version control isn't doing what it likes to do,
-          and the repository can get very large very quickly.
-          Another concern is that if very old data will no longer be used,
-          it can be nice to archive or delete old data files.
-          This is not possible if our data is version controlled:
-          information can only be added to a repository,
-          so it can only ever increase in size.
-        </p>
-
-      </div>
+  <p>
+    Putting the version number in the file this way can be pretty handy.
+    If you copy the file to another machine,
+    for example,
+    it carries its version number with it,
+    so you can tell which version you have even if it's outside version control.
+    We'll see some more useful things we can do with this information <a href="python.html">later</a>.
+  </p>
  
-      <p>
-        We can use this trick with shell scripts too,
-        or with almost any other kind of program.
-        Going back to Nelle Nemo's data processing from the previous chapter,
-        for example,
-        suppose she writes a shell script that uses <code>gooclean</code>
-        to tidy up data files.
-        Her first version looks like this:
-      </p>
+  <p>
+    We can use this trick with shell scripts too,
+    or with almost any other kind of program.
+    Let's go back to Nelle Nemo's data processing from
+    the lesson on the <a href="shell.html">shell</a>.
+    Suppose she writes a shell script called <code>gooclean</code>
+    to tidy up data files.
+    Her first version looks like this:
+  </p>
  
  <pre>
-for filename in $*
-do
-    gooclean -b 0 100 &lt; $filename &gt; cleaned-$filename
-done
+# gooclean: clean up a single data file
+goonorm -b 0 100 &lt; "$1" | goofilter -x --enlarge 2.0 &gt; "cleaned-$1"
  </pre>
  
-      <p class="continue">
-        i.e., it runs <code>gooclean</code> with bounding values of 0 and 100
-        for each specified file,
-        putting the result in a temporary file with a well-defined name.
-        Assuming that '#' is the comment character for those kinds of data files,
-        she could instead write:
-      </p>
+  <p class="continue">
+    i.e.,
+    it runs <code>goonorm</code> and then <code>goofilter</code> with some fixed parameters
+    and creates an output file called <code>cleaned-something.dat</code>
+    (if the input file's name was <code>something.dat</code>).
+    Assuming that '#' is the comment character for her output files,
+    she could instead write:
+  </p>
  
  <pre>
-for filename in $*
-do
-    <span class="highlight">echo "gooclean $Revision: 901$ -b 0 100" &gt; $filename</span>
-    gooclean -b 0 100 &lt; $filename <span class="highlight">&gt;&gt;</span> cleaned-$filename
-done
+# gooclean: clean up a single data file
+<span class="highlight">echo '# gooclean $Revision:$' &gt; "cleaned-$1"</span>
+goonorm -b 0 100 &lt; "$1" | goofilter -x --enlarge 2.0 <span class="highlight">&gt;&gt;</span> "cleaned-$1"
  </pre>
  
-      <p>
-        The first change puts a line in the output file
-        that describes how that file was created.
-        The second change is to use <code>&gt;&gt;</code> instead of <code>&gt;</code>
-        to redirect <code>gooclean</code>'s output to the file.
-        <code>&gt;&gt;</code> means "append to":
-        instead of overwriting whatever is in the file,
-        it adds more content to it.
-        This ensures that the first line of the file is the provenance record,
-        with the actual output of <code>gooclean</code> after it.
-      </p>
+  <p class="continue">
+    then set the <code>svn:keywords</code> property
+    and commit the file to insert the revision number,
+    making it:
+  </p>
  
-      <div class="keypoints" id="k:provenance">
-        <h3>Summary</h3>
-        <ul>
-          <li><code>$Keyword:$</code> in a file can be filled in with a property value each time the file is committed.</li>
-          <li idea="paranoia">Put version numbers in programs' output to establish provenance for data.</li>
-          <li><code>svn propset svn:keywords <em>property</em> <em>files</em></code> tells Subversion to start filling in property values.</li>
-        </ul>
-      </div>
+<pre>
+# gooclean: clean up a single data file
+<span class="highlight">echo '# gooclean $Revision: 487 $' &gt; "cleaned-$1"</span>
+goonorm -b 0 100 &lt; "$1" | goofilter -x --enlarge 2.0 <span class="highlight">&gt;&gt;</span> "cleaned-$1"
+</pre>
  
-    </section>
+  <p>
+    Now,
+    each time this script is run it will:
+  </p>
+
+  <ul>
+    <li>
+      put the line
+<pre>
+# gooclean $Revision: 487 $
+</pre>
+      in the output file,
+      then
+    </li>
+    <li>
+      append whatever the pipeline containing <code>goonorm</code> and <code>goofilter</code>
+      would have put in the file originally.
+      (The double redirection <code>&gt;&gt;</code> means "append to" rather than "overwrite".)
+    </li>
+  </ul>
+
+  <p class="continue">
+    In other words,
+    the output of this shell script will always record
+    exactly what version of the script produced it.
+    This isn't enough to reproduce the output&mdash;we would need to record
+    the version numbers of the input files and the <code>goonorm</code> and <code>goofilter</code> programs,
+    and the values of the parameters those programs used
+    in order to do that&mdash;but it's an important and useful first step.
+  </p>
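<p>
As a sketch of that next step, and assuming a checksum is an acceptable stand-in for an input file's version number,
the hypothetical function below (<code>gooclean_with_provenance</code> is our name, not part of Nelle's script)
writes the script's revision keyword, a <code>cksum</code> of the input, and the exact pipeline with its parameters
into the header of the output file.
The two one-line functions <code>goonorm</code> and <code>goofilter</code> are placeholders that just copy their input,
so the example can run without the real programs.
</p>

```shell
# Hypothetical stand-ins so this sketch runs on its own; in Nelle's real
# script these would be her actual goonorm and goofilter programs.
goonorm()   { cat; }
goofilter() { cat; }

# gooclean_with_provenance INPUT
# Sketch of gooclean that writes three comment lines of provenance before
# the cleaned data: the script's own revision keyword, a checksum of the
# input file, and the exact pipeline and parameters used.
gooclean_with_provenance() {
    input="$1"
    output="cleaned-$input"
    # Single quotes stop the shell from expanding $Revision as a variable;
    # Subversion fills in the number when this script is committed.
    echo '# gooclean $Revision:$' > "$output"
    # cksum reads the input on stdin and prints "<CRC> <byte count>".
    echo "# input: $input cksum $(cksum < "$input")" >> "$output"
    echo '# pipeline: goonorm -b 0 100 | goofilter -x --enlarge 2.0' >> "$output"
    # Append the cleaned data after the header, as the original script does.
    goonorm -b 0 100 < "$input" | goofilter -x --enlarge 2.0 >> "$output"
}
```

Running <code>gooclean_with_provenance data.dat</code> in a directory containing <code>data.dat</code>
produces <code>cleaned-data.dat</code> whose first three lines are the provenance comments.
It still does not record the versions of <code>goonorm</code> and <code>goofilter</code> themselves;
how to obtain those depends on the programs.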
+
+  <div class="keypoints">
+    <h3>Summary</h3>
+    <ul>
+      <li><code>$Keyword: &hellip;$</code> in a file can be filled in with a property value each time the file is committed.</li>
+      <li>Put version numbers in programs' output to establish provenance for data.</li>
+      <li><code>svn propset svn:keywords <em>property</em> <em>files</em></code> tells Subversion to start filling in property values.</li>
+    </ul>
+  </div>
+
+  <div class="challenges">
+    <h3>Challenges</h3>
+
+    <ol>
+
+      <li>
+        Add <code>$Id:$</code> to a file,
+        use <code>svn propset</code> to set the corresponding property,
+        and then commit a change to the file.
+        What value does Subversion fill in for this keyword?
+        When would you use this rather than <code>Revision</code> or <code>Author</code>?
+      </li>
+
+      <li>
+        What does the <code>svn:ignore</code> property do when applied to a directory?
+        When would you use it?
+      </li>
+
+    </ol>
+
+  </div>
+
+</section>
  
  <section id="s:summary">
    <h2>Summing Up</h2>
  
    <p>
-    Correlation does not imply causality,
-    but there is a very strong correlation between
-    using version control
-    and doing good computational science.
-    There's an equally strong correlation
-    between <em>not</em> using it and either wasting effort or getting things wrong.
-    Today (the middle of 2013),
-    I will not review a paper if the software used in it
-    is not under version control.
-    The work it reports might be interesting,
-    but without the kind of record-keeping that version control provides,
-    there's no way to know exactly what its authors did.
-    Just as importantly,
-    if someone doesn't know enough about computing to use version control,
-    the odds are good that they don't know enough
-    to do the programming right either.
+    In 2006,
+    <a href="bib.html#mccullough-reproducibility">McCullough, McGeary, and Harrison</a>
+    analyzed several years of
+    the data and code archive of <cite>Journal of Money, Credit, and Banking</cite>,
+    a prestigious journal with a mandatory archiving policy.
+    Of 266 articles published during that time,
+    193 were empirical and should have had data and code deposited in the archive.
+    Of those,
+    only 69 actually had anything in the archive;
+    excluding eleven articles that only had data,
+    and seven that required software or other resources they did not have,
+    McCullough et al. were only able to replicate 14 of the remaining 186 articles.
+    This doesn't mean that the other 92% were wrong,
+    but it does mean there is no practical way to tell.
+  </p>
+
+  <p>
+    By itself,
+    version control doesn't make computational research reproducible.
+    It <em>does</em> help,
+    though,
+    and also eliminates the frustration and wasted time caused by
+    trying to figure out which emailed copy of a file,
+    or which of a dozen directories or USB drives,
+    is the most recent.
+    And while correlation doesn't imply causality,
+    there is certainly a strong correlation between
+    knowing enough about good computational practices to use version control
+    and knowing how to do other things right as well.
    </p>
  
  </section>