Adding figures

diff --git a/svn.html b/svn.html

index dc61a4e5666d981ff6ba780b7b4911f6311930cb..aa203067780a3d1a4b02ff2673c2d8f22aa3f0fb 100644
--- a/svn.html
+++ b/svn.html
@@ -58,40 +58,118 @@
  
  </ol>
  
-<div class="box">
-  <h3>Nothing's Perfekt</h3>
-
-  <p>
-    Version control systems do have one important shortcoming.
-    While it is easy for them to find, display, and merge differences in text files,
-    images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text&mdash;they
-    use specialized binary data formats.
-    Most version control systems don't know how to deal with these formats,
-    so all they can say is, "These files differ."
-    Reconciling those differences will probably require use of an auxiliary tool,
-    such as an audio editor
-    or Microsoft Word's "Compare and Merge" utility.
-  </p>
-</div>
-
  <p>
    The rest of this chapter will explore how to use
    a popular open source version control system called Subversion.
+  It does not have all the features of some newer systems,
+  such as <a href="git.html">Git</a>,
+  but it is still widely used,
+  and is simpler to pick up than those more advanced alternatives.
+  No matter which system you use,
+  the most important thing to learn is not the details of its more obscure commands,
+  but the workflow that it encourages.
  </p>
  
  <div class="guide">
    <h2>For Instructors</h2>
  
-  <p class="fixme">explain</p>
+  <p>
+    Version control is the most important practical skill we introduce.
+    As the last paragraph of the introduction above says,
+    the workflow matters more than the ins and outs of any particular tool.
+    By the end of 90 minutes,
+    the instructor should be able to get learners to chant,
+    "Update, edit, merge, commit," in unison,
+    and have them understand what those terms mean
+    and why that's a good way to structure their working day.
+  </p>
+
+  <p>
+    Provided there aren't network problems,
+    this entire lesson can be covered in <span class="duration">90 minutes</span>.
+    The example at the end
+    showing how to use Subversion keywords to track provenance
+    is the "aha!" moment for many learners.
+    If time is short,
+    skip the material on recovering old versions of files
+    in order to get to this section instead.
+    (The fact that provenance is harder in Git,
+    both mechanically and conceptually,
+    is one reason to keep teaching Subversion.)
+  </p>
  
    <div class="prereq">
      <h3>Prerequisites</h3>
-    <p class="fixme">prereq</p>
+    <p>
+      Basic shell concepts and skills
+      (<code>ls</code>, <code>cd</code>, <code>mkdir</code>,
+      editing files);
+      basic shell scripting
+      (for the discussion of <a href="#s:provenance">provenance</a>).
+    </p>
    </div>
  
    <div class="notes">
      <h3>Teaching Notes</h3>
      <ul>
+      <li>
+        Make sure the network is working <em>before</em> starting this lesson.
+      </li>
+      <li>
+        Give learners a ten-minute overview of what version control does for them
+        before diving into the watch-and-do practicals.
+        Most of them will have tried to co-author papers by emailing files back and forth,
+        or will have biked into the office
+        only to realize that the USB key with last night's work
+        is still on the kitchen table.
+        Instructors can also make jokes about directories with names like
+        "final version",
+        "final version revised",
+        "final version with reviewer three's corrections",
+        "really final version",
+        and
+        "come on this really has to be the last version"
+        to motivate version control as a better way to collaborate
+        and as a better way to back work up.
+      </li>
+      <li>
+        Version control is typically taught after the shell,
+        so collect learners' names during that session
+        and create a repository for them to share
+        with their names as both their IDs and their passwords.
+        The easiest way to create the repository is to use
+        a server managed by an ISP such as Dreamhost,
+        or a "forge" site such as SourceForge or Google Code,
+        all of which provide web interfaces for repository creation and management.
+        If your learners are advanced enough to be using SSH,
+        you can instead create it on any server they can access,
+        and connect with the <code>svn+ssh</code> protocol instead of HTTPS.
+      </li>
+      <li>
+        When giving instructions,
+        be very clear about which files learners are to edit
+        and which user IDs they are to use.
+        It is common for them to edit the instructor's biography,
+        or to use the instructor's user ID and password when committing.
+        Be equally clear <em>when</em> they are to edit things:
+        it's also common for someone to edit the file the instructor is editing
+        and commit changes while the instructor is explaining what's going on,
+        so that a conflict occurs when the instructor comes to commit the file.
+      </li>
+      <li>
+        Learners could do most exercises with repositories on their own machines,
+        but it's hard for them to see how version control helps collaboration
+        unless they're sharing a repository with other learners.
+        In particular,
+        showing learners who changed what using <code>svn blame</code>
+        is only compelling if a file has been edited by at least two people.
+      </li>
+      <li>
+        If some learners are using Windows,
+        there will inevitably be issues merging files with different line endings.
+        <code>svn diff -x -w</code> is supposed to suppress differences in whitespace,
+        but we have found that it doesn't always work as advertised.
+      </li>
      </ul>
    </div>
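The line-ending note above can be illustrated with a small Python sketch. It is not part of the lesson and does not call Subversion; the function name is hypothetical. It only shows the kind of comparison that treats Windows (CRLF) and Unix (LF) line endings as equivalent.

```python
def same_ignoring_line_endings(left: bytes, right: bytes) -> bool:
    """Compare two files' contents line by line, ignoring CRLF vs LF.

    bytes.splitlines() drops the line terminators, so a missing
    newline at the very end of a file is ignored as well.
    """
    return left.splitlines() == right.splitlines()


windows_copy = b"Io\r\nEuropa\r\n"
unix_copy = b"Io\nEuropa\n"
edited_copy = b"Io\nGanymede\n"

print(same_ignoring_line_endings(windows_copy, unix_copy))   # only the line endings differ
print(same_ignoring_line_endings(unix_copy, edited_copy))    # the content differs
```

A check like this can help an instructor tell a real content conflict apart from one caused purely by line endings.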
  
@@ -101,7 +179,7 @@
    <h2>Basic Use</h2>
  
    <div class="understand">
-    <h3>Learning Objectives:</h3>
+    <h3>Learning Objectives</h3>
      <ul>
        <li>Draw a diagram showing the places version control stores information.</li>
        <li>Check out a working copy of a repository.</li>
@@ -311,7 +389,7 @@ cydonia.txt  mons-olympus.txt</span>
  
  <pre>
  $ <span class="in">pwd</span>
-<span class="out">/home/vlad/explore</span>
+<span class="out">/home/dracula/explore</span>
  $ <span class="in">ls -a</span>
  <span class="out">.    ..    .svn    earth    jupiter    mars</span>
  $ <span class="in">ls -F .svn</span>
@@ -478,6 +556,30 @@ Committed revision 7.</span>
      <figcaption>Figure 8: Updated Repository</figcaption>
    </figure>
  
+  <div class="box">
+    <h3>When <em>Not</em> to Use Version Control</h3>
+
+    <p>
+      Despite the rapidly decreasing cost of storage,
+      it is still possible to run out of disk space.
+      In some labs,
+      people can easily go through 2 TB/month if they're not careful.
+      Version control tools usually store revisions as line-by-line differences,
+      which works well for text but not for binary data files:
+      for those, they end up storing essentially every revision in full.
+      This isn't that bad
+      (it's what we'd be doing anyway),
+      but it means version control isn't doing what it does best,
+      and the repository can get very large very quickly.
+      Another concern is that once very old data is no longer needed,
+      it can be useful to archive or delete the files that hold it.
+      This is not possible if our data is version controlled:
+      information can only be added to a repository,
+      so it can only ever increase in size.
+    </p>
+
+  </div>
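The storage argument in this box can be sketched in Python. This is a rough model, assuming a purely line-based store as the box describes; real tools differ in detail. The "binary" file here is only a stand-in: the same data with its line breaks removed. All names are illustrative.

```python
import difflib


def line_delta_size(old: bytes, new: bytes) -> int:
    """Return the size in bytes of a unified line diff that turns old into new."""
    diff = difflib.diff_bytes(
        difflib.unified_diff,
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
    )
    return sum(len(line) for line in diff)


# A text data file: 1000 short lines, one of which changes.
text_old = b"".join(b"reading %d: 0.5\n" % i for i in range(1000))
text_new = text_old.replace(b"reading 500: 0.5\n", b"reading 500: 0.7\n")

# Stand-in for a binary file: the same data, but with no line breaks at all.
blob_old = text_old.replace(b"\n", b";")
blob_new = text_new.replace(b"\n", b";")

text_delta = line_delta_size(text_old, text_new)
blob_delta = line_delta_size(blob_old, blob_new)
print(len(text_new), text_delta, blob_delta)
```

For the text file the delta is a small fraction of the file size. For the blob, the whole file counts as one changed "line", so recording the change costs more than storing the new revision outright.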
+
    <p id="a:define-head">
      Back in his cubicle,
      Wolfman uses <code>svn update</code> to update his working copy.
@@ -683,6 +785,22 @@ $ <span class="in">svn diff -r HEAD</span>
  
    </div>
  
+  <div class="box">
+    <h3>Nothing's Perfekt</h3>
+
+    <p>
+      Version control systems do have one important shortcoming.
+      While it is easy for them to find, display, and merge differences in text files,
+      images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text&mdash;they
+      use specialized binary data formats.
+      Most version control systems don't know how to deal with these formats,
+      so all they can say is, "These files differ."
+      Reconciling those differences will probably require use of an auxiliary tool,
+      such as an audio editor
+      or Microsoft Word's "Compare and Merge" utility.
+    </p>
+  </div>
+
    <div class="box">
      <h3>Diffing Other Files</h3>
  
@@ -777,6 +895,35 @@ $ <span class="in">diff left.txt right.txt</span>
      and if necessary undo later on.
    </p>
  
+  <div class="box">
+    <h3>Who Did What?</h3>
+
+    <p>
+      One other very useful command is <code>svn blame</code>,
+      which shows the revision in which each line in the file was last changed
+      and by whom:
+    </p>
+
+<pre>
+$ <span class="in">svn blame moons.txt</span>
+<span class="out">    14    dracula Name            Orbital Radius  Orbital Period  Mass            Radius
+    14    dracula                 (10**3 km)      (days)          (10**20 kg)     (km)
+    14    dracula Amalthea        181.4           0.498179        0.075           131 x 73 x 67
+     9    mummy   Io              421.6           1.769138        893.2           1821.6
+     9    mummy   Europa          670.9           3.551181        480.0           1560.8
+     9    mummy   Ganymede        1070.4          7.154553        1481.9          2631.2
+    14    dracula Callisto        1882.7          16.689018       1075.9          2410.3
+    14    dracula Himalia         11460           250.5662        0.095           85.0
+    14    dracula Elara           11740           259.6528        0.008           40.0</span>
+</pre>
+
+    <p>
+      If you are ever wondering who to talk to about a change,
+      or why it was made,
+      <code>svn blame</code> is a good place to start.
+    </p>
+  </div>
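To make the idea behind this output concrete, here is a minimal Python sketch of how a blame-style annotation could be computed from a file's history. It is an illustration only, not how Subversion implements `svn blame`. The `blame` function and the shortened `history` data are assumptions made for the example.

```python
import difflib


def blame(revisions):
    """Annotate each line of the newest revision with who last changed it.

    revisions: list of (number, author, lines) tuples, oldest first.
    Returns a list of (number, author, line) tuples for the newest revision.
    """
    annotated = []   # annotations for the revision processed so far
    previous = []    # that revision's lines
    for number, author, lines in revisions:
        matcher = difflib.SequenceMatcher(a=previous, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their earlier annotation.
                updated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines are credited to this revision.
                updated.extend((number, author, line) for line in lines[j1:j2])
        annotated, previous = updated, lines
    return annotated


history = [
    (9, "mummy", ["Io", "Europa", "Ganymede"]),
    (14, "dracula", ["Amalthea", "Io", "Europa", "Ganymede", "Callisto"]),
]
for number, author, line in blame(history):
    print(f"{number:6d} {author:8s} {line}")
```

Lines that survive unchanged from revision 9 stay credited to `mummy`, while the lines added in revision 14 are credited to `dracula`, mirroring the pattern in the output above.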
+
    <div class="keypoints">
      <h3>Summary</h3>
      <ul>
@@ -859,7 +1006,7 @@ $ <span class="in">diff left.txt right.txt</span>
    <h2>Merging Conflicts</h2>
  
    <div class="understand">
-    <h3>Learning Objectives:</h3>
+    <h3>Learning Objectives</h3>
      <ul>
        <li>Explain what causes conflicts to occur and how to tell when one has occurred.</li>
        <li>Resolve a conflict.</li>
@@ -947,9 +1094,15 @@ svn: resource out of date; try updating</span>
      and <em>only</em> in his working copy.
      But since Wolfman's version of the file now includes
      the lines that Dracula added,
-    Wolfman can go ahead and commit them as usual to create revision 10.
+    Wolfman can go ahead and commit them as usual to create revision 10
+    (<a href="#f:merge_without_conflict">Figure 10</a>).
    </p>
  
+  <figure id="f:merge_without_conflict">
+    <img src="svn/merge_without_conflict.png" alt="Merging Without Conflict" />
+    <figcaption>Figure 10: Merging Without Conflict</figcaption>
+  </figure>
+
    <p>
      Wolfman's working copy is now in sync with the master,
      but Dracula's is one behind at revision 9.
@@ -997,7 +1150,16 @@ Elara           11740           259.6528        0.008           40.0
      Subversion tells him he can't.
      But this time,
    when Dracula does update his working copy,
-    he doesn't just get the line Wolfman added to create revision 11.
+    he doesn't just get the line Wolfman added to create revision 11
+    (<a href="#f:merge_with_conflict">Figure 11</a>).
+  </p>
+
+  <figure id="f:merge_with_conflict">
+    <img src="svn/merge_with_conflict.png" alt="Merge With Conflict" />
+    <figcaption>Figure 11: Merge With Conflict</figcaption>
+  </figure>
+
+  <p>
      There is an actual conflict in the file,
      so Subversion asks Dracula what he wants to do:
    </p>
@@ -1110,12 +1272,12 @@ Elara           11740           259.6528        0.008           40.0
      it displays his file,
      the common base that he and Wolfman were working from,
      and Wolfman's file in a three-pane view
-    (<a href="#f:diff_viewer">Figure 10</a>):
+    (<a href="#f:diff_viewer">Figure 12</a>):
    </p>
  
    <figure id="f:diff_viewer">
      <img src="svn/diff_viewer.png" alt="A Difference Viewer" />
-    <figcaption>Figure 10: A Difference Viewer</figcaption>
+    <figcaption>Figure 12: A Difference Viewer</figcaption>
    </figure>
  
    <p class="continue">
@@ -1204,40 +1366,32 @@ Elara           11740           259.6528        0.008           40.0
        Edit your biography in one copy and commit those changes,
        then switch to the other copy and edit the same file
        before updating.
-      <a href="#f:challenge_conflict">Figure 11</a> shows
-      the differences between these two challenges.
      </p>
-
-    <figure id="f:challenge_conflict">
-      <img src="svn/challenge_conflict.png" alt="Practicing Conflict Resolution" />
-      <figcaption>Figure 11: Practicing Conflict Resolution</figcaption>
-    </figure>
    </div>
  
  </section>
  
-    <section id="s:rollback">
+<section id="s:rollback">
+  <h2>Recovering Old Versions</h2>
  
-      <h2>Recovering Old Versions</h2>
-
-      <div class="understand" id="u:rollback">
-        <h3>Understand:</h3>
-        <ul>
-          <li>How to undo changes to a working copy.</li>
-          <li>How to recover old versions of files.</li>
-          <li>What a branch is.</li>
-        </ul>
-      </div>
+  <div class="understand">
+    <h3>Learning Objectives</h3>
+    <ul>
+      <li>Discard changes made to a working copy.</li>
+      <li>Recover an old version of a file.</li>
+      <li>Explain what branches are and when they are used.</li>
+    </ul>
+  </div>
  
-      <p>
-        Now that we have seen how to merge files and resolve conflicts,
-        we can look at how to use version control as an "infinite undo".
-        Suppose that when Wolfman starts work late one night,
-        his copy of <code>explore</code> is in sync with the head at revision 12.
-        He decides to edit the file <code>moons.txt</code>;
-        unfortunately, he forgot that there was a full moon,
-        so his changes don't make a lot of sense:
-      </p>
+  <p>
+    Now that we have seen how to merge files and resolve conflicts,
+    we can look at how to use version control as an "infinite undo".
+    Suppose that when Wolfman starts work late one night,
+    his copy of <code>explore</code> is in sync with the head at revision 12.
+    He decides to edit the file <code>moons.txt</code>;
+    unfortunately, he forgot that there was a full moon,
+    so his changes don't make a lot of sense:
+  </p>
  
  <pre src="svn/poetry.txt">
  Just one moon can make me growl
@@ -1245,35 +1399,35 @@ Four would make me want to howl
  ...
  </pre>
  
-      <p>
-        When he's back in human form the next day,
-        he wants to undo his changes.
-        Without version control, his choices would be grim:
-        he could try to edit them back into their original state by hand
-        (which for some reason hardly ever seems to work),
-        or ask his colleagues to send him their copies of the files
-        (which is almost as embarrassing as chasing the neighbor's cat when in wolf form).
-      </p>
+  <p>
+    When he's back in human form the next day,
+    he wants to undo his changes.
+    Without version control, his choices would be grim:
+    he could try to edit them back into their original state by hand
+    (which for some reason hardly ever seems to work),
+    or ask his colleagues to send him their copies of the files
+    (which is almost as embarrassing as chasing the neighbor's cat when in wolf form).
+  </p>
  
-      <p>
-        Since he's using Subversion, though,
-        and hasn't committed his work to the repository,
-        all he has to do is <a href="glossary.html#revert">revert</a> his local changes.
-        <code>svn revert</code> simply throws away local changes to files
-        and puts things back the way they were before those changes were made.
-        This is a purely local operation:
-        since Subversion stores the history of the project inside every working copy,
-        Wolfman doesn't need to be connected to the network to do this.
-      </p>
+  <p>
+    Since he's using Subversion, though,
+    and hasn't committed his work to the repository,
+    all he has to do is <a href="glossary.html#revert">revert</a> his local changes.
+    <code>svn revert</code> simply throws away local changes to files
+    and puts things back the way they were before those changes were made.
+    This is a purely local operation:
+    since Subversion stores the history of the project inside every working copy,
+    Wolfman doesn't need to be connected to the network to do this.
+  </p>
  
-      <p>
-        To start,
-        Wolfman uses <code>svn diff</code> <em>without</em> the <code>-r HEAD</code> flag
-        to take a look at the differences between his file
-        and the master copy in the repository.
-        Since he doesn't want to keep his changes,
-        his next command is <code>svn revert moons.txt</code>.
-      </p>
+  <p>
+    To start,
+    Wolfman uses <code>svn diff</code> <em>without</em> the <code>-r HEAD</code> flag
+    to take a look at the differences between his file
+    and the master copy in the repository.
+    Since he doesn't want to keep his changes,
+    his next command is <code>svn revert moons.txt</code>.
+  </p>
  
  <pre>
  $ <span class="in">cd jupiter</span>
@@ -1281,13 +1435,13 @@ $ <span class="in">svn revert moons.txt</span>
  <span class="out">Reverted   moons.txt</span>
  </pre>
  
-      <p>
-        What if someone <em>has</em> committed their changes,
-        but still wants to undo them?
-        For example,
-        suppose Dracula decides that the numbers in <code>moons.txt</code> would look better with commas.
-        He edits the file to put them in:
-      </p>
+  <p>
+    What if someone <em>has</em> committed their changes,
+    but still wants to undo them?
+    For example,
+    suppose Dracula decides that the numbers in <code>moons.txt</code> would look better with commas.
+    He edits the file to put them in:
+  </p>
  
  <pre src="svn/moons_commas.txt">
  Name            Orbital Radius  Orbital Period  Mass            Radius
@@ -1301,47 +1455,49 @@ Himalia      11<span class="highlight">,</span>460           250.5662
  Elara        11<span class="highlight">,</span>740           259.6528            0.008           40.0
  </pre>
  
-      <p class="continue">
-        then commits his changes to create revision 13.
-        A little while later,
-        the Mummy sees the change and orders Dracula to put things back the way they were.
-        What should Dracula do?
-      </p>
+  <p class="continue">
+    then commits his changes to create revision 13.
+    A little while later,
+    the Mummy sees the change and orders Dracula to put things back the way they were.
+    What should Dracula do?
+  </p>
  
-      <p>
-        We can draw the sequence of events leading up to revision 13
-        as shown in <a href="#f:before_undoing">Fixture XXX</a>:
-      </p>
+  <p>
+    We can draw the sequence of events leading up to revision 13
+    as shown in <a href="#f:before_undoing">Figure 13</a>:
+  </p>
  
-      <figure id="f:before_undoing">
-        <img src="svn/before_undoing.png" alt="Before Undoing" />
-      </figure>
+  <figure id="f:before_undoing">
+    <img src="svn/before_undoing.png" alt="Before Undoing" />
+    <figcaption>Figure 13: Before Undoing</figcaption>
+  </figure>
  
-      <p class="continue">
-        Dracula wants to erase revision 13 from the repository,
-        but he can't actually do that:
-        once a change is in the repository,
-        it's there forever.
-        What he can do instead is merge the old revision with the current revision
-        to create a new revision
-        (<a href="#f:merging_history">Fixture XXX</a>).
-      </p>
+  <p class="continue">
+    Dracula wants to erase revision 13 from the repository,
+    but he can't actually do that:
+    once a change is in the repository,
+    it's there forever.
+    What he can do instead is merge the old revision with the current revision
+    to create a new revision
+    (<a href="#f:merging_history">Figure 14</a>).
+  </p>
  
-      <figure id="f:merging_history">
-        <img src="svn/merging_history.png" alt="Merging History" />
-      </figure>
+  <figure id="f:merging_history">
+    <img src="svn/merging_history.png" alt="Merging History" />
+    <figcaption>Figure 14: Merging History</figcaption>
+  </figure>
  
-      <p class="continue">
-        This is exactly like merging changes made by two different people;
-        the only difference is that the "other person" is his past self.
-      </p>
+  <p class="continue">
+    This is exactly like merging changes made by two different people;
+    the only difference is that the "other person" is his past self.
+  </p>
  
-      <p>
-        To undo his commas,
-        Dracula must merge revision 12 (the one before his change)
-        with revision 13 (the current head revision)
-        using <code>svn merge</code>:
-      </p>
+  <p>
+    To undo his commas,
+    Dracula must merge revision 12 (the one before his change)
+    with revision 13 (the current head revision)
+    using <code>svn merge</code>:
+  </p>
  
  <pre>
  $ <span class="in">svn merge -r HEAD:12 moons.txt</span>
@@ -1349,526 +1505,702 @@ $ <span class="in">svn merge -r HEAD:12 moons.txt</span>
  U  moons.txt</span>
  </pre>
  
-      <p class="continue">
-        The <code>-r</code> flag specifies the range of revisions to merge:
-        to undo the changes from revision 12 to revision 13,
-        he uses either <code>13:12</code> or <code>HEAD:12</code>
-        (since he is going backward in time from the most recent revision to revision 12).
-        This is called a <a href="glossary.html#reverse-merge">reverse</a> merge
-        because he's going backward in time.
-      </p>
+  <p class="continue">
+    The <code>-r</code> flag specifies the range of revisions to merge:
+    to undo the changes from revision 12 to revision 13,
+    he uses either <code>13:12</code> or <code>HEAD:12</code>
+    (since he is going backward in time from the most recent revision to revision 12).
+    This is called a <a href="glossary.html#reverse-merge">reverse</a> merge
+    because he's going backward in time.
+  </p>
  
-      <p>
-        After he runs this command,
-        he must run <code>svn commit</code> to save the changes to the repository.
-        This creates a new revision, number 14,
-        rather than erasing revision 13.
-        That way,
-        the changes he made to create revision 13 are still there
-        if he can ever convince the Mummy that numbers should have commas.
-      </p>
+  <p>
+    After he runs this command,
+    he must run <code>svn commit</code> to save the changes to the repository.
+    This creates a new revision, number 14,
+    rather than erasing revision 13.
+    That way,
+    the changes he made to create revision 13 are still there
+    if he can ever convince the Mummy that numbers should have commas.
+  </p>
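The point that a reverse merge adds history instead of erasing it can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the repository is a plain dictionary of revisions, the working copy matches revision 13, and there is no conflict handling. The `reverse_merge` function and the shortened file contents are hypothetical, and no Subversion API is used.

```python
import difflib


def reverse_merge(repo, working, newer, older):
    """Undo in `working` the line changes made between `older` and `newer`.

    Assumes `working` has the same line layout as repo[newer]
    (no local edits that move lines), so conflicts cannot occur.
    """
    matcher = difflib.SequenceMatcher(a=repo[newer], b=repo[older], autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(working[i1:i2])        # untouched lines stay as they are
        else:
            result.extend(repo[older][j1:j2])    # changed lines go back to the old text
    return result


repo = {
    12: ["Name     Radius  Period", "Himalia  11460  250.5662", "Elara    11740  259.6528"],
    13: ["Name     Radius  Period", "Himalia  11,460  250.5662", "Elara    11,740  259.6528"],
}
working = list(repo[13])
working = reverse_merge(repo, working, newer=13, older=12)
repo[max(repo) + 1] = working    # the commit adds revision 14
print(sorted(repo))              # prints [12, 13, 14]
```

Revision 14 has the same content as revision 12, but revision 13 is still in the history, so the commas can be recovered later if needed.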
  
-      <p>
-        Merging can be used to recover older revisions of files,
-        not just the most recent,
-        and to recover many files or directories at a time.
-        The most frequent use, though,
-        is to manage parallel streams of development in large projects.
-        This is outside the scope of this chapter,
-        but the basic idea is simple.
-      </p>
+  <div class="box">
+    <h3>Another Way to Do It</h3>
  
-      <p>
-        Suppose that Universal Monsters has just released a new program for designing secret lairs.
-        Dracula and Wolfman are supposed to start adding a few features
-        that had to be left out of the first release because time ran short.
-        At the same time,
-        Frankenstein and the Mummy are doing technical support:
-        their job is to fix any bugs that users find.
-        All sorts of things could go wrong if both teams tried to work on the same code at the same time.
-        For example,
-        if Frankenstein fixed a bug and sent a new copy of the program to a user in Greenland,
-        it would be all too easy for him to accidentally include
-        the half-completed shark tank control feature that Wolfman was working on.
-      </p>
+    <p>
+      Another way to recover a particular version of a particular file
+      is to use the <code>svn copy</code> command.
+      If the URL of our repository is
+      <code>https://universal.software-carpentry.org/explore</code>,
+      then the command:
+    </p>
  
-      <p>
-        The usual way to handle this situation is
-        to create a <a href="glossary.html#branch">branch</a>
-        in the repository for each major sub-project
-        (<a href="#f:branch_merge">Figure XXX</a>).
-        While Wolfman and Dracula work on
-        the <a href="glossary.html#main-line">main line</a>,
-        Frankenstein and the Mummy create a branch,
-        which is just another copy of the repository's files and directories
-        that is also under version control.
-        They can work in their branch without disturbing Wolfman and Dracula and vice versa:
-      </p>
+<pre>
+$ <span class="in">svn copy https://universal.software-carpentry.org/explore/mission.txt@120 ./mission.txt</span>
+</pre>
  
-      <figure id="f:branch_merge">
-        <img src="svn/branch_merge.png" alt="Branching and Merging" />
-      </figure>
+    <p class="continue">
+      copies the file <code>mission.txt</code> as it was in revision 120
+      into our working directory
+      (overwriting whatever <code>mission.txt</code> file we currently have,
+      if any).
+      What's more,
+      using <code>svn copy</code> brings along the file's history as well,
+      so that future <code>svn log</code> operations will show
+      how <code>mission.txt</code> was resurrected.
+    </p>
+  </div>
  
-      <p>
-        Branches in version control repositories are often described as "parallel universes".
-        Each branch starts off as a clone of the project at some moment in time
-        (typically each time the software is released,
-        or whenever work starts on a major new feature).
-        Changes made to a branch only affect that branch,
-        just as changes made to the files in one directory don't affect files in other directories.
-        However,
-        the branch and the main line are both stored in the same repository,
-        so their revision numbers are always in step.
-      </p>
+  <p>
+    Merging can be used to recover older revisions of files,
+    not just the most recent,
+    and to recover many files or directories at a time.
+    The most frequent use, though,
+    is to manage parallel streams of development in large projects.
+    This is outside the scope of this chapter,
+    but the basic idea is simple.
+  </p>
  
-      <p>
-        If someone decides that a bug fix in one branch should also be made in another,
-        all they have to do is merge the files in question.
-        This is exactly like merging an old version of a file with the current one,
-        but instead of going backward in time,
-        the change is brought sideways from one branch to another.
-      </p>
+  <p>
+    Suppose that Universal Missions has just released a new program
+    for designing interplanetary voyages.
+    Dracula and Wolfman are supposed to add some features
+    that were left out of the first release because time ran short.
+    At the same time,
+    Frankenstein and the Mummy are doing technical support:
+    their job is to fix any bugs that users find.
+  </p>
  
-      <p>
-        Branching helps projects scale up by letting sub-teams work independently,
-        but too many branches can cause as many problems as they solve.
-        Karl Fogel's excellent book
-        <a href="bib.html#fogel-producing-oss"><cite>Producing Open Source Software</cite></a>,
-        and Laura Wingerd and Christopher Seiwald's paper
-        "<a href="bib.html#wingerd-seiwald-scm">High-level Best Practices in Software Configuration Management</a>",
-        talk about branches in much more detail.
-        Projects usually don't need to do this until they have a dozen or more developers,
-        or until several versions of their software are in simultaneous use,
-        but using branches is a key part of switching from software carpentry to software engineering.
-      </p>
+  <p>
+    All sorts of things could go wrong
+    if both teams tried to work on the same code at the same time.
+    In particular,
+    Dracula and Wolfman might want to make large changes
+    to the structure of the code
+    in order to make it easier to add new features,
+    while Frankenstein and the Mummy want to make as few changes as possible
+    so as not to introduce new bugs while fixing old ones.
+  </p>
  
-      <div class="keypoints" id="k:rollback">
-        <h3>Summary</h3>
-        <ul>
-          <li>Old versions of files can be recovered by merging their old state with their current state.</li>
-          <li>Recovering an old version of a file does not erase the intervening changes.</li>
-          <li>Use branches to support parallel independent development.</li>
-          <li><code>svn merge</code> merges two revisions of a file.</li>
-          <li><code>svn revert</code> undoes local changes to files.</li>
-        </ul>
-      </div>
+  <p>
+    The usual way to handle this situation is
+    to create a <a href="glossary.html#branch">branch</a>
+    in the repository for each major sub-project
+    (<a href="#f:branch_merge">Figure 15</a>).
+    While Wolfman and Dracula work on
+    the <a href="glossary.html#main-line">main line</a>,
+    Frankenstein and the Mummy create a branch,
+    which is just another copy of the repository's files and directories
+    that is also under version control.
+    They can work in their branch without disturbing Wolfman and Dracula and vice versa:
+  </p>
  
-    </section>
+  <figure id="f:branch_merge">
+    <img src="svn/branch_merge.png" alt="Branching and Merging" />
+    <figcaption>Figure 15: Branching and Merging</figcaption>
+  </figure>
  
-    <section id="s:setup">
+  <p>
+    Branches in version control repositories are often described as "parallel universes".
+    Each branch starts off as a clone of the project at some moment in time
+    (typically each time the software is released,
+    or whenever work starts on a major new feature).
+    Changes made to a branch only affect that branch,
+    just as changes made to the files in one directory don't affect files in other directories.
+    However,
+    the branch and the main line are both stored in the same repository,
+    so their revision numbers are always in step.
+  </p>
  
-      <h2>Setting up a Repository</h2>
+  <p>
+    If someone decides that a bug fix in one branch should also be made in another,
+    all they have to do is merge the files in question.
+    This is exactly like merging an old version of a file with the current one,
+    but instead of going backward in time,
+    the change is brought sideways from one branch to another.
+  </p>
  
-      <div class="understand" id="u:setup">
-        <h3>Understand:</h3>
-        <ul>
-          <li>How to create a repository.</li>
-        </ul>
-      </div>
+  <p>
+    Branching helps projects scale up by letting sub-teams work independently,
+    but too many branches can cause as many problems as they solve.
+    Karl Fogel's excellent book
+    <a href="bib.html#fogel-producing-oss"><cite>Producing Open Source Software</cite></a>,
+    and Laura Wingerd and Christopher Seiwald's paper
+    "<a href="bib.html#wingerd-seiwald-scm">High-level Best Practices in Software Configuration Management</a>",
+    talk about branches in much more detail.
+    Projects usually don't need branches until they have a dozen or more developers,
+    or until several versions of their software are in simultaneous use,
+    but using branches is a key part of switching from software carpentry to software engineering.
+  </p>
  
-      <p>
-        It is finally time to see how to create a repository.
-        As a quick recap,
-        we will keep the master copy of our work in a repository
-        on a server that we can access from other machines on the internet.
-        That master copy consists of files and directories that no-one ever edits directly.
-        Instead, a copy of Subversion running on that machine
-        manages updates for us and watches for conflicts.
-        Our working copy is a mirror image of the master sitting on our computer.
-        When our Subversion client needs to communicate with the master,
-        it exchanges data with the copy of Subversion running on the server.
-      </p>
+  <div class="keypoints">
+    <h3>Summary</h3>
+    <ul>
+      <li>Old versions of files can be recovered by merging their old state with their current state.</li>
+      <li>Recovering an old version of a file does not erase the intervening changes.</li>
+      <li>Use branches to support parallel independent development.</li>
+      <li><code>svn revert</code> undoes local changes to files.</li>
+      <li><code>svn merge</code> merges two revisions of a file.</li>
+    </ul>
+  </div>
  
-      <figure id="f:repo_four_things">
-        <img src="svn/repo_four_things.png" alt="What's Needed for a Repository" />
-      </figure>
+  <div class="challenges">
+    <h3>Challenges</h3>
  
-      <p>
-        To make this to work, we need four things
-        (<a href="#f:repo_four_things">Figure XXX</a>):
-      </p>
+    <ol>
+      <li>
+        Explain what the command:
+<pre>
+svn diff -r 240:261 fish.dat
+</pre>
+        does, and when you might want to run it.
+      </li>
  
-      <ol>
-
-        <li>
-          The repository itself.
-          It's not enough to create an empty directory and start filling it with files:
-          Subversion needs to create a lot of other structure
-          in order to keep track of old revisions, who made what changes, and so on.
-        </li>
-
-        <li>
-          The full URL of the repository.
-          This includes the URL of the server
-          and the path to the repository on that machine.
-          (The second part is needed because a single server can,
-          and usually will,
-          host many repositories.)
-        </li>
-
-        <li>
-          Permission to read or write the master copy.
-          Many open source projects give the whole world permission to read from their repository,
-          but very few allow strangers to write to it:
-          there are just too many possibilities for abuse.
-          Somehow, we have to set up a password or something like it
-          so that users can prove who they are.
-        </li>
-
-        <li>
-          A working copy of the repository on our computer.
-          Once the first three things are in place,
-          this just means running the <code>checkout</code> command.
-        </li>
-
-      </ol>
+      <li>
+        Suppose that a file called <code>mission.txt</code>
+        existed in revision 90 of a repository,
+        but had been deleted in revision 91.
+        What two commands could we use to recover it?
+      </li>
  
-      <p>
-        To keep things simple,
-        we will start by creating a repository on the machine that we're working on.
-        This won't let us share our work with other people,
-        but it <em>will</em> allow us to save the history of our work as we go along.
-      </p>
+    </ol>
+  </div>
  
-      <p>
-        The command to create a repository is <code>svnadmin create</code>,
-        followed by the path to the repository.
-        If we want to create a repository called <code>lair_repo</code>
-        directly under our home directory,
-        we just <code>cd</code> to get home
-        and run <code>svnadmin create lair_repo</code>.
-        This command creates a directory called <code>lair_repo</code> to hold our repository,
-        and fills it with various files that Subversion uses
-        to keep track of the project's history:
-      </p>
+</section>
+
+<section id="s:setup">
+  <h2>Setting up a Repository</h2>
+
+  <div class="understand">
+    <h3>Learning Objectives</h3>
+    <ul>
+      <li>How to create a repository.</li>
+    </ul>
+  </div>
+
+  <p>
+    It is finally time to see how to create a repository.
+    As a quick recap,
+    we will keep the master copy of our work in a repository
+    on a server that we can access from other machines on the internet.
+    That master copy consists of files and directories that no-one ever edits directly.
+    Instead, a copy of Subversion running on that machine
+    manages updates for us and watches for conflicts.
+    Our working copy is a mirror image of the master sitting on our computer.
+    When our Subversion client needs to communicate with the master,
+    it exchanges data with the copy of Subversion running on the server.
+  </p>
+
+  <p>
+    To make this work, we need four things:
+  </p>
+
+  <ol>
+
+    <li>
+      The repository itself.
+      It's not enough to create an empty directory and start filling it with files:
+      Subversion needs to create a lot of other structure
+      in order to keep track of old revisions, who made what changes, and so on.
+    </li>
+
+    <li>
+      The full URL of the repository.
+      This includes the URL of the server
+      and the path to the repository on that machine.
+      (The second part is needed because a single server can,
+      and usually will,
+      host many repositories.)
+    </li>
+
+    <li>
+      Permission to read or write the master copy.
+      Many open source projects give the whole world permission to read from their repository,
+      but very few allow strangers to write to it:
+      there are just too many possibilities for abuse.
+      Somehow, we have to set up a password or something like it
+      so that users can prove who they are.
+    </li>
+
+    <li>
+      A working copy of the repository on our computer.
+      Once the first three things are in place,
+      this just means running the <code>checkout</code> command.
+    </li>
+
+  </ol>
+
+  <p>
+    To keep things simple,
+    we will start by creating a repository on the machine that we're working on.
+    This won't let us share our work with other people,
+    but it <em>will</em> allow us to save the history of our work as we go along.
+  </p>
+
+  <p>
+    The command to create a repository is <code>svnadmin create</code>,
+    followed by the path to the repository.
+    If we want to create a repository called <code>missions_repo</code>
+    directly under our home directory,
+    we just <code>cd</code> to get home
+    and run <code>svnadmin create missions_repo</code>.
+    This command creates a directory called <code>missions_repo</code> to hold our repository,
+    and fills it with various files that Subversion uses
+    to keep track of the project's history:
+  </p>
  
  <pre>
  $ <span class="in">cd</span>
-$ <span class="in">svnadmin create lair_repo</span>
-$ <span class="in">ls -F lair_repo</span>
+$ <span class="in">svnadmin create missions_repo</span>
+$ <span class="in">ls -F missions_repo</span>
  <span class="out">README.txt    conf/    db/    format    hooks/    locks/</span>
  </pre>
  
-      <p class="continue">
-        We should <em>never</em> edit anything in this repository directly.
-        Doing so probably won't shred our sanity and leave us gibbering in mindless horror,
-        but it will almost certainly make the repository unusable.
-      </p>
+  <p class="continue">
+    We should <em>never</em> edit any of this directly,
+    since it will almost certainly make the repository unusable.
+    Instead,
+    we should use <code>svn checkout</code>
+    to get a working copy of this repository.
+    If our home directory is <code>/users/mummy</code>,
+    then the full path to the repository we just created is <code>/users/mummy/missions_repo</code>,
+    so we run <code>svn checkout file:///users/mummy/missions_repo missions_working</code>.
+  </p>
  
-      <p>
-        To get a working copy of this repository,
-        we use Subversion's <code>checkout</code> command.
-        If our home directory is <code>/users/mummy</code>,
-        then the full path to the repository we just created is <code>/users/mummy/lair_repo</code>,
-        so we run <code>svn checkout file:///users/mummy/lair lair_working</code>.
-      </p>
+  <p>
+    Working backward,
+    the second argument,
+    <code>missions_working</code>,
+    specifies where the working copy is to be put.
+    The first argument is the URL of our repository,
+    and it has two parts.
+    <code>/users/mummy/missions_repo</code> is the path to the repository directory.
+    <code>file://</code> specifies the <a href="glossary.html#protocol">protocol</a>
+    that Subversion will use to communicate with the repository&mdash;in this case,
+    it says that the repository is part of the local machine's filesystem.
+    (Notice that the protocol ends in two slashes,
+    while the absolute path to the repository starts with a slash,
+    making three in total.
+    A very common mistake is to type only two, since that's what web URLs normally have.)
+  </p>
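+  <p>
+    As a minimal sketch of the three-slash rule,
+    the URL can be assembled from the absolute path
+    so that the third slash never has to be typed by hand.
+    The variable names <code>repo_path</code> and <code>repo_url</code> are illustrative only;
+    they mean nothing to Subversion:
+  </p>

```shell
# Illustrative names only; neither variable means anything to Subversion.
repo_path="/users/mummy/missions_repo"   # absolute path: begins with one slash
repo_url="file://${repo_path}"           # protocol's two slashes + the path's own slash
echo "$repo_url"
```

+  <p class="continue">
+    The resulting string could then be passed as the first argument to <code>svn checkout</code>.
+  </p>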
  
-      <p>
-        Working backward,
-        the second argument,
-        <code>lair_working</code>,
-        specifies where the working copy is to be put.
-        The first argument is the URL of our repository,
-        and it has two parts.
-        <code>/users/mummy/lair_repo</code> is the path to repository directory.
-        <code>file://</code> specifies the <a href="glossary.html#protocol">protocol</a>
-        that Subversion will use to communicate with the repository&mdash;in this case,
-        it says that the repository is part of the local machine's filesystem.
-        Notice that the protocol ends in two slashes,
-        while the absolute path to the repository starts with a slash,
-        making three in total.
-        A very common mistake is to type only two, since that's what web URLs normally have.
-      </p>
+  <p>
+    When we're doing a checkout,
+    it is <em>very</em> important that we provide the second argument,
+    which specifies the name of the directory we want the working copy to be put in.
+    Without it,
+    Subversion will try to use the name of the repository,
+    <code>missions_repo</code>,
+    as the name of the working copy.
+    Since we're in the directory that contains the repository,
+    this means that Subversion will try to overwrite the repository with a working copy.
+    Again,
+    there isn't much risk of our sanity being torn to shreds,
+    but this could ruin our repository.
+  </p>
  
-      <p>
-        When we're doing a checkout,
-        it is <em>very</em> important that we provide the second argument,
-        which specifies the name of the directory we want the working copy to be put in.
-        Without it,
-        Subversion will try to use the name of the repository,
-        <code>lair_repo</code>,
-        as the name of the working copy.
-        Since we're in the directory that contains the repository,
-        this means that Subversion will try to overwrite the repository with a working copy.
-        Again,
-        there isn't much risk of our sanity being torn to shreds,
-        but this could ruin our repository.
-      </p>
+  <p>
+    To avoid this problem,
+    most people create a sub-directory in their account called something like <code>repos</code>,
+    and then create their repositories in that.
+    For example,
+    we could create our repository in <code>/users/mummy/repos/missions</code>,
+    then check out a working copy as <code>/users/mummy/missions</code>.
+    This practice makes both names easier to read.
+  </p>
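+  <p>
+    The layout described above might be set up roughly as follows.
+    This is a sketch under stated assumptions:
+    a scratch directory created with <code>mktemp</code> stands in for the home directory
+    so that nothing real is touched,
+    and the Subversion commands are skipped if Subversion is not installed:
+  </p>

```shell
# A scratch directory stands in for the home directory (an assumption of this sketch).
base=$(mktemp -d)
mkdir -p "$base/repos"

if command -v svnadmin >/dev/null 2>&1 && command -v svn >/dev/null 2>&1; then
    # Create the repository under repos/, then check out a working copy beside it.
    svnadmin create "$base/repos/missions"
    svn checkout "file://$base/repos/missions" "$base/missions"
else
    echo "Subversion is not installed; skipping" >&2
fi
```

+  <p class="continue">
+    Because the repository and the working copy have different parent directories,
+    giving them the same short name cannot cause one to overwrite the other.
+  </p>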
  
-      <p>
-        To avoid this problem,
-        most people create a sub-directory in their account called something like <code>repos</code>,
-        and then create their repositories in that.
-        For example,
-        we could create our repository in <code>/users/mummy/repos/lair</code>,
-        then check out a working copy as <code>/users/mummy/lair</code>.
-        This practice makes both names easier to read.
-      </p>
+  <p>
+    The obvious next step is to put our repository on a server,
+    rather than on our personal machine.
+    In fact,
+    we should <em>always</em> do this
+    so that we don't lose the history of our project
+    if our laptop is damaged or stolen.
+    A departmental server is also much more likely to be backed up regularly
+    than our personal machine&hellip;
+  </p>
  
-      <p>
-        The obvious next steps are
-        to put our repository on a server,
-        rather than on our personal machine,
-        and to give other people access to the repository we have just created
-        so that they can work with us.
-        We'll discuss the first in <a href="web.html#s:svn">a later chapter</a>,
-        but unfortunately,
-        the second really does require things that we are not going to cover in this course.
-        If you want to do this, you can:
-      </p>
+  <p>
+    Creating a repository on a server is simple:
+    just log in and go through the steps described above.
+    Accessing that repository from another machine
+    is also straightforward.
+    If the machine's address is <code>serv.euphoric.edu</code>,
+    and our user ID is <code>dracula</code>,
+    the URL of the repository will be something like:
+  </p>
+
+<pre>
+svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions
+</pre>
+
+  <p>
+    Reading from left to right:
+  </p>
  
-      <ul>
+  <ul>
+    <li>
+      <code>svn+ssh</code> is the protocol that Subversion uses to connect to the server
+      (in this case,
+      a combination of Subversion's own protocol
+      and <a href="shell.html#s:ssh">SSH</a>);
+    </li>
+    <li>
+      <code>dracula@serv.euphoric.edu</code> identifies the server and who we are
+      (just like an email address);
+      and
+    </li>
+    <li>
+      <code>/home/dracula/repos/missions</code> is the absolute path of the repository
+      on the server.
+    </li>
+  </ul>
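+  <p>
+    The three parts listed above can be pulled apart with ordinary shell parameter expansion.
+    This is only an illustration of the URL's structure;
+    the variable names are made up for the example,
+    and Subversion does its own parsing internally:
+  </p>

```shell
# Variable names are hypothetical; this only illustrates the URL's structure.
url="svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions"

protocol="${url%%://*}"      # everything before "://"
rest="${url#*://}"           # everything after "://"
user_host="${rest%%/*}"      # up to, but not including, the first slash
repo_path="/${rest#*/}"      # the remainder, with its leading slash restored

echo "protocol:  $protocol"
echo "user@host: $user_host"
echo "path:      $repo_path"
```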
+
+  <p id="a:only_user">
+    That's fine if you are the only person using the repository,
+    but if you want to share it with others,
+    you need to worry about security.
+    As we discuss in the lesson on <a href="web.html">web programming</a>,
+    as soon as you provide a service on the internet,
+    there's the possibility that someone may try to attack your system through it.
+    Rather than trying to learn enough system administration skills
+    to set things up safely,
+    it is usually easier to:
+  </p>
  
-        <li>
-          ask your system administrator to set it up for you;
-        </li>
+  <ul>
  
-        <li>
-          use an open source hosting service like <a href="http://www.sf.net">SourceForge</a>,
-          <a href="http://code.google.com">Google Code</a>,
-          <a href="https://github.com/">GitHub</a>,
-          or <a href="https://bitbucket.org/">BitBucket</a>; or
-        </li>
+    <li>
+      ask your department's system administrator to set it up for you;
+    </li>
  
-        <li>
-          spend a few dollars a month on a commercial hosting service like <a href="http://dreamhost.com">DreamHost</a>
-          that provides web-based GUIs for creating and managing repositories.
-        </li>
+    <li>
+      use a hosting service like <a href="http://www.sf.net">SourceForge</a>,
+      <a href="http://code.google.com">Google Code</a>,
+      <a href="https://github.com/">GitHub</a>,
+      or <a href="https://bitbucket.org/">BitBucket</a>; or
+    </li>
  
-      </ul>
+    <li>
+      spend a few dollars a month on a commercial hosting service
+      that provides web-based GUIs for creating and managing repositories.
+    </li>
  
-      <p>
-        If you choose the second or third option,
-        please check with whoever handles intellectual property at your institution
-        to make sure that putting your work on a commercially-operated machine
-        that is probably in some other legal jurisdiction
-        isn't going to cause trouble.
-        Many people assume that it's "just OK",
-        while others act as if not having asked will be an acceptable defence later on.
-        Unfortunately,
-        neither is true&hellip;
-      </p>
+  </ul>
  
-      <div class="keypoints" id="k:setup">
-        <h3>Summary</h3>
-        <ul>
-          <li>Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.</li>
-          <li><code>svnadmin create <em>name</em></code> creates a new repository.</li>
-        </ul>
-      </div>
+  <p>
+    If you choose the second or third option,
+    please check with whoever handles intellectual property at your institution
+    to make sure that putting your work on a commercially-operated machine
+    that is probably in some other legal jurisdiction
+    isn't going to cause trouble.
+    Many people assume that it's "just OK",
+    while others act as if not having asked will be an acceptable defence later on.
+    Unfortunately,
+    neither is true&hellip;
+  </p>
  
-    </section>
+  <div class="keypoints">
+    <h3>Summary</h3>
+    <ul>
+      <li><code>svnadmin create <em>name</em></code> creates a new repository.</li>
+      <li>Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.</li>
+    </ul>
+  </div>
+
+  <div class="challenges">
+    <h3>Challenges</h3>
+
+    <ol>
  
-    <section id="s:provenance">
+      <li>
+        Create a Subversion repository called <code>trials_repo</code>
+        in your home directory.
+        Check out a working copy in a directory called <code>trials_working</code>
+        (also in your home directory).
+        Add a couple of text files,
+        commit the changes,
+        and then use <code>svn info trials_working</code>
+        to see what Subversion tells you about your working copy.
+      </li>
  
-      <h2>Provenance</h2>
+      <li>
+        We said <a href="#a:only_user">above</a> that
+        you might be the only person using a particular repository.
+        When and why is version control worth using
+        if no-one else is working on a project with you?
+      </li>
  
-      <div class="understand" id="u:provenance">
-        <h3>Understand:</h3>
+      <li>
+        There are many ways to organize repositories.
+        Some of the most common are to create one repository for:
          <ul>
-          <li>What data provenance is.</li>
-          <li>How to embed version numbers and other information in files managed by version control.</li>
-          <li>How to record version information about a program in its output.</li>
+          <li>each person</li>
+          <li>each paper</li>
+          <li>all the work done on one grant</li>
+          <li>all the work done on one project</li>
+          <li>the entire lab (which is shared by everyone in the lab)</li>
+          <li>the entire department (typically with a top-level directory for each person or project in the department)</li>
          </ul>
-      </div>
+        What activities does each one make easy or hard?
+        Which of these would you prefer, and why?
+      </li>
  
-      <p>
-        In art,
-        the <a href="glossary.html#provenance">provenance</a> of a work
-        is the history of who owned it, when, and where.
-        In science,
-        it's the record of how a particular result came to be:
-        what raw data was processed by what version of what program to create which intermediate files,
-        what was used to turn those files into which figures of which papers,
-        and so on.
-      </p>
+    </ol>
+  </div>
  
-      <p>
-        One of the central ideas of this course is that
-        wen can automatically track the provenance of scientific data.
-        To start,
-        suppose we have a text file <code>combustion.dat</code> in a Subversion repository.
-        Run the following two commands:
-      </p>
+</section>
+
+<section id="s:provenance">
+  <h2>Provenance</h2>
+
+  <div class="understand">
+    <h3>Learning Objectives</h3>
+    <ul>
+      <li>What data provenance is.</li>
+      <li>How to embed version numbers and other information in files managed by version control.</li>
+      <li>How to record version information about a program in its output.</li>
+    </ul>
+  </div>
+
+  <p>
+    In art,
+    the <a href="glossary.html#provenance">provenance</a> of a work
+    is the history of who owned it, when, and where.
+    In science,
+    it's the record of how a particular result came to be:
+    what raw data was processed by what version of what program to create which intermediate files,
+    what was used to turn those files into which figures of which papers,
+    and so on.
+  </p>
+
+  <p>
+    One of the big benefits of using version control is that
+    it lets us track the provenance of scientific data automatically.
+    To start,
+    suppose we have a text file <code>combustion.dat</code> in a Subversion repository.
+    Run the following two commands:
+  </p>
  
  <pre>
  $ svn propset svn:keywords Revision combustion.dat
  $ svn commit -m "Turning on the 'Revision' keyword" combustion.dat
  </pre>
  
-      <p>
-        Now open the file in an editor
-        and add the following line somewhere near the top:
-      </p>
+  <p class="continue">
+    This does nothing by itself,
+    but now open the file in an editor
+    and add the following line somewhere near the top:
+  </p>
  
  <pre>
-# $Revision:$
+$Revision:$
  </pre>
  
-      <p>
-        The '#' sign isn't important:
-        it's just what <code>.dat</code> files use to show comments.
-        The <code>$Revision:$</code> string,
-        on the other hand,
-        means something special to Subversion.
-        Save the file, and commit the change:
-      </p>
+  <p>
+    The <code>$Revision:$</code> string means something special to Subversion.
+    Save the file, and commit the change:
+  </p>
  
  <pre>
  $ svn commit -m "Inserting the 'Revision' keyword" combustion.dat
  </pre>
  
-      <p>
-        When we open the file again,
-        we'll see that Subversion has changed that line to something like:
-      </p>
+  <p>
+    When we open the file again,
+    we'll see that Subversion has changed that line to something like:
+  </p>
  
  <pre>
-# $Revision: 143$
+$Revision: 143$
  </pre>
  
-      <p class="continue">
-        i.e., Subversion has inserted the version number
-        after the colon and before the closing <code>$</code>.
-      </p>
+  <p class="continue">
+    i.e., it has inserted the version number
+    after the colon and before the closing <code>$</code>.
+    If we edit the file again&mdash;e.g., add a couple of lines with random numbers&mdash;and
+    commit once more,
+    the line is updated again to:
+  </p>
  
-      <p>
-        Here's what just happened.
-        First, Subversion allows you to set
-        <a href="glossary.html#property-subversion">properties</a>
-        for files and and directories.
-        These properties aren't in the files or directories themselves,
-        but live in Subversion's database.
-        One of those properties,
-        <code>svn:keywords</code>,
-        tells Subversion to look in files that are being changed
-        for strings of the form <code>$propertyname: &hellip;$</code>,
-        where <code>propertyname</code> is a string like <code>Revision</code> or <code>Author</code>.
-        (About half a dozen such strings are supported.)
-      </p>
+<pre>
+$Revision: 144$
+</pre>
  
-      <p>
-        If it sees such a string,
-        Subversion rewrites it as the commit is taking place to replace <code>&hellip;</code>
-        with the current version number,
-        the name of the person making the change,
-        or whatever else the property's name tells it to do.
-        You only have to add the string to the file once;
-        after that,
-        Subversion updates it for you every time the file changes.
-      </p>
+  <p>
+    Here's what just happened.
+    First, Subversion allows us to add
+    <a href="glossary.html#property-subversion">properties</a>
+    to files and directories.
+    These properties aren't stored in the files or directories themselves,
+    but in Subversion's database.
+    One of those properties,
+    <code>svn:keywords</code>,
+    tells Subversion to look in files that are being changed
+    for strings of the form <code>$propertyname: &hellip;$</code>,
+    where <code>propertyname</code> is a string like <code>Revision</code> or <code>Author</code>.
+    (About half a dozen such strings are supported.)
+  </p>
  
-      <p>
-        Putting the version number in the file this way can be pretty handy.
-        If you copy the file to another machine,
-        for example,
-        it carries its version number with it,
-        so you can tell which version you have even if it's outside version control.
-        We'll see some more useful things we can do with this information in
-        <a href="python.html">the next chapter</a>.
-      </p>
+  <p>
+    If it sees such a string,
+    Subversion rewrites it as the commit is taking place to replace <code>&hellip;</code>
+    with the current version number,
+    the name of the person making the change,
+    or whatever else the property's name tells it to do.
+    We only have to add the string to the file once;
+    after that,
+    Subversion updates it for us every time the file changes.
+  </p>
  
-      <div class="box">
-
-        <h3>When <em>Not</em> to Use Version Control</h3>
-
-        <p>
-          Despite the rapidly decreasing cost of storage,
-          it is still possible to run out of disk space.
-          In some labs,
-          people can easy go through 2 TB/month if they're not careful.
-          Since version control tools usually store revisions in terms of lines,
-          with binary data files,
-          they end up essentially storing every revision separately.
-          This isn't that bad
-          (it's what we'd be doing anyway),
-          but it means version control isn't doing what it likes to do,
-          and the repository can get very large very quickly.
-          Another concern is that if very old data will no longer be used,
-          it can be nice to archive or delete old data files.
-          This is not possible if our data is version controlled:
-          information can only be added to a repository,
-          so it can only ever increase in size.
-        </p>
-
-      </div>
+  <p>
+    Putting the version number in the file this way can be pretty handy.
+    If you copy the file to another machine,
+    for example,
+    it carries its version number with it,
+    so you can tell which version you have even if it's outside version control.
+    We'll see some more useful things we can do with this information <a href="python.html">later</a>.
+  </p>
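+  <p>
+    For instance,
+    a copy of the file that has left version control can still be asked which revision it came from.
+    The sketch below assumes the keyword line looks like the one shown earlier;
+    the scratch file and the variable names are stand-ins invented for the example:
+  </p>

```shell
# Scratch file standing in for a copy of combustion.dat outside version control.
datafile=$(mktemp)
printf '%s\n' '$Revision: 143$' '1.5 2.5' > "$datafile"

# Print only the digits between "$Revision:" and the closing "$".
revision=$(sed -n 's/.*\$Revision: *\([0-9][0-9]*\) *\$.*/\1/p' "$datafile")
echo "revision: $revision"

rm -f "$datafile"
```

+  <p class="continue">
+    The pattern tolerates optional spaces around the number,
+    so it does not depend on the exact spacing of the expanded keyword.
+  </p>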
  
-      <p>
-        We can use this trick with shell scripts too,
-        or with almost any other kind of program.
-        Going back to Nelle Nemo's data processing from the previous chapter,
-        for example,
-        suppose she writes a shell script that uses <code>gooclean</code>
-        to tidy up data files.
-        Her first version looks like this:
-      </p>
+  <p>
+    We can use this trick with shell scripts too,
+    or with almost any other kind of program.
+    Let's go back to Nelle Nemo's data processing from
+    the lesson on the <a href="shell.html">shell</a>.
+    Suppose she writes a shell script called <code>gooclean</code>
+    to tidy up data files.
+    Her first version looks like this:
+  </p>
  
  <pre>
-for filename in $*
-do
-    gooclean -b 0 100 &lt; $filename &gt; cleaned-$filename
-done
+# gooclean: clean up a single data file
+goonorm -b 0 100 &lt; $1 | goofilter -x --enlarge 2.0 &gt; cleaned-$1
  </pre>
  
-      <p class="continue">
-        i.e., it runs <code>gooclean</code> with bounding values of 0 and 100
-        for each specified file,
-        putting the result in a temporary file with a well-defined name.
-        Assuming that '#' is the comment character for those kinds of data files,
-        she could instead write:
-      </p>
+  <p class="continue">
+    i.e.,
+    it runs <code>goonorm</code> and then <code>goofilter</code> with some fixed parameters
+    and creates an output file called <code>cleaned-something.dat</code>
+    (if the input file's name was <code>something.dat</code>).
+    Assuming that '#' is the comment character for her output files,
+    she could instead write:
+  </p>
  
  <pre>
-for filename in $*
-do
-    <span class="highlight">echo "gooclean $Revision: 901$ -b 0 100" &gt; $filename</span>
-    gooclean -b 0 100 &lt; $filename <span class="highlight">&gt;&gt;</span> cleaned-$filename
-done
+# gooclean: clean up a single data file
+<span class="highlight">echo '# gooclean $Revision:$' &gt; cleaned-$1</span>
+goonorm -b 0 100 &lt; $1 | goofilter -x --enlarge 2.0 <span class="highlight">&gt;&gt;</span> cleaned-$1
  </pre>
  
-      <p>
-        The first change puts a line in the output file
-        that describes how that file was created.
-        The second change is to use <code>&gt;&gt;</code> instead of <code>&gt;</code>
-        to redirect <code>gooclean</code>'s output to the file.
-        <code>&gt;&gt;</code> means "append to":
-        instead of overwriting whatever is in the file,
-        it adds more content to it.
-        This ensures that the first line of the file is the provenance record,
-        with the actual output of <code>gooclean</code> after it.
-      </p>
+  <p class="continue">
+    then set the <code>svn:keywords</code> property
+    and commit the file to insert the revision number,
+    making it:
+  </p>
  
-      <div class="keypoints" id="k:provenance">
-        <h3>Summary</h3>
-        <ul>
-          <li><code>$Keyword:$</code> in a file can be filled in with a property value each time the file is committed.</li>
-          <li idea="paranoia">Put version numbers in programs' output to establish provenance for data.</li>
-          <li><code>svn propset svn:keywords <em>property</em> <em>files</em></code> tells Subversion to start filling in property values.</li>
-        </ul>
-      </div>
+<pre>
+# gooclean: clean up a single data file
+<span class="highlight">echo '# gooclean $Revision: 487$' &gt; cleaned-$1</span>
+goonorm -b 0 100 &lt; $1 | goofilter -x --enlarge 2.0 <span class="highlight">&gt;&gt;</span> cleaned-$1
+</pre>
+
+  <p>
+    Now,
+    each time this script is run it will:
+  </p>
+
+  <ul>
+    <li>
+      put the line
+<pre>
+# gooclean $Revision: 487$
+</pre>
+      in the output file,
+      then
+    </li>
+    <li>
+      append whatever the pipeline containing <code>goonorm</code> and <code>goofilter</code>
+      would have put in the file originally.
+      (The double redirection <code>&gt;&gt;</code> means "append to" rather than "overwrite".)
+    </li>
+  </ul>
+
+  <p class="continue">
+    In other words,
+    the output of this shell script will always record
+    exactly what version of the script produced it.
+    This isn't enough to reproduce the output&mdash;we would need to record
+    the version numbers of the input files and the <code>goonorm</code> and <code>goofilter</code> programs,
+    and the values of the parameters those programs used
+    in order to do that&mdash;but it's an important and useful first step.
+  </p>
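+
+  <p>
+    The header-then-append pattern can be sketched as a pair of small shell functions.
+    This is an illustration rather than the real script:
+    the names <code>clean_with_header</code> and <code>revision_of</code> are hypothetical,
+    <code>tr</code> stands in for the <code>goonorm</code> and <code>goofilter</code> pipeline,
+    and the header assumes Subversion's expanded keyword form,
+    which has a space before the closing dollar sign.
+  </p>

```shell
#!/bin/sh
# Sketch only: 'tr' is a stand-in for the goonorm | goofilter pipeline.

# Write a provenance header, then append the cleaned data after it.
clean_with_header() {
    input=$1
    output="cleaned-$input"
    # Single quotes stop the shell from treating $Revision as a variable.
    echo '# gooclean $Revision: 487 $' > "$output"
    # >> appends, so the header written above is preserved.
    tr 'a-z' 'A-Z' < "$input" >> "$output"
}

# Print the revision number recorded on the first line of a cleaned file.
# Prints nothing if the first line is not a provenance header.
revision_of() {
    sed -n '1s/^# gooclean \$Revision: \([0-9][0-9]*\) *\$$/\1/p' "$1"
}
```

+  <p class="continue">
+    Running <code>clean_with_header sample.dat</code>
+    and then <code>revision_of cleaned-sample.dat</code>
+    should report the revision of the script that produced the file.
+  </p>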
+
+  <div class="keypoints">
+    <h3>Summary</h3>
+    <ul>
+      <li><code>$Keyword: &hellip;$</code> in a file can be filled in with a property value each time the file is committed.</li>
+      <li>Put version numbers in programs' output to establish provenance for data.</li>
+      <li><code>svn propset svn:keywords <em>property</em> <em>files</em></code> tells Subversion to start filling in property values.</li>
+    </ul>
+  </div>
+
+  <div class="challenges">
+    <h3>Challenges</h3>
+
+    <ol>
+
+      <li>
+        Add <code>$Id:$</code> to a file,
+        use <code>svn propset</code> to set the corresponding property,
+        and then commit a change to the file.
+        What value does Subversion fill in for this keyword?
+        When would you use this rather than <code>Revision</code> or <code>Author</code>?
+      </li>
  
-    </section>
+      <li>
+        What does the <code>svn:ignore</code> property do when applied to a directory?
+        When would you use it?
+      </li>
+
+    </ol>
+
+  </div>
+
+</section>
  
  <section id="s:summary">
    <h2>Summing Up</h2>
  
    <p>
-    Correlation does not imply causality,
-    but there is a very strong correlation between
-    using version control
-    and doing good computational science.
-    There's an equally strong correlation
-    between <em>not</em> using it and either wasting effort or getting things wrong.
-    Today (the middle of 2013),
-    I will not review a paper if the software used in it
-    is not under version control.
-    The work it reports might be interesting,
-    but without the kind of record-keeping that version control provides,
-    there's no way to know exactly what its authors did.
-    Just as importantly,
-    if someone doesn't know enough about computing to use version control,
-    the odds are good that they don't know enough
-    to do the programming right either.
+    In 2006,
+    <a href="bib.html#mccullough-reproducibility">McCullough, McGeary, and Harrison</a>
+    analyzed several years of
+    the data and code archive of <cite>Journal of Money, Credit, and Banking</cite>,
+    a prestigious journal with a mandatory archiving policy.
+    Of 266 articles published during that time,
+    193 were empirical and should have had data and code deposited in the archive.
+    Of those,
+    only 69 actually had anything in the archive.
+    Excluding eleven articles that only had data,
+    and seven that required software or other resources they did not have,
+    McCullough et al. were only able to replicate 14 of the remaining 186 articles.
+    This doesn't mean that the other 92% were wrong,
+    but it does mean there is no practical way to tell.
+  </p>
+
+  <p>
+    By itself,
+    version control doesn't make computational research reproducible.
+    It <em>does</em> help,
+    though,
+    and also eliminates the frustration and wasted time caused by
+    trying to figure out which emailed copy of a file,
+    or which of a dozen directories or USB drives,
+    is the most recent.
+    And while correlation doesn't imply causality,
+    there is certainly a strong correlation between
+    knowing enough about good computational practices to use version control
+    and knowing how to do other things right as well.
    </p>
  
  </section>