</ol>
-<div class="box">
- <h3>Nothing's Perfekt</h3>
-
- <p>
- Version control systems do have one important shortcoming.
- While it is easy for them to find, display, and merge differences in text files,
- images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text—they
- use specialized binary data formats.
- Most version control systems don't know how to deal with these formats,
- so all they can say is, "These files differ."
- Reconciling those differences will probably require use of an auxiliary tool,
- such as an audio editor
- or Microsoft Word's "Compare and Merge" utility.
- </p>
-</div>
-
<p>
The rest of this chapter will explore how to use
a popular open source version control system called Subversion.
+ It does not have all the features of some newer systems,
+ such as <a href="git.html">Git</a>,
+ but it is still widely used,
+ and is simpler to pick up than those more advanced alternatives.
+ No matter which system you use,
+ the most important thing to learn is not the details of its more obscure commands,
+ but the workflow that it encourages.
</p>
<div class="guide">
<h2>For Instructors</h2>
- <p class="fixme">explain</p>
+ <p>
+ Version control is the most important practical skill we introduce.
+ As the last paragraph of the introduction above says,
+ the workflow matters more than the ins and outs of any particular tool.
+ By the end of 90 minutes,
+ the instructor should be able to get learners to chant,
+ "Update, edit, merge, commit," in unison,
+ and have them understand what those terms mean
+ and why that's a good way to structure their working day.
+ </p>
+
+ <p>
+ Provided there aren't network problems,
+ this entire lesson can be covered in <span class="duration">90 minutes</span>.
+ The example at the end
+ showing how to use Subversion keywords to track provenance
+ is the "ah ha!" moment for many learners.
+ If time is short,
+ skip the material on recovering old versions of files
+ in order to get to this section instead.
+ (The fact that provenance is harder in Git,
+ both mechanically and conceptually,
+ is one reason to keep teaching Subversion.)
+ </p>
<div class="prereq">
<h3>Prerequisites</h3>
- <p class="fixme">prereq</p>
+ <p>
+ Basic shell concepts and skills
+ (<code>ls</code>, <code>cd</code>, <code>mkdir</code>,
+ editing files);
+ basic shell scripting
+ (for the discussion of <a href="#s:provenance">provenance</a>).
+ </p>
</div>
<div class="notes">
<h3>Teaching Notes</h3>
<ul>
+ <li>
+ Make sure the network is working <em>before</em> starting this lesson.
+ </li>
+ <li>
+ Give learners a ten-minute overview of what version control does for them
+ before diving into the watch-and-do practicals.
+ Most of them will have tried to co-author papers by emailing files back and forth,
+ or will have biked into the office
+ only to realize that the USB key with last night's work
+ is still on the kitchen table.
+ Instructors can also make jokes about directories with names like
+ "final version",
+ "final version revised",
+ "final version with reviewer three's corrections",
+ "really final version",
+ and,
+ "come on this really has to be the last version"
+ to motivate version control as a better way to collaborate
+ and as a better way to back work up.
+ </li>
+ <li>
+ Version control is typically taught after the shell,
+ so collect learners' names during that session
+ and create a repository for them to share
+ with their names as both their IDs and their passwords.
+ The easiest way to create the repository is to use
+ a server managed by an ISP such as Dreamhost,
+ or on SourceForge, Google Code, or some other "forge" site,
+ all of which provide web interfaces for repository creation and management.
+ If your learners are advanced enough to be using SSH,
+ you can instead create it on any server they can access,
+ and connect with the <code>svn+ssh</code> protocol instead of HTTPS.
+ </li>
+ <li>
+ Be very clear what files learners are to edit
+ and what user IDs they are to use
+ when giving instructions.
+ It is common for them to edit the instructor's biography,
+ or to use the instructor's user ID and password when committing.
+ Be equally clear <em>when</em> they are to edit things:
+ it's also common for someone to edit the file the instructor is editing
+ and commit changes while the instructor is explaining what's going on,
+ so that a conflict occurs when the instructor comes to commit the file.
+ </li>
+ <li>
+ Learners could do most exercises with repositories on their own machines,
+ but it's hard for them to see how version control helps collaboration
+ unless they're sharing a repository with other learners.
+ In particular,
+ showing learners who changed what using <code>svn blame</code>
+ is only compelling if a file has been edited by at least two people.
+ </li>
+ <li>
+ If some learners are using Windows,
+ there will inevitably be issues merging files with different line endings.
+ <code>svn diff -x -w</code> is supposed to suppress differences in whitespace,
+ but we have found that it doesn't always work as advertised.
+ </li>
</ul>
</div>
<h2>Basic Use</h2>
<div class="understand">
- <h3>Learning Objectives:</h3>
+ <h3>Learning Objectives</h3>
<ul>
- <li>Where version control stores information.</li>
- <li>How to check out a working copy of a repository.</li>
- <li>How to view the history of changes to a project.</li>
- <li>Why working copies of different projects should not overlap.</li>
- <li>How to add files to a project.</li>
- <li>How to submit changes made locally to a project's master copy.</li>
- <li>How to update a working copy to get changes made to the master.</li>
- <li>How to check the status of a working copy.</li>
+ <li>Draw a diagram showing the places version control stores information.</li>
+ <li>Check out a working copy of a repository.</li>
+ <li>View the history of changes to a project.</li>
+ <li>Explain why working copies of different projects should not overlap.</li>
+ <li>Add files to a project.</li>
+ <li>Commit changes made to a working copy to a repository.</li>
+ <li>Update a working copy to get changes from the repository.</li>
+ <li>Compare the current state of a working copy to the last update from the repository, and to the current state of the repository.</li>
+ <li>Explain what "version 123 of <code>xyz.txt</code>" actually means.</li>
</ul>
</div>
let's assume that the Mummy
(Dracula and Wolfman's boss)
has already put some notes in a version control repository
- whose URL is <code>https://universal.software-carpentry.org/monsters</code>.
+ whose URL is <code>https://universal.software-carpentry.org/explore</code>.
Every repository has an address like this that uniquely identifies the location of the master copy.
</p>
</p>
<pre>
-$ <span class="in">svn checkout https://universal.software-carpentry.org/monsters</span>
+$ <span class="in">svn checkout https://universal.software-carpentry.org/explore</span>
</pre>
<p class="continue">
- This creates a new directory called <code>monsters</code>
+ This creates a new directory called <code>explore</code>
and fills it with a copy of the repository's contents
(<a href="#f:example_repo">Figure 6</a>).
</p>
<pre>
-<span class="out">A monsters/jupiter
-A monsters/mars
-A monsters/mars/mons-olympus.txt
-A monsters/mars/cydonia.txt
-A monsters/earth
-A monsters/earth/himalayas.txt
-A monsters/earth/antarctica.txt
-A monsters/earth/carlsbad.txt
+<span class="out">A explore/jupiter
+A explore/mars
+A explore/mars/mons-olympus.txt
+A explore/mars/cydonia.txt
+A explore/earth
+A explore/earth/himalayas.txt
+A explore/earth/antarctica.txt
+A explore/earth/carlsbad.txt
Checked out revision 6.</span>
</pre>
</p>
<pre>
-$ <span class="in">cd monsters</span>
+$ <span class="in">cd explore</span>
$ <span class="in">ls</span>
<span class="out">earth jupiter mars</span>
$ <span class="in">ls *</span>
<pre>
$ <span class="in">pwd</span>
-<span class="out">/home/vlad/monsters</span>
+<span class="out">/home/dracula/explore</span>
$ <span class="in">ls -a</span>
<span class="out">. .. .svn earth jupiter mars</span>
$ <span class="in">ls -F .svn</span>
the date the change was made,
and whatever comment the user provided when the change was submitted.
As we can see,
- the <code>monsters</code> project is currently at revision 6,
+ the <code>explore</code> project is currently at revision 6,
and all changes so far have been made by the Mummy.
</p>
<figcaption>Figure 8: Updated Repository</figcaption>
</figure>
+ <div class="box">
+ <h3>When <em>Not</em> to Use Version Control</h3>
+
+ <p>
+ Despite the rapidly decreasing cost of storage,
+ it is still possible to run out of disk space.
+ In some labs,
+ people can easily go through 2 TB/month if they're not careful.
+ Version control tools usually store revisions as line-by-line differences,
+ so with binary data files
+ they end up essentially storing every revision separately.
+ This isn't that bad
+ (it's what we'd be doing anyway),
+ but it means version control isn't doing what it likes to do,
+ and the repository can get very large very quickly.
+ Another concern is that if very old data will no longer be used,
+ it can be nice to archive or delete old data files.
+ This is not possible if our data is version controlled:
+ information can only be added to a repository,
+ so it can only ever increase in size.
+ </p>
+
+ </div>
+
<p id="a:define-head">
Back in his cubicle,
Wolfman uses <code>svn update</code> to update his working copy.
</div>
+ <div class="box">
+ <h3>Nothing's Perfekt</h3>
+
+ <p>
+ Version control systems do have one important shortcoming.
+ While it is easy for them to find, display, and merge differences in text files,
+ images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text—they
+ use specialized binary data formats.
+ Most version control systems don't know how to deal with these formats,
+ so all they can say is, "These files differ."
+ Reconciling those differences will probably require use of an auxiliary tool,
+ such as an audio editor
+ or Microsoft Word's "Compare and Merge" utility.
+ </p>
+ </div>
+
+ <div class="box">
+ <h3>Diffing Other Files</h3>
+
+ <p>
+ <code>svn diff</code> mimics the behavior of
+ the Unix <code>diff</code> command,
+ which can be used to compare any two files.
+ Given these two files:
+ </p>
+
+ <table>
+ <tr>
+ <th><code>left.txt</code></th>
+ <th><code>right.txt</code></th>
+ </tr>
+ <tr>
+ <td valign="top">
+<pre>hydrogen
+lithium
+sodium
+magnesium
+rubidium</pre>
+ </td>
+ <td valign="top">
+<pre>hydrogen
+lithium
+beryllium
+sodium
+potassium
+strontium</pre>
+ </td>
+ </tr>
+ </table>
+
+ <p class="continue">
+ <code>diff</code>'s output is:
+ </p>
+<pre>
+$ <span class="in">diff left.txt right.txt</span>
+<span class="out">2a3
+> beryllium
+4,5c5,6
+< magnesium
+< rubidium
+---
+> potassium
+> strontium</span>
+</pre>
+ </div>
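+ <p>
+ To try this by hand, the sketch below recreates the two files in a scratch directory
+ and runs <code>diff</code> on them.
+ The scratch directory and the name <code>changes.txt</code> are assumptions made for illustration.
+ In the output, <code>2a3</code> means "after line 2 of the left file, add line 3 of the right file",
+ and <code>4,5c5,6</code> means "change lines 4-5 of the left file into lines 5-6 of the right file".
+ </p>

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Recreate the two files shown in the table above, one element per line.
printf '%s\n' hydrogen lithium sodium magnesium rubidium > left.txt
printf '%s\n' hydrogen lithium beryllium sodium potassium strontium > right.txt

# diff exits with status 0 if the files are identical and 1 if they differ,
# so a status of 1 here is expected rather than an error.
diff left.txt right.txt > changes.txt
status=$?

cat changes.txt
echo "diff exit status: $status"
```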
+
<p>
This is a very common workflow,
and is the basic heartbeat of most developers' days.
and if necessary undo later on.
</p>
+ <div class="box">
+ <h3>Who Did What?</h3>
+
+ <p>
+ One other very useful command is <code>svn blame</code>,
+ which shows when each line in the file was last changed
+ and by whom:
+ </p>
+
+<pre>
+$ <span class="in">svn blame moons.txt</span>
+<span class="out"> 14 dracula Name Orbital Radius Orbital Period Mass Radius
+ 14 dracula (10**3 km) (days) (10**20 kg) (km)
+ 14 dracula Amalthea 181.4 0.498179 0.075 131 x 73 x 67
+ 9 mummy Io 421.6 1.769138 893.2 1821.6
+ 9 mummy Europa 670.9 3.551181 480.0 1560.8
+ 9 mummy Ganymede 1070.4 7.154553 1481.9 2631.2
+ 14 dracula Callisto 1882.7 16.689018 1075.9 2410.3
+ 14 dracula Himalia 11460 250.5662 0.095 85.0
+ 14 dracula Elara 11740 259.6528 0.008 40.0</span>
+</pre>
+
+ <p>
+ If you are ever wondering who to talk to about a change,
+ or why it was made,
+ <code>svn blame</code> is a good place to start.
+ </p>
+ </div>
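+ <p>
+ As a rough sketch of how this output can be summarized,
+ the script below counts how many lines each person was the last to change.
+ It assumes the output above has been saved in a hypothetical file called <code>blame.txt</code>
+ (for example with <code>svn blame moons.txt &gt; blame.txt</code>);
+ to keep the example self-contained,
+ an abbreviated copy of that output is written to the file directly.
+ </p>

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Abbreviated copy of the blame output shown above:
# revision number, author, then the start of each line of moons.txt.
cat > blame.txt <<'EOF'
    14    dracula Name
    14    dracula (10**3 km)
    14    dracula Amalthea
     9      mummy Io
     9      mummy Europa
     9      mummy Ganymede
    14    dracula Callisto
    14    dracula Himalia
    14    dracula Elara
EOF

# The author is the second whitespace-separated field on every line.
awk '{ lines[$2]++ } END { for (who in lines) print who, lines[who] }' blame.txt \
    | sort > summary.txt

cat summary.txt
```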
+
<div class="keypoints">
<h3>Summary</h3>
<ul>
<div class="challenges">
<h3>Challenges</h3>
+
<ol>
<li>
+ Using the repository URL, user ID, and password provided by the instructor,
+ perform the following actions:
+ <ol>
+ <li>
+ Check out a working copy of the repository.
+ </li>
+ <li>
+ Create a text file called <em>your_id</em>.txt
+ (using your user ID instead of <em>your_id</em>)
+ and write a three-line biography of yourself in it.
+ </li>
+ <li>
+ Add this file to your working copy.
+ </li>
+ <li>
+ Commit your changes to the repository.
+ </li>
+ <li>
+ Update your working copy to get other people's biographies.
+ </li>
+ <li>
+ Examine the change log to see
+ the order in which people added their biographies
+ to the repository.
+ </li>
+ </ol>
</li>
<li>
+ What does the command <code>svn diff -r 14</code> do?
+ What does it do if there have only been 10 changes to the repository?
</li>
<li>
+ By default,
+ Unix <code>diff</code> and <code>svn diff</code> compare files line by line.
+ Why doesn't this work for MP3 audio files?
</li>
</ol>
</section>
- <section id="s:merge">
+<section id="s:merge">
+ <h2>Merging Conflicts</h2>
- <h2>Merging Conflicts</h2>
-
- <div class="understand" id="u:merge">
- <h3>Understand:</h3>
- <ul>
- <li>What a conflict in an update is.</li>
- <li>How to resolve conflicts when updating.</li>
- </ul>
- </div>
+ <div class="understand">
+ <h3>Learning Objectives</h3>
+ <ul>
+ <li>Explain what causes conflicts to occur and how to tell when one has occurred.</li>
+ <li>Resolve a conflict.</li>
+ <li>Identify the auxiliary files created when a conflict occurs.</li>
+ </ul>
+ </div>
- <p>
- Dracula and Wolfman have both synchronized their working copies of <code>monsters</code>
- with version 8 of the repository.
- Dracula now edits his copy to change Amalthea's radius
- from a single number to a triple to reflect its irregular shape:
- </p>
+ <p>
+ Dracula and Wolfman have both synchronized their working copies of <code>explore</code>
+ with version 8 of the repository.
+ Dracula now edits his copy to change Amalthea's radius
+ from a single number to a triple to reflect its irregular shape:
+ </p>
<pre src="svn/moons_dracula_triple.txt">
Name Orbital Radius Orbital Period Mass Radius
Callisto 1882.7 16.689018 1075.9 2410.3
</pre>
- <p class="continue">
- He then commits his work,
- creating revision 9 of the repository
- (<a href="#f:after_dracula_commits">Figure XXX</a>).
- </p>
+ <p class="continue">
+ He then commits his work,
+ creating revision 9 of the repository
+ (<a href="#f:after_dracula_commits">Figure 9</a>).
+ </p>
- <figure id="f:after_dracula_commits">
- <img src="svn/after_dracula_commits.png" alt="After Dracula Commits" />
- </figure>
+ <figure id="f:after_dracula_commits">
+ <img src="svn/after_dracula_commits.png" alt="After Dracula Commits" />
+ <figcaption>Figure 9: After Dracula Commits</figcaption>
+ </figure>
- <p>
- But while he is doing this,
- Wolfman is editing <em>his</em> copy
- to add information about two other minor moons,
- Himalia and Elara:
- </p>
+ <p>
+ But while he is doing this,
+ Wolfman is editing <em>his</em> copy
+ to add information about two other minor moons,
+ Himalia and Elara:
+ </p>
<pre src="svn/moons_wolfman_extras.txt">
Name Orbital Radius Orbital Period Mass Radius
Elara 11740 259.6528 0.008 40.0</span>
</pre>
- <p>
- When Wolfman tries to commit his changes to the repository,
- Subversion won't let him:
- </p>
+ <p>
+ When Wolfman tries to commit his changes to the repository,
+ Subversion won't let him:
+ </p>
<pre>
$ <span class="in">svn commit -m "Added data for Himalia, Elara"</span>
svn: resource out of date; try updating</span>
</pre>
- <p class="continue">
- The reason is that
- Wolfman's changes were based on revision 8,
- but the repository is now at revision 9,
- and the file that Wolfman is trying to overwrite
- is different in the later revision.
- (Remember,
- one of version control's main jobs is to make sure that
- people don't trample on each other's work.)
- Wolfman has to update his working copy to get Dracula's changes before he can commit.
- Luckily,
- Dracula edited a line that Wolfman didn't change,
- so Subversion can merge the differences automatically.
- </p>
+ <p class="continue">
+ The reason is that
+ Wolfman's changes were based on revision 8,
+ but the repository is now at revision 9,
+ and the file that Wolfman is trying to overwrite
+ is different in the later revision.
+ (Remember,
+ one of version control's main jobs is to make sure that
+ people don't trample on each other's work.)
+ Wolfman has to update his working copy to get Dracula's changes before he can commit.
+ Luckily,
+ Dracula edited a line that Wolfman didn't change,
+ so Subversion can merge the differences automatically.
+ </p>
- <p>
- This does <em>not</em> mean that Wolfman's changes have been committed to the repository:
- Subversion only does that when it's ordered to.
- Wolfman's changes are still in his working copy,
- and <em>only</em> in his working copy.
- But since Wolfman's version of the file now includes
- the lines that Dracula added,
- Wolfman can go ahead and commit them as usual to create revision 10.
- </p>
+ <p>
+ This does <em>not</em> mean that Wolfman's changes have been committed to the repository:
+ Subversion only does that when it's ordered to.
+ Wolfman's changes are still in his working copy,
+ and <em>only</em> in his working copy.
+ But since Wolfman's version of the file now includes
+ the lines that Dracula added,
+ Wolfman can go ahead and commit them as usual to create revision 10
+ (<a href="#f:merge_without_conflict">Figure 10</a>).
+ </p>
- <p>
- Wolfman's working copy is now in sync with the master,
- but Dracula's is one behind at revision 9.
- At this point,
- they independently decide to add measurement units
- to the columns in <code>moons.txt</code>.
- Wolfman is quicker off the mark this time;
- he adds a line to the file:
- </p>
+ <figure id="f:merge_without_conflict">
+ <img src="svn/merge_without_conflict.png" alt="Merging Without Conflict" />
+ <figcaption>Figure 10: Merging Without Conflict</figcaption>
+ </figure>
+
+ <p>
+ Wolfman's working copy is now in sync with the master,
+ but Dracula's is one behind at revision 9.
+ At this point,
+ they independently decide to add measurement units
+ to the columns in <code>moons.txt</code>.
+ Wolfman is quicker off the mark this time;
+ he adds a line to the file:
+ </p>
<pre src="svn/moons_wolfman_units.txt">
Name Orbital Radius Orbital Period Mass Radius
Elara 11740 259.6528 0.008 40.0
</pre>
- <p class="continue">
- and commits it to create revision 11.
- While he is doing this,
- though,
- Dracula inserts a different line at the top of the file:
- </p>
+ <p class="continue">
+ and commits it to create revision 11.
+ While he is doing this,
+ though,
+ Dracula inserts a different line at the top of the file:
+ </p>
<pre src="svn/moons_dracula_units.txt">
Name Orbital Radius Orbital Period Mass Radius
Elara 11740 259.6528 0.008 40.0
</pre>
- <p>
- Once again,
- when Dracula tries to commit,
- Subversion tells him he can't.
- But this time,
- when Dracula does updates his working copy,
- he doesn't just get the line Wolfman added to create revision 11.
- There is an actual conflict in the file,
- so Subversion asks Dracula what he wants to do:
- </p>
+ <p>
+ Once again,
+ when Dracula tries to commit,
+ Subversion tells him he can't.
+ But this time,
+ when Dracula updates his working copy,
+ he doesn't just get the line Wolfman added to create revision 11
+ (<a href="#f:merge_with_conflict">Figure 11</a>).
+ </p>
+
+ <figure id="f:merge_with_conflict">
+ <img src="svn/merge_with_conflict.png" alt="Merge With Conflict" />
+ <figcaption>Figure 11: Merge With Conflict</figcaption>
+ </figure>
+
+ <p>
+ There is an actual conflict in the file,
+ so Subversion asks Dracula what he wants to do:
+ </p>
<pre src="svn/moons_dracula_conflict.txt">
$ <span class="in">svn update</span>
(s) show all options:</span>
</pre>
- <p>
- Dracula choose <code>p</code> for "postpone",
- which tells Subversion that he'll deal with the problem later.
- Once the update is finished,
- he opens <code>moons.txt</code> in his editor and sees:
- </p>
+ <p>
+ Dracula chooses <code>p</code> for "postpone",
+ which tells Subversion that he'll deal with the problem later.
+ Once the update is finished,
+ he opens <code>moons.txt</code> in his editor and sees:
+ </p>
<pre>
Name Orbital Radius Orbital Period Mass
Callisto 1882.7 16.689018 1075.9
</pre>
- <p class="continue">
- As we can see,
- Subversion has inserted
- <a href="glossary.html#conflict-marker">conflict markers</a>
- in <code>moons.txt</code>
- wherever there is a conflict.
- The line <code><<<<<<< .mine</code> shows the start of the conflict,
- and is followed by the lines from the local copy of the file.
- The separator <code>=======</code> is then
- followed by the lines from the repository's file that are in conflict with that section,
- while <code>>>>>>>> .r11</code> marks the end of the conflict.
- </p>
+ <p class="continue">
+ As we can see,
+ Subversion has inserted
+ <a href="glossary.html#conflict-marker">conflict markers</a>
+ in <code>moons.txt</code>
+ wherever there is a conflict.
+ The line <code>&lt;&lt;&lt;&lt;&lt;&lt;&lt; .mine</code> shows the start of the conflict,
+ and is followed by the lines from the local copy of the file.
+ The separator <code>=======</code> is then
+ followed by the lines from the repository's file that are in conflict with that section,
+ while <code>&gt;&gt;&gt;&gt;&gt;&gt;&gt; .r11</code> marks the end of the conflict.
+ </p>
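+ <p>
+ Because the markers always start at the beginning of a line,
+ a quick way to check whether any are left in a file is to search for them.
+ The following is a minimal sketch:
+ the contents of <code>moons.txt</code> here are made up for illustration,
+ and <code>markers.txt</code> is a hypothetical scratch file.
+ </p>

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A small, made-up file containing one conflicted region.
cat > moons.txt <<'EOF'
Name            Orbital Radius
<<<<<<< .mine
                (10**3 km)
=======
                (km)
>>>>>>> .r11
Io              421.6
EOF

# -n prints line numbers; -E allows the three markers to be listed as alternatives.
grep -n -E '^(<<<<<<<|=======|>>>>>>>)' moons.txt > markers.txt
cat markers.txt

# Three marker lines per conflicted region; zero means the file is clean.
marker_count=$(wc -l < markers.txt | tr -d ' ')
echo "marker lines remaining: $marker_count"
```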
- <p>
- Before he can commit,
- Dracula has to edit his copy of the file to get rid of those markers.
- He changes it to:
- </p>
+ <p>
+ Before he can commit,
+ Dracula has to edit his copy of the file to get rid of those markers.
+ He changes it to:
+ </p>
<pre src="svn/moons_dracula_resolved.txt">
Name Orbital Radius Orbital Period Mass Radius
Elara 11740 259.6528 0.008 40.0
</pre>
- <p class="continue">
- then uses the <code>svn resolved</code> command to tell Subversion that
- he has fixed the problem.
- Subversion will now let him commit to create revision 12.
- </p>
+ <p class="continue">
+ then uses the <code>svn resolved</code> command to tell Subversion that
+ he has fixed the problem.
+ Subversion will now let him commit to create revision 12.
+ </p>
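+ <p>
+ The whole update, edit, resolve, commit cycle can be rehearsed safely on one machine.
+ The sketch below is only an illustration under several assumptions:
+ it builds a throwaway repository with <code>svnadmin create</code>,
+ uses two working copies named after Dracula and Wolfman,
+ and uses made-up file contents,
+ so its revision numbers (1 to 3) differ from the ones in the story above.
+ If Subversion is not installed, it does nothing.
+ </p>

```shell
#!/bin/sh
outcome="skipped"
if command -v svn >/dev/null 2>&1 && command -v svnadmin >/dev/null 2>&1; then
    # Throwaway repository and two working copies (all names are hypothetical).
    scratch=$(mktemp -d)
    cd "$scratch" || exit 1
    svnadmin create repo
    url="file://$scratch/repo"
    svn checkout -q "$url" dracula
    svn checkout -q "$url" wolfman

    # Revision 1: a shared starting point that both working copies have.
    (cd dracula && printf 'Name\nIo\n' > moons.txt \
        && svn add -q moons.txt && svn commit -q -m "Added moons.txt")
    (cd wolfman && svn update -q)

    # Revision 2: Wolfman inserts a line at the top and commits first.
    (cd wolfman && printf '(km)\nName\nIo\n' > moons.txt \
        && svn commit -q -m "Added units")

    # Dracula inserts a different line in the same place, then updates.
    cd dracula || exit 1
    printf '(10**3 km)\nName\nIo\n' > moons.txt
    svn update --non-interactive   # postpones the conflict instead of prompting
    ls                             # moons.txt plus the auxiliary files

    # Resolve by hand (here: keep Dracula's units), then tell Subversion.
    printf '(10**3 km)\nName\nIo\n' > moons.txt
    svn resolved moons.txt \
        && svn commit -q -m "Merged unit changes" \
        && outcome="resolved"
fi
echo "outcome: $outcome"
```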
- <div class="box">
-
- <h3>Auxiliary Files</h3>
-
- <p>
- When Dracula did his update and Subversion detected the conflict in <code>moons.txt</code>,
- it created three temporary files to help Dracula resolve it.
- The first is called <code>moons.txt.r9</code>;
- it is the file as it was in Dracula's local copy
- before he started making changes,
- i.e., the common ancestor for his work
- and whatever he is in conflict with.
- </p>
-
- <p>
- The second file is <code>moons.txt.r11</code>.
- This is the most up-to-date revision from the repository—the
- file as it is including Wolfman's changes.
- The third temporary file, <code>moons.txt.mine</code>,
- is the file as it was in Dracula's working copy before he did the Subversion update.
- </p>
-
- <p>
- Subversion creates these auxiliary files primarily
- to help people merge conflicts in binary files.
- It wouldn't make sense to insert <code><<<<<<<</code>
- and <code>>>>>>>></code> characters into an image file
- (it would almost certainly result in a corrupted image).
- The <code>svn resolved</code> command deletes these three extra files
- as well as telling Subversion that the conflict has been taken care of.
- </p>
-
- </div>
+ <div class="box">
+ <h3>Auxiliary Files</h3>
- <p>
- Some power users prefer to work with interpolated conflict markers directly,
- but for the rest of us,
- there are several tools for displaying differences and helping to merge them,
- including <a href="http://diffuse.sourceforge.net/">Diffuse</a> and <a href="http://winmerge.org/">WinMerge</a>.
- If Dracula launches Diffuse,
- it displays his file,
- the common base that he and Wolfman were working from,
- and Wolfman's file in a three-pane view
- (<a href="#f:diff_viewer">Figure XXX</a>):
- </p>
+ <p>
+ When Dracula did his update and Subversion detected the conflict in <code>moons.txt</code>,
+ it created three temporary files to help Dracula resolve it.
+ The first is called <code>moons.txt.r9</code>;
+ it is the file as it was in Dracula's local copy
+ before he started making changes,
+ i.e., the common ancestor for his work
+ and whatever he is in conflict with.
+ </p>
- <figure id="f:diff_viewer">
- <img src="svn/diff_viewer.png" alt="A Difference Viewer" />
- </figure>
+ <p>
+ The second file is <code>moons.txt.r11</code>.
+ This is the most up-to-date revision from the repository—the
+ file as it is including Wolfman's changes.
+ The third temporary file, <code>moons.txt.mine</code>,
+ is the file as it was in Dracula's working copy before he did the Subversion update.
+ </p>
- <p class="continue">
- Dracula can use the buttons to merge changes from either of the edited versions
- into the common ancestor,
- or edit the central pane directly.
- Again,
- once he is done,
- he uses <code>svn resolved</code> and <code>svn commit</code>
- to create revision 12 of the repository.
- </p>
+ <p>
+ Subversion creates these auxiliary files primarily
+ to help people merge conflicts in binary files.
+ It wouldn't make sense to insert <code>&lt;&lt;&lt;&lt;&lt;&lt;&lt;</code>
+ and <code>&gt;&gt;&gt;&gt;&gt;&gt;&gt;</code> characters into an image file
+ (it would almost certainly result in a corrupted image).
+ The <code>svn resolved</code> command deletes these three extra files
+ as well as telling Subversion that the conflict has been taken care of.
+ </p>
- <p>
- In this case, the conflict was small and easy to fix.
- However, if two or more people on a team are repeatedly creating conflicts for one another,
- it's usually a signal of deeper communication problems:
- either they aren't talking as often as they should, or their responsibilities overlap.
- If used properly,
- the version control system can help the team find and fix these issues
- so that it will be more productive in future.
- </p>
+ </div>
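+ <p>
+ One way to use the auxiliary files is to compare each edited version against the common ancestor.
+ The sketch below simulates the three files with made-up contents
+ (a real conflict would create them automatically),
+ then uses plain <code>diff</code> to show what each person changed;
+ <code>mine.diff</code> and <code>theirs.diff</code> are hypothetical names.
+ </p>

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Simulated auxiliary files with made-up contents.
printf 'Name\nIo\n' > moons.txt.r9                       # common ancestor
printf 'Units: (10**3 km)\nName\nIo\n' > moons.txt.mine  # local edits
printf 'Units: (km)\nName\nIo\n' > moons.txt.r11         # latest from the repository

# diff exits with status 1 when files differ; that is expected here.
echo "Changes made locally:"
diff moons.txt.r9 moons.txt.mine > mine.diff
cat mine.diff

echo "Changes that came from the repository:"
diff moons.txt.r9 moons.txt.r11 > theirs.diff
cat theirs.diff
```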
+
+ <p>
+ Some power users prefer to work with interpolated conflict markers directly,
+ but for the rest of us,
+ there are several tools for displaying differences and helping to merge them,
+ including <a href="http://diffuse.sourceforge.net/">Diffuse</a> and <a href="http://winmerge.org/">WinMerge</a>.
+ If Dracula launches Diffuse,
+ it displays his file,
+ the common base that he and Wolfman were working from,
+ and Wolfman's file in a three-pane view
+ (<a href="#f:diff_viewer">Figure 12</a>):
+ </p>
- <div class="box">
+ <figure id="f:diff_viewer">
+ <img src="svn/diff_viewer.png" alt="A Difference Viewer" />
+ <figcaption>Figure 12: A Difference Viewer</figcaption>
+ </figure>
+
+ <p class="continue">
+ Dracula can use the buttons to merge changes from either of the edited versions
+ into the common ancestor,
+ or edit the central pane directly.
+ Again,
+ once he is done,
+ he uses <code>svn resolved</code> and <code>svn commit</code>
+ to create revision 12 of the repository.
+ </p>
- <h3>Working With Multiple Files</h3>
+ <p>
+ In this case, the conflict was small and easy to fix.
+ However, if two or more people on a team are repeatedly creating conflicts for one another,
+ it's usually a signal of deeper communication problems:
+ either they aren't talking as often as they should, or their responsibilities overlap.
+ If used properly,
+ the version control system can help the team find and fix these issues
+ so that it will be more productive in future.
+ </p>
- <p>
- As mentioned <a href="#a:transaction">earlier</a>,
- every logical change to a project should result in a single commit,
- and every commit should represent one logical change.
- This is especially true when resolving conflicts:
- the work done to reconcile one person's changes with another are often complicated,
- so it should be a single entry in the project's history,
- with other, later, changes coming after it.
- </p>
+ <div class="box">
+ <h3>Working With Multiple Files</h3>
- </div>
+ <p>
+ As mentioned <a href="#a:transaction">earlier</a>,
+ every logical change to a project should result in a single commit,
+ and every commit should represent one logical change.
+ This is especially true when resolving conflicts:
+ the work done to reconcile one person's changes with another's is often complicated,
+ so it should be a single entry in the project's history,
+ with other, later, changes coming after it.
+ </p>
- <div class="keypoints" id="k:merge">
- <h3>Summary</h3>
- <ul>
- <li>Conflicts must be resolved before a commit can be completed.</li>
- <li>Subversion puts markers in text files to show regions of conflict.</li>
- <li>For each conflicted file, Subversion creates auxiliary files containing the common parent, the master version, and the local version.</li>
- <li><code>svn resolve <em>files</em></code> tells Subversion that conflicts have been resolved.</li>
- </ul>
- </div>
+ </div>
- </section>
+ <div class="keypoints">
+ <h3>Summary</h3>
+ <ul>
+ <li>Conflicts must be resolved before a commit can be completed.</li>
+ <li>Subversion puts markers in text files to show regions of conflict.</li>
+ <li>For each conflicted file, Subversion creates auxiliary files containing the common parent, the master version, and the local version.</li>
+ <li><code>svn resolved <em>files</em></code> tells Subversion that conflicts have been resolved.</li>
+ </ul>
+ </div>
- <section id="s:rollback">
+ <div class="challenges">
+ <h3>Challenges</h3>
- <h2>Recovering Old Versions</h2>
+ <p>
+ If you are working in a group,
+ partner with someone who has also written a biography for themselves
+ for the previous section's challenges.
+ </p>
- <div class="understand" id="u:rollback">
- <h3>Understand:</h3>
- <ul>
- <li>How to undo changes to a working copy.</li>
- <li>How to recover old versions of files.</li>
- <li>What a branch is.</li>
- </ul>
- </div>
+ <ol>
+ <li>
+ Both partners use <code>svn update</code>
+ to make sure their working copies are up to date
+ and that there are no local changes.
+ </li>
+ <li>
+ The first partner edits her biography and commits the changes.
+ </li>
+ <li>
+ The second partner edits her copy of the file
+ (<em>without</em> having updated to get the first partner's changes),
+ then tries to <code>svn commit</code>.
+ </li>
+ <li>
+ Once the second partner has resolved the conflict,
+ she commits her changes.
+ </li>
+ <li>
+ Repeat these four steps with roles reversed.
+ </li>
+ </ol>
- <p>
- Now that we have seen how to merge files and resolve conflicts,
- we can look at how to use version control as an "infinite undo".
- Suppose that when Wolfman starts work late one night,
- his copy of <code>monsters</code> is in sync with the head at revision 12.
- He decides to edit the file <code>moons.txt</code>;
- unfortunately, he forgot that there was a full moon,
- so his changes don't make a lot of sense:
- </p>
+ <p>
+ If you are working on your own,
+ you can simulate the steps above
+ by checking out a second copy of the project into a new directory.
+ (Remember,
+ this cannot overlap any existing checked-out copies.)
+ Edit your biography in one copy and commit those changes,
+ then switch to the other copy and edit the same file
+ before updating.
+ </p>
+ </div>
+
+</section>
+
+<section id="s:rollback">
+ <h2>Recovering Old Versions</h2>
+
+ <div class="understand">
+ <h3>Learning Objectives</h3>
+ <ul>
+ <li>Discard changes made to a working copy.</li>
+ <li>Recover an old version of a file.</li>
+ <li>Explain what branches are and when they are used.</li>
+ </ul>
+ </div>
+
+ <p>
+ Now that we have seen how to merge files and resolve conflicts,
+ we can look at how to use version control as an "infinite undo".
+ Suppose that when Wolfman starts work late one night,
+ his copy of <code>explore</code> is in sync with the head at revision 12.
+ He decides to edit the file <code>moons.txt</code>;
+ unfortunately, he forgot that there was a full moon,
+ so his changes don't make a lot of sense:
+ </p>
<pre src="svn/poetry.txt">
Just one moon can make me growl
...
</pre>
- <p>
- When he's back in human form the next day,
- he wants to undo his changes.
- Without version control, his choices would be grim:
- he could try to edit them back into their original state by hand
- (which for some reason hardly ever seems to work),
- or ask his colleagues to send him their copies of the files
- (which is almost as embarrassing as chasing the neighbor's cat when in wolf form).
- </p>
+ <p>
+ When he's back in human form the next day,
+ he wants to undo his changes.
+ Without version control, his choices would be grim:
+ he could try to edit them back into their original state by hand
+ (which for some reason hardly ever seems to work),
+ or ask his colleagues to send him their copies of the files
+ (which is almost as embarrassing as chasing the neighbor's cat when in wolf form).
+ </p>
- <p>
- Since he's using Subversion, though,
- and hasn't committed his work to the repository,
- all he has to do is <a href="glossary.html#revert">revert</a> his local changes.
- <code>svn revert</code> simply throws away local changes to files
- and puts things back the way they were before those changes were made.
- This is a purely local operation:
- since Subversion stores the history of the project inside every working copy,
- Wolfman doesn't need to be connected to the network to do this.
- </p>
+ <p>
+ Since he's using Subversion, though,
+ and hasn't committed his work to the repository,
+ all he has to do is <a href="glossary.html#revert">revert</a> his local changes.
+ <code>svn revert</code> simply throws away local changes to files
+ and puts things back the way they were before those changes were made.
+ This is a purely local operation:
+ since Subversion keeps an unmodified copy of each checked-out file inside the working copy,
+ Wolfman doesn't need to be connected to the network to do this.
+ </p>
- <p>
- To start,
- Wolfman uses <code>svn diff</code> <em>without</em> the <code>-r HEAD</code> flag
- to take a look at the differences between his file
- and the master copy in the repository.
- Since he doesn't want to keep his changes,
- his next command is <code>svn revert moons.txt</code>.
- </p>
+ <p>
+ To start,
+ Wolfman uses <code>svn diff</code> <em>without</em> the <code>-r HEAD</code> flag
+ to take a look at the differences between his file
+ and the version he last received from the repository.
+ Since he doesn't want to keep his changes,
+ his next command is <code>svn revert moons.txt</code>.
+ </p>
<pre>
$ <span class="in">cd jupiter</span>
$ <span class="in">svn revert moons.txt</span>
<span class="out">Reverted moons.txt</span>
</pre>
- <p>
- What if someone <em>has</em> committed their changes,
- but still wants to undo them?
- For example,
- suppose Dracula decides that the numbers in <code>moons.txt</code> would look better with commas.
- He edits the file to put them in:
- </p>
+ <p>
+ What if someone <em>has</em> committed their changes,
+ but still wants to undo them?
+ For example,
+ suppose Dracula decides that the numbers in <code>moons.txt</code> would look better with commas.
+ He edits the file to put them in:
+ </p>
<pre src="svn/moons_commas.txt">
Name Orbital Radius Orbital Period Mass Radius
Elara 11<span class="highlight">,</span>740 259.6528 0.008 40.0
</pre>
- <p class="continue">
- then commits his changes to create revision 13.
- A little while later,
- the Mummy sees the change and orders Dracula to put things back the way they were.
- What should Dracula do?
- </p>
+ <p class="continue">
+ then commits his changes to create revision 13.
+ A little while later,
+ the Mummy sees the change and orders Dracula to put things back the way they were.
+ What should Dracula do?
+ </p>
- <p>
- We can draw the sequence of events leading up to revision 13
- as shown in <a href="#f:before_undoing">Fixture XXX</a>:
- </p>
+ <p>
+ We can draw the sequence of events leading up to revision 13
+ as shown in <a href="#f:before_undoing">Figure 13</a>:
+ </p>
- <figure id="f:before_undoing">
- <img src="svn/before_undoing.png" alt="Before Undoing" />
- </figure>
+ <figure id="f:before_undoing">
+ <img src="svn/before_undoing.png" alt="Before Undoing" />
+ <figcaption>Figure 13: Before Undoing</figcaption>
+ </figure>
- <p class="continue">
- Dracula wants to erase revision 13 from the repository,
- but he can't actually do that:
- once a change is in the repository,
- it's there forever.
- What he can do instead is merge the old revision with the current revision
- to create a new revision
- (<a href="#f:merging_history">Fixture XXX</a>).
- </p>
+ <p class="continue">
+ Dracula wants to erase revision 13 from the repository,
+ but he can't actually do that:
+ once a change is in the repository,
+ it's there forever.
+ What he can do instead is merge the old revision with the current revision
+ to create a new revision
+ (<a href="#f:merging_history">Figure 14</a>).
+ </p>
- <figure id="f:merging_history">
- <img src="svn/merging_history.png" alt="Merging History" />
- </figure>
+ <figure id="f:merging_history">
+ <img src="svn/merging_history.png" alt="Merging History" />
+ <figcaption>Figure 14: Merging History</figcaption>
+ </figure>
- <p class="continue">
- This is exactly like merging changes made by two different people;
- the only difference is that the "other person" is his past self.
- </p>
+ <p class="continue">
+ This is exactly like merging changes made by two different people;
+ the only difference is that the "other person" is his past self.
+ </p>
- <p>
- To undo his commas,
- Dracula must merge revision 12 (the one before his change)
- with revision 13 (the current head revision)
- using <code>svn merge</code>:
- </p>
+ <p>
+ To undo his commas,
+ Dracula must merge revision 12 (the one before his change)
+ with revision 13 (the current head revision)
+ using <code>svn merge</code>:
+ </p>
<pre>
$ <span class="in">svn merge -r HEAD:12 moons.txt</span>
<span class="out">U moons.txt</span>
</pre>
- <p class="continue">
- The <code>-r</code> flag specifies the range of revisions to merge:
- to undo the changes from revision 12 to revision 13,
- he uses either <code>13:12</code> or <code>HEAD:12</code>
- (since he is going backward in time from the most recent revision to revision 12).
- This is called a <a href="glossary.html#reverse-merge">reverse</a> merge
- because he's going backward in time.
- </p>
-
- <p>
- After he runs this command,
- he must run <code>svn commit</code> to save the changes to the repository.
- This creates a new revision, number 14,
- rather than erasing revision 13.
- That way,
- the changes he made to create revision 13 are still there
- if he can ever convince the Mummy that numbers should have commas.
- </p>
+ <p class="continue">
+ The <code>-r</code> flag specifies the range of revisions to merge:
+ to undo the changes from revision 12 to revision 13,
+ he uses either <code>13:12</code> or <code>HEAD:12</code>
+ (the more recent revision comes first because he is undoing changes).
+ This is called a <a href="glossary.html#reverse-merge">reverse merge</a>
+ because it goes backward in time.
+ </p>
- <p>
- Merging can be used to recover older revisions of files,
- not just the most recent,
- and to recover many files or directories at a time.
- The most frequent use, though,
- is to manage parallel streams of development in large projects.
- This is outside the scope of this chapter,
- but the basic idea is simple.
- </p>
+ <p>
+ After he runs this command,
+ he must run <code>svn commit</code> to save the changes to the repository.
+ This creates a new revision, number 14,
+ rather than erasing revision 13.
+ That way,
+ the changes he made to create revision 13 are still there
+ if he can ever convince the Mummy that numbers should have commas.
+ </p>
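<p>
The whole undo can be tried safely in a throwaway repository.
The sketch below is an illustration rather than part of the project above:
the scratch directory and the one-line <code>moons.txt</code> are made up,
and revisions 1, 2, and 3 stand in for revisions 12, 13, and 14.
</p>

```shell
# Sketch only: everything lives in a scratch directory that can be deleted afterwards.
work=$(mktemp -d)                  # hypothetical scratch location
url="file://$work/repo"            # $work is absolute, so this yields three slashes
wc="$work/wc"                      # working copy

svnadmin create "$work/repo"
svn checkout -q "$url" "$wc"

# Revision 1: the file without commas.
printf 'Elara 11740\n' > "$wc/moons.txt"
svn add -q "$wc/moons.txt"
svn commit -q -m "Add moons.txt" "$wc"

# Revision 2: the change we will later regret.
printf 'Elara 11,740\n' > "$wc/moons.txt"
svn commit -q -m "Add commas" "$wc"

# Reverse merge: apply the change from HEAD back to revision 1, then commit it.
svn update -q "$wc"
svn merge -r HEAD:1 "$wc/moons.txt"
svn commit -q -m "Undo commas" "$wc"   # creates revision 3; revision 2 is still there

svn log -q "$wc/moons.txt"
```

<p class="continue">
The final <code>svn log</code> should list all three revisions,
which shows that the unwanted change is still in the history.
</p>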
- <p>
- Suppose that Universal Monsters has just released a new program for designing secret lairs.
- Dracula and Wolfman are supposed to start adding a few features
- that had to be left out of the first release because time ran short.
- At the same time,
- Frankenstein and the Mummy are doing technical support:
- their job is to fix any bugs that users find.
- All sorts of things could go wrong if both teams tried to work on the same code at the same time.
- For example,
- if Frankenstein fixed a bug and sent a new copy of the program to a user in Greenland,
- it would be all too easy for him to accidentally include
- the half-completed shark tank control feature that Wolfman was working on.
- </p>
+ <div class="box">
+ <h3>Another Way to Do It</h3>
- <p>
- The usual way to handle this situation is
- to create a <a href="glossary.html#branch">branch</a>
- in the repository for each major sub-project
- (<a href="#f:branch_merge">Figure XXX</a>).
- While Wolfman and Dracula work on
- the <a href="glossary.html#main-line">main line</a>,
- Frankenstein and the Mummy create a branch,
- which is just another copy of the repository's files and directories
- that is also under version control.
- They can work in their branch without disturbing Wolfman and Dracula and vice versa:
- </p>
+ <p>
+ Another way to recover a particular version of a particular file
+ is to use the <code>svn copy</code> command.
+ If the URL of our repository is
+ <code>https://universal.software-carpentry.org/explore</code>,
+ then the command:
+ </p>
- <figure id="f:branch_merge">
- <img src="svn/branch_merge.png" alt="Branching and Merging" />
- </figure>
+<pre>
+$ <span class="in">svn copy https://universal.software-carpentry.org/explore/mission.txt@120 ./mission.txt</span>
+</pre>
- <p>
- Branches in version control repositories are often described as "parallel universes".
- Each branch starts off as a clone of the project at some moment in time
- (typically each time the software is released,
- or whenever work starts on a major new feature).
- Changes made to a branch only affect that branch,
- just as changes made to the files in one directory don't affect files in other directories.
- However,
- the branch and the main line are both stored in the same repository,
- so their revision numbers are always in step.
- </p>
+ <p class="continue">
+ copies the file <code>mission.txt</code> as it was in revision 120
+ into our working directory
+ (overwriting whatever <code>mission.txt</code> file we currently have,
+ if any).
+ What's more,
+ using <code>svn copy</code> brings along the file's history as well,
+ so that future <code>svn log</code> operations will show
+ how <code>mission.txt</code> was resurrected.
+ </p>
+ </div>
- <p>
- If someone decides that a bug fix in one branch should also be made in another,
- all they have to do is merge the files in question.
- This is exactly like merging an old version of a file with the current one,
- but instead of going backward in time,
- the change is brought sideways from one branch to another.
- </p>
+ <p>
+ Merging can be used to recover older revisions of files,
+ not just the most recent,
+ and to recover many files or directories at a time.
+ The most frequent use, though,
+ is to manage parallel streams of development in large projects.
+ This is outside the scope of this chapter,
+ but the basic idea is simple.
+ </p>
- <p>
- Branching helps projects scale up by letting sub-teams work independently,
- but too many branches can cause as many problems as they solve.
- Karl Fogel's excellent book
- <a href="bib.html#fogel-producing-oss"><cite>Producing Open Source Software</cite></a>,
- and Laura Wingerd and Christopher Seiwald's paper
- "<a href="bib.html#wingerd-seiwald-scm">High-level Best Practices in Software Configuration Management</a>",
- talk about branches in much more detail.
- Projects usually don't need to do this until they have a dozen or more developers,
- or until several versions of their software are in simultaneous use,
- but using branches is a key part of switching from software carpentry to software engineering.
- </p>
+ <p>
+ Suppose that Universal Missions has just released a new program
+ for designing interplanetary voyages.
+ Dracula and Wolfman are supposed to add some features
+ that were left out of the first release because time ran short.
+ At the same time,
+ Frankenstein and the Mummy are doing technical support:
+ their job is to fix any bugs that users find.
+ </p>
- <div class="keypoints" id="k:rollback">
- <h3>Summary</h3>
- <ul>
- <li>Old versions of files can be recovered by merging their old state with their current state.</li>
- <li>Recovering an old version of a file does not erase the intervening changes.</li>
- <li>Use branches to support parallel independent development.</li>
- <li><code>svn merge</code> merges two revisions of a file.</li>
- <li><code>svn revert</code> undoes local changes to files.</li>
- </ul>
- </div>
+ <p>
+ All sorts of things could go wrong
+ if both teams tried to work on the same code at the same time.
+ In particular,
+ Dracula and Wolfman might want to make large changes
+ to the structure of the code
+ in order to make it easier to add new features,
+ while Frankenstein and the Mummy want to make as few changes as possible
+ so as not to introduce new bugs while fixing old ones.
+ </p>
- </section>
+ <p>
+ The usual way to handle this situation is
+ to create a <a href="glossary.html#branch">branch</a>
+ in the repository for each major sub-project
+ (<a href="#f:branch_merge">Figure 15</a>).
+ While Wolfman and Dracula work on
+ the <a href="glossary.html#main-line">main line</a>,
+ Frankenstein and the Mummy create a branch,
+ which is just another copy of the repository's files and directories
+ that is also under version control.
+ They can work in their branch without disturbing Wolfman and Dracula and vice versa:
+ </p>
- <section id="s:setup">
+ <figure id="f:branch_merge">
+ <img src="svn/branch_merge.png" alt="Branching and Merging" />
+ <figcaption>Figure 15: Branching and Merging</figcaption>
+ </figure>
- <h2>Setting up a Repository</h2>
+ <p>
+ Branches in version control repositories are often described as "parallel universes".
+ Each branch starts off as a clone of the project at some moment in time
+ (typically each time the software is released,
+ or whenever work starts on a major new feature).
+ Changes made to a branch only affect that branch,
+ just as changes made to the files in one directory don't affect files in other directories.
+ However,
+ the branch and the main line are both stored in the same repository,
+ so their revision numbers are always in step.
+ </p>
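<p>
As a rough sketch of what "a clone of the project" means in practice,
a branch can be made with <code>svn copy</code>.
The <code>trunk</code> and <code>branches</code> directory names are a common convention
that we are assuming here, not something this chapter sets up,
and the branch name <code>support</code> is made up for the example.
</p>

```shell
# Sketch only: a scratch repository with an assumed trunk/branches layout.
work=$(mktemp -d)                  # hypothetical scratch location
url="file://$work/repo"

svnadmin create "$work/repo"

# Revision 1: create the main line and a place to keep branches.
svn mkdir -q -m "Create layout" "$url/trunk" "$url/branches"

# Revision 2: the branch starts as a copy of the main line at this moment.
svn copy -q -m "Create support branch" "$url/trunk" "$url/branches/support"

svn list "$url/branches"
```

<p class="continue">
Because the main line and the branch live in one repository,
both commits above draw their revision numbers from the same sequence.
</p>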
- <div class="understand" id="u:setup">
- <h3>Understand:</h3>
- <ul>
- <li>How to create a repository.</li>
- </ul>
- </div>
+ <p>
+ If someone decides that a bug fix in one branch should also be made in another,
+ all they have to do is merge the files in question.
+ This is exactly like merging an old version of a file with the current one,
+ but instead of going backward in time,
+ the change is brought sideways from one branch to another.
+ </p>
- <p>
- It is finally time to see how to create a repository.
- As a quick recap,
- we will keep the master copy of our work in a repository
- on a server that we can access from other machines on the internet.
- That master copy consists of files and directories that no-one ever edits directly.
- Instead, a copy of Subversion running on that machine
- manages updates for us and watches for conflicts.
- Our working copy is a mirror image of the master sitting on our computer.
- When our Subversion client needs to communicate with the master,
- it exchanges data with the copy of Subversion running on the server.
- </p>
+ <p>
+ Branching helps projects scale up by letting sub-teams work independently,
+ but too many branches can cause as many problems as they solve.
+ Karl Fogel's excellent book
+ <a href="bib.html#fogel-producing-oss"><cite>Producing Open Source Software</cite></a>,
+ and Laura Wingerd and Christopher Seiwald's paper
+ "<a href="bib.html#wingerd-seiwald-scm">High-level Best Practices in Software Configuration Management</a>",
+ talk about branches in much more detail.
+ Projects usually don't need to do this until they have a dozen or more developers,
+ or until several versions of their software are in simultaneous use,
+ but using branches is a key part of switching from software carpentry to software engineering.
+ </p>
- <figure id="f:repo_four_things">
- <img src="svn/repo_four_things.png" alt="What's Needed for a Repository" />
- </figure>
+ <div class="keypoints">
+ <h3>Summary</h3>
+ <ul>
+ <li>Old versions of files can be recovered by merging their old state with their current state.</li>
+ <li>Recovering an old version of a file does not erase the intervening changes.</li>
+ <li>Use branches to support parallel independent development.</li>
+ <li><code>svn revert</code> undoes local changes to files.</li>
+ <li><code>svn merge</code> merges two revisions of a file.</li>
+ </ul>
+ </div>
- <p>
- To make this to work, we need four things
- (<a href="#f:repo_four_things">Figure XXX</a>):
- </p>
+ <div class="challenges">
+ <h3>Challenges</h3>
- <ol>
-
- <li>
- The repository itself.
- It's not enough to create an empty directory and start filling it with files:
- Subversion needs to create a lot of other structure
- in order to keep track of old revisions, who made what changes, and so on.
- </li>
-
- <li>
- The full URL of the repository.
- This includes the URL of the server
- and the path to the repository on that machine.
- (The second part is needed because a single server can,
- and usually will,
- host many repositories.)
- </li>
-
- <li>
- Permission to read or write the master copy.
- Many open source projects give the whole world permission to read from their repository,
- but very few allow strangers to write to it:
- there are just too many possibilities for abuse.
- Somehow, we have to set up a password or something like it
- so that users can prove who they are.
- </li>
-
- <li>
- A working copy of the repository on our computer.
- Once the first three things are in place,
- this just means running the <code>checkout</code> command.
- </li>
-
- </ol>
+ <ol>
+ <li>
+ Explain what the command:
+<pre>
+svn diff -r 240:261 fish.dat
+</pre>
+ does, and when you might want to run it.
+ </li>
- <p>
- To keep things simple,
- we will start by creating a repository on the machine that we're working on.
- This won't let us share our work with other people,
- but it <em>will</em> allow us to save the history of our work as we go along.
- </p>
+ <li>
+ Suppose that a file called <code>mission.txt</code>
+ existed in revision 90 of a repository,
+ but had been deleted in revision 91.
+ What two commands could we use to recover it?
+ </li>
- <p>
- The command to create a repository is <code>svnadmin create</code>,
- followed by the path to the repository.
- If we want to create a repository called <code>lair_repo</code>
- directly under our home directory,
- we just <code>cd</code> to get home
- and run <code>svnadmin create lair_repo</code>.
- This command creates a directory called <code>lair_repo</code> to hold our repository,
- and fills it with various files that Subversion uses
- to keep track of the project's history:
- </p>
+ </ol>
+ </div>
+
+</section>
+
+<section id="s:setup">
+ <h2>Setting up a Repository</h2>
+
+ <div class="understand">
+ <h3>Learning Objectives</h3>
+ <ul>
+ <li>Create a repository.</li>
+ </ul>
+ </div>
+
+ <p>
+ It is finally time to see how to create a repository.
+ As a quick recap,
+ we will keep the master copy of our work in a repository
+ on a server that we can access from other machines on the internet.
+ That master copy consists of files and directories that no-one ever edits directly.
+ Instead, a copy of Subversion running on that machine
+ manages updates for us and watches for conflicts.
+ Our working copy is a mirror image of the master sitting on our computer.
+ When our Subversion client needs to communicate with the master,
+ it exchanges data with the copy of Subversion running on the server.
+ </p>
+
+ <p>
+ To make this work, we need four things:
+ </p>
+
+ <ol>
+
+ <li>
+ The repository itself.
+ It's not enough to create an empty directory and start filling it with files:
+ Subversion needs to create a lot of other structure
+ in order to keep track of old revisions, who made what changes, and so on.
+ </li>
+
+ <li>
+ The full URL of the repository.
+ This includes the URL of the server
+ and the path to the repository on that machine.
+ (The second part is needed because a single server can,
+ and usually will,
+ host many repositories.)
+ </li>
+
+ <li>
+ Permission to read or write the master copy.
+ Many open source projects give the whole world permission to read from their repository,
+ but very few allow strangers to write to it:
+ there are just too many possibilities for abuse.
+ Somehow, we have to set up a password or something like it
+ so that users can prove who they are.
+ </li>
+
+ <li>
+ A working copy of the repository on our computer.
+ Once the first three things are in place,
+ this just means running the <code>checkout</code> command.
+ </li>
+
+ </ol>
+
+ <p>
+ To keep things simple,
+ we will start by creating a repository on the machine that we're working on.
+ This won't let us share our work with other people,
+ but it <em>will</em> allow us to save the history of our work as we go along.
+ </p>
+
+ <p>
+ The command to create a repository is <code>svnadmin create</code>,
+ followed by the path to the repository.
+ If we want to create a repository called <code>missions_repo</code>
+ directly under our home directory,
+ we just <code>cd</code> to get home
+ and run <code>svnadmin create missions_repo</code>.
+ This command creates a directory called <code>missions_repo</code> to hold our repository,
+ and fills it with various files that Subversion uses
+ to keep track of the project's history:
+ </p>
<pre>
$ <span class="in">cd</span>
-$ <span class="in">svnadmin create lair_repo</span>
-$ <span class="in">ls -F lair_repo</span>
+$ <span class="in">svnadmin create missions_repo</span>
+$ <span class="in">ls -F missions_repo</span>
<span class="out">README.txt conf/ db/ format hooks/ locks/</span>
</pre>
- <p class="continue">
- We should <em>never</em> edit anything in this repository directly.
- Doing so probably won't shred our sanity and leave us gibbering in mindless horror,
- but it will almost certainly make the repository unusable.
- </p>
+ <p class="continue">
+ We should <em>never</em> edit any of this directly,
+ since it will almost certainly make the repository unusable.
+ Instead,
+ we should use <code>svn checkout</code>
+ to get a working copy of this repository.
+ If our home directory is <code>/users/mummy</code>,
+ then the full path to the repository we just created is <code>/users/mummy/missions_repo</code>,
+ so we run <code>svn checkout file:///users/mummy/missions_repo missions_working</code>.
+ </p>
- <p>
- To get a working copy of this repository,
- we use Subversion's <code>checkout</code> command.
- If our home directory is <code>/users/mummy</code>,
- then the full path to the repository we just created is <code>/users/mummy/lair_repo</code>,
- so we run <code>svn checkout file:///users/mummy/lair lair_working</code>.
- </p>
+ <p>
+ Working backward,
+ the second argument,
+ <code>missions_working</code>,
+ specifies where the working copy is to be put.
+ The first argument is the URL of our repository,
+ and it has two parts.
+ <code>/users/mummy/missions_repo</code> is the path to the repository directory.
+ <code>file://</code> specifies the <a href="glossary.html#protocol">protocol</a>
+ that Subversion will use to communicate with the repository—in this case,
+ it says that the repository is part of the local machine's filesystem.
+ (Notice that the protocol ends in two slashes,
+ while the absolute path to the repository starts with a slash,
+ making three in total.
+ A very common mistake is to type only two, since that's what web URLs normally have.)
+ </p>
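<p>
One way to avoid miscounting slashes is to build the URL from its two parts.
In this small sketch the variable names are ours;
the path is the one used in the text.
</p>

```shell
repo_path="/users/mummy/missions_repo"   # absolute path, so it starts with "/"
repo_url="file://${repo_path}"           # protocol (two slashes) + path (one slash)

echo "$repo_url"                         # prints file:///users/mummy/missions_repo

# The checkout would then be:
#   svn checkout "$repo_url" missions_working
```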
- <p>
- Working backward,
- the second argument,
- <code>lair_working</code>,
- specifies where the working copy is to be put.
- The first argument is the URL of our repository,
- and it has two parts.
- <code>/users/mummy/lair_repo</code> is the path to repository directory.
- <code>file://</code> specifies the <a href="glossary.html#protocol">protocol</a>
- that Subversion will use to communicate with the repository—in this case,
- it says that the repository is part of the local machine's filesystem.
- Notice that the protocol ends in two slashes,
- while the absolute path to the repository starts with a slash,
- making three in total.
- A very common mistake is to type only two, since that's what web URLs normally have.
- </p>
+ <p>
+ When we're doing a checkout,
+ it is <em>very</em> important that we provide the second argument,
+ which specifies the name of the directory we want the working copy to be put in.
+ Without it,
+ Subversion will try to use the name of the repository,
+ <code>missions_repo</code>,
+ as the name of the working copy.
+ Since we're in the directory that contains the repository,
+ this means that Subversion will try to put the working copy on top of the repository,
+ which could ruin it.
+ </p>
- <p>
- When we're doing a checkout,
- it is <em>very</em> important that we provide the second argument,
- which specifies the name of the directory we want the working copy to be put in.
- Without it,
- Subversion will try to use the name of the repository,
- <code>lair_repo</code>,
- as the name of the working copy.
- Since we're in the directory that contains the repository,
- this means that Subversion will try to overwrite the repository with a working copy.
- Again,
- there isn't much risk of our sanity being torn to shreds,
- but this could ruin our repository.
- </p>
+ <p>
+ To avoid this problem,
+ most people create a sub-directory in their account called something like <code>repos</code>,
+ and then create their repositories in that.
+ For example,
+ we could create our repository in <code>/users/mummy/repos/missions</code>,
+ then check out a working copy as <code>/users/mummy/missions</code>.
+ This practice makes both names easier to read.
+ </p>
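<p>
A minimal sketch of this layout follows.
It assumes a scratch directory standing in for <code>/users/mummy</code>,
so that nothing in a real home directory is touched.
</p>

```shell
# Sketch only: $base plays the role of the home directory.
base=$(mktemp -d)

mkdir -p "$base/repos"                      # one place for all repositories
svnadmin create "$base/repos/missions"      # the repository itself

# The working copy gets the short, readable name.
svn checkout -q "file://$base/repos/missions" "$base/missions"

svn info "$base/missions"
```

<p class="continue">
Since the repository and the working copy have different parent directories,
leaving off the second argument to <code>svn checkout</code> no longer puts the repository at risk.
</p>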
- <p>
- To avoid this problem,
- most people create a sub-directory in their account called something like <code>repos</code>,
- and then create their repositories in that.
- For example,
- we could create our repository in <code>/users/mummy/repos/lair</code>,
- then check out a working copy as <code>/users/mummy/lair</code>.
- This practice makes both names easier to read.
- </p>
+ <p>
+ The obvious next step is to put our repository on a server,
+ rather than on our personal machine.
+ In fact,
+ we should <em>always</em> do this
+ so that we don't lose the history of our project
+ if our laptop is damaged or stolen.
+ A departmental server is also much more likely to be backed up regularly
+ than our personal machine…
+ </p>
- <p>
- The obvious next steps are
- to put our repository on a server,
- rather than on our personal machine,
- and to give other people access to the repository we have just created
- so that they can work with us.
- We'll discuss the first in <a href="web.html#s:svn">a later chapter</a>,
- but unfortunately,
- the second really does require things that we are not going to cover in this course.
- If you want to do this, you can:
- </p>
+ <p>
+ Creating a repository on a server is simple:
+ just log in and go through the steps described above.
+ Accessing that repository from another machine
+ is also straightforward.
+ If the machine's address is <code>serv.euphoric.edu</code>,
+ and our user ID is <code>dracula</code>,
+ the URL of the repository will be something like:
+ </p>
- <ul>
+<pre>
+svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions
+</pre>
- <li>
- ask your system administrator to set it up for you;
- </li>
+ <p>
+ Reading from left to right:
+ </p>
- <li>
- use an open source hosting service like <a href="http://www.sf.net">SourceForge</a>,
- <a href="http://code.google.com">Google Code</a>,
- <a href="https://github.com/">GitHub</a>,
- or <a href="https://bitbucket.org/">BitBucket</a>; or
- </li>
+ <ul>
+ <li>
+ <code>svn+ssh</code> is the protocol that Subversion uses to connect to the server
+ (in this case,
+ a combination of Subversion's own protocol
+ and <a href="shell.html#s:ssh">SSH</a>);
+ </li>
+ <li>
+ <code>dracula@serv.euphoric.edu</code> identifies the server and who we are
+ (just like an email address);
+ and
+ </li>
+ <li>
+ <code>/home/dracula/repos/missions</code> is the absolute path of the repository
+ on the server.
+ </li>
+ </ul>
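<p>
The three parts can be assembled like this.
The variable names are ours,
and the checkout itself is left as a comment
because it only works with an SSH account on a real server.
</p>

```shell
user="dracula"
host="serv.euphoric.edu"
repo_path="/home/dracula/repos/missions"   # absolute path on the server

repo_url="svn+ssh://${user}@${host}${repo_path}"
echo "$repo_url"   # prints svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions

# With an account on the server, the working copy would come from:
#   svn checkout "$repo_url" missions
```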
+
+ <p id="a:only_user">
+ That's fine if you are the only person using the repository,
+ but if you want to share it with others,
+ you need to worry about security.
+ As we discuss in the lesson on <a href="web.html">web programming</a>,
+ as soon as you provide a service on the internet,
+ there's the possibility that someone may try to attack your system through it.
+ Rather than trying to learn enough system administration skills
+ to set things up safely,
+ it is usually easier to:
+ </p>
- <li>
- spend a few dollars a month on a commercial hosting service like <a href="http://dreamhost.com">DreamHost</a>
- that provides web-based GUIs for creating and managing repositories.
- </li>
+ <ul>
- </ul>
+ <li>
+ ask your department's system administrator to set it up for you;
+ </li>
- <p>
- If you choose the second or third option,
- please check with whoever handles intellectual property at your institution
- to make sure that putting your work on a commercially-operated machine
- that is probably in some other legal jurisdiction
- isn't going to cause trouble.
- Many people assume that it's "just OK",
- while others act as if not having asked will be an acceptable defence later on.
- Unfortunately,
- neither is true…
- </p>
+ <li>
+ use a hosting service like <a href="http://www.sf.net">SourceForge</a>,
+ <a href="http://code.google.com">Google Code</a>,
+ <a href="https://github.com/">GitHub</a>,
+ or <a href="https://bitbucket.org/">BitBucket</a>; or
+ </li>
- <div class="keypoints" id="k:setup">
- <h3>Summary</h3>
- <ul>
- <li>Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.</li>
- <li><code>svnadmin create <em>name</em></code> creates a new repository.</li>
- </ul>
- </div>
+ <li>
+ spend a few dollars a month on a commercial hosting service
+ that provides web-based GUIs for creating and managing repositories.
+ </li>
- </section>
+ </ul>
- <section id="s:provenance">
+ <p>
+ If you choose the second or third option,
+ please check with whoever handles intellectual property at your institution
+ to make sure that putting your work on a commercially-operated machine
+ that is probably in some other legal jurisdiction
+ isn't going to cause trouble.
+ Many people assume that it's "just OK",
+ while others act as if not having asked will be an acceptable defence later on.
+ Unfortunately,
+ neither is true…
+ </p>
- <h2>Provenance</h2>
+ <div class="keypoints">
+ <h3>Summary</h3>
+ <ul>
+ <li><code>svnadmin create <em>name</em></code> creates a new repository.</li>
+ <li>Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.</li>
+ </ul>
+ </div>
+
+ <div class="challenges">
+ <h3>Challenges</h3>
+
+ <ol>
- <div class="understand" id="u:provenance">
- <h3>Understand:</h3>
+ <li>
+ Create a Subversion repository called <code>trials_repo</code>
+ in your home directory.
+ Check out a working copy in a directory called <code>trials_working</code>
+ (also in your home directory).
+ Add a couple of text files,
+ commit the changes,
+ and then use <code>svn info trials_working</code>
+ to see what Subversion tells you about your working copy.
+ </li>
+
+ <li>
+ We said <a href="#a:only_user">above</a> that
+ you might be the only person using a particular repository.
+ When and why is version control worth using
+ if no-one else is working on a project with you?
+ </li>
+
+ <li>
+ There are many ways to organize repositories.
+ Some of the most common are to create one repository for:
<ul>
- <li>What data provenance is.</li>
- <li>How to embed version numbers and other information in files managed by version control.</li>
- <li>How to record version information about a program in its output.</li>
+ <li>each person</li>
+ <li>each paper</li>
+ <li>all the work done on one grant</li>
+ <li>all the work done on one project</li>
+ <li>the entire lab (which is shared by everyone in the lab)</li>
+ <li>the entire department (typically with a top-level directory for each person or project in the department)</li>
</ul>
- </div>
+ What activities does each one make easy or hard?
+ Which of these would you prefer, and why?
+ </li>
- <p>
- In art,
- the <a href="glossary.html#provenance">provenance</a> of a work
- is the history of who owned it, when, and where.
- In science,
- it's the record of how a particular result came to be:
- what raw data was processed by what version of what program to create which intermediate files,
- what was used to turn those files into which figures of which papers,
- and so on.
- </p>
+ </ol>
+ </div>
- <p>
- One of the central ideas of this course is that
- wen can automatically track the provenance of scientific data.
- To start,
- suppose we have a text file <code>combustion.dat</code> in a Subversion repository.
- Run the following two commands:
- </p>
+</section>
+
+<section id="s:provenance">
+ <h2>Provenance</h2>
+
+ <div class="understand">
+ <h3>Understand:</h3>
+ <ul>
+ <li>What data provenance is.</li>
+ <li>How to embed version numbers and other information in files managed by version control.</li>
+ <li>How to record version information about a program in its output.</li>
+ </ul>
+ </div>
+
+ <p>
+ In art,
+ the <a href="glossary.html#provenance">provenance</a> of a work
+ is the history of who owned it, when, and where.
+ In science,
+ it's the record of how a particular result came to be:
+ what raw data was processed by what version of what program to create which intermediate files,
+ what was used to turn those files into which figures of which papers,
+ and so on.
+ </p>
+
+ <p>
+ One of the big benefits of using version control is that
+ it lets us track the provenance of scientific data automatically.
+ To start,
+ suppose we have a text file <code>combustion.dat</code> in a Subversion repository.
+ Run the following two commands:
+ </p>
<pre>
$ svn propset svn:keywords Revision combustion.dat
$ svn commit -m "Turning on the 'Revision' keyword" combustion.dat
</pre>
- <p>
- Now open the file in an editor
- and add the following line somewhere near the top:
- </p>
+ <p class="continue">
+ This does nothing by itself,
+ but now open the file in an editor
+ and add the following line somewhere near the top:
+ </p>
<pre>
-# $Revision:$
+$Revision:$
</pre>
- <p>
- The '#' sign isn't important:
- it's just what <code>.dat</code> files use to show comments.
- The <code>$Revision:$</code> string,
- on the other hand,
- means something special to Subversion.
- Save the file, and commit the change:
- </p>
+ <p>
+ The <code>$Revision:$</code> string means something special to Subversion.
+ Save the file, and commit the change:
+ </p>
<pre>
$ svn commit -m "Inserting the 'Revision' keyword" combustion.dat
</pre>
- <p>
- When we open the file again,
- we'll see that Subversion has changed that line to something like:
- </p>
+ <p>
+ When we open the file again,
+ we'll see that Subversion has changed that line to something like:
+ </p>
<pre>
-# $Revision: 143$
+$Revision: 143$
</pre>
- <p class="continue">
- i.e., Subversion has inserted the version number
- after the colon and before the closing <code>$</code>.
- </p>
+ <p class="continue">
+ i.e., it has inserted the version number
+ after the colon and before the closing <code>$</code>.
+ If we edit the file again—e.g., add a couple of lines with random numbers—and
+ commit once more,
+ the line is updated again to:
+ </p>
- <p>
- Here's what just happened.
- First, Subversion allows you to set
- <a href="glossary.html#property-subversion">properties</a>
- for files and and directories.
- These properties aren't in the files or directories themselves,
- but live in Subversion's database.
- One of those properties,
- <code>svn:keywords</code>,
- tells Subversion to look in files that are being changed
- for strings of the form <code>$propertyname: …$</code>,
- where <code>propertyname</code> is a string like <code>Revision</code> or <code>Author</code>.
- (About half a dozen such strings are supported.)
- </p>
+<pre>
+$Revision: 144$
+</pre>
- <p>
- If it sees such a string,
- Subversion rewrites it as the commit is taking place to replace <code>…</code>
- with the current version number,
- the name of the person making the change,
- or whatever else the property's name tells it to do.
- You only have to add the string to the file once;
- after that,
- Subversion updates it for you every time the file changes.
- </p>
+ <p>
+ Here's what just happened.
+      First, Subversion allows us to add
+      <a href="glossary.html#property-subversion">properties</a>
+      to files and directories.
+ These properties aren't stored in the files or directories themselves,
+ but in Subversion's database.
+ One of those properties,
+ <code>svn:keywords</code>,
+ tells Subversion to look in files that are being changed
+ for strings of the form <code>$propertyname: …$</code>,
+ where <code>propertyname</code> is a string like <code>Revision</code> or <code>Author</code>.
+ (About half a dozen such strings are supported.)
+ </p>
- <p>
- Putting the version number in the file this way can be pretty handy.
- If you copy the file to another machine,
- for example,
- it carries its version number with it,
- so you can tell which version you have even if it's outside version control.
- We'll see some more useful things we can do with this information in
- <a href="python.html">the next chapter</a>.
- </p>
+ <p>
+ If it sees such a string,
+ Subversion rewrites it as the commit is taking place to replace <code>…</code>
+ with the current version number,
+ the name of the person making the change,
+ or whatever else the property's name tells it to do.
+ We only have to add the string to the file once;
+ after that,
+      Subversion updates it for us every time the file changes.
+ </p>
- <div class="box">
-
- <h3>When <em>Not</em> to Use Version Control</h3>
-
- <p>
- Despite the rapidly decreasing cost of storage,
- it is still possible to run out of disk space.
- In some labs,
- people can easy go through 2 TB/month if they're not careful.
- Since version control tools usually store revisions in terms of lines,
- with binary data files,
- they end up essentially storing every revision separately.
- This isn't that bad
- (it's what we'd be doing anyway),
- but it means version control isn't doing what it likes to do,
- and the repository can get very large very quickly.
- Another concern is that if very old data will no longer be used,
- it can be nice to archive or delete old data files.
- This is not possible if our data is version controlled:
- information can only be added to a repository,
- so it can only ever increase in size.
- </p>
-
- </div>
+ <p>
+ Putting the version number in the file this way can be pretty handy.
+      If we copy the file to another machine,
+      for example,
+      it carries its version number with it,
+      so we can tell which version we have even if it's outside version control.
+ We'll see some more useful things we can do with this information <a href="python.html">later</a>.
+ </p>
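    <p>
      As a minimal sketch of what that looks like in practice,
      ordinary text tools can find a filled-in keyword in a file
      that is no longer inside a working copy.
      The file name <code>copied-combustion.dat</code> and its contents
      are made up for this illustration;
      only the <code>$Revision: 143$</code> line comes from the example above.
    </p>

```shell
# Hypothetical copy of a data file that has left its working copy.
printf '%s\n' '$Revision: 143$' '1.5 2.5' '3.5 4.5' > copied-combustion.dat

# Print every line containing a filled-in keyword of the form $Name: value$.
# The single quotes stop the shell from treating $ as the start of a variable.
grep -n '\$[A-Za-z]*: [^$]*\$' copied-combustion.dat
```

    <p class="continue">
      The <code>-n</code> flag makes <code>grep</code> show the line number
      next to each match,
      which helps when the keyword is not on the first line.
    </p>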
- <p>
- We can use this trick with shell scripts too,
- or with almost any other kind of program.
- Going back to Nelle Nemo's data processing from the previous chapter,
- for example,
- suppose she writes a shell script that uses <code>gooclean</code>
- to tidy up data files.
- Her first version looks like this:
- </p>
+ <p>
+ We can use this trick with shell scripts too,
+ or with almost any other kind of program.
+ Let's go back to Nelle Nemo's data processing from
+ the lesson on the <a href="shell.html">shell</a>.
+ Suppose she writes a shell script called <code>gooclean</code>
+ to tidy up data files.
+ Her first version looks like this:
+ </p>
<pre>
-for filename in $*
-do
- gooclean -b 0 100 < $filename > cleaned-$filename
-done
+# gooclean: clean up a single data file
+goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 > cleaned-$1
</pre>
- <p class="continue">
- i.e., it runs <code>gooclean</code> with bounding values of 0 and 100
- for each specified file,
- putting the result in a temporary file with a well-defined name.
- Assuming that '#' is the comment character for those kinds of data files,
- she could instead write:
- </p>
+ <p class="continue">
+ i.e.,
+ it runs <code>goonorm</code> and then <code>goofilter</code> with some fixed parameters
+ and creates an output file called <code>cleaned-something.dat</code>
+ (if the input file's name was <code>something.dat</code>).
+ Assuming that '#' is the comment character for her output files,
+ she could instead write:
+ </p>
<pre>
-for filename in $*
-do
- <span class="highlight">echo "gooclean $Revision: 901$ -b 0 100" > $filename</span>
- gooclean -b 0 100 < $filename <span class="highlight">>></span> cleaned-$filename
-done
+# gooclean: clean up a single data file
+<span class="highlight">echo '# gooclean $Revision:$' > cleaned-$1</span>
+goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 <span class="highlight">>></span> cleaned-$1
</pre>
- <p>
- The first change puts a line in the output file
- that describes how that file was created.
- The second change is to use <code>>></code> instead of <code>></code>
- to redirect <code>gooclean</code>'s output to the file.
- <code>>></code> means "append to":
- instead of overwriting whatever is in the file,
- it adds more content to it.
- This ensures that the first line of the file is the provenance record,
- with the actual output of <code>gooclean</code> after it.
- </p>
+ <p class="continue">
+ then set the <code>svn:keywords</code> property
+ and commit the file to insert the revision number,
+ making it:
+ </p>
- <div class="keypoints" id="k:provenance">
- <h3>Summary</h3>
- <ul>
- <li><code>$Keyword:$</code> in a file can be filled in with a property value each time the file is committed.</li>
- <li idea="paranoia">Put version numbers in programs' output to establish provenance for data.</li>
- <li><code>svn propset svn:keywords <em>property</em> <em>files</em></code> tells Subversion to start filling in property values.</li>
- </ul>
- </div>
+<pre>
+# gooclean: clean up a single data file
+<span class="highlight">echo '# gooclean $Revision: 487$' > cleaned-$1</span>
+goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 <span class="highlight">>></span> cleaned-$1
+</pre>
+
+ <p>
+ Now,
+ each time this script is run it will:
+ </p>
+
+ <ul>
+ <li>
+ put the line
+<pre>
+# gooclean $Revision: 487$
+</pre>
+ in the output file,
+ then
+ </li>
+ <li>
+        append whatever the pipeline containing <code>goonorm</code> and <code>goofilter</code>
+ would have put in the file originally.
+ (The double redirection <code>>></code> means "append to" rather than "overwrite".)
+ </li>
+ </ul>
- </section>
+ <p class="continue">
+ In other words,
+ the output of this shell script will always record
+ exactly what version of the script produced it.
+ This isn't enough to reproduce the output—we would need to record
+ the version numbers of the input files and the <code>goonorm</code> and <code>goofilter</code> programs,
+ and the values of the parameters those programs used
+ in order to do that—but it's an important and useful first step.
+ </p>
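    <p>
      The header-then-append pattern can be tried without Nelle's tools.
      The sketch below is only an illustration under stated assumptions:
      <code>sort -n</code> stands in for the
      <code>goonorm</code> and <code>goofilter</code> pipeline,
      and the function name <code>clean_with_header</code>
      and the file <code>sample.dat</code> are hypothetical.
    </p>

```shell
# Stand-in for gooclean: 'sort -n' replaces the goonorm | goofilter pipeline,
# which is not available here.
# Single quotes keep the shell from treating $Revision as a variable.
clean_with_header() {
    echo '# gooclean $Revision: 487$' > "cleaned-$1"   # overwrite: header first
    sort -n < "$1" >> "cleaned-$1"                     # append: data after it
}

printf '%s\n' 3 1 2 > sample.dat      # hypothetical input file
clean_with_header sample.dat

head -n 1 cleaned-sample.dat          # the provenance line
grep -v '^#' cleaned-sample.dat       # the data, with comment lines skipped
```

    <p class="continue">
      Because the header starts with the comment character,
      any later step that already skips comment lines
      can read the cleaned file without modification.
    </p>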
- <section id="s:summary">
+ <div class="keypoints">
+ <h3>Summary</h3>
+ <ul>
+ <li><code>$Keyword: …$</code> in a file can be filled in with a property value each time the file is committed.</li>
+ <li>Put version numbers in programs' output to establish provenance for data.</li>
+ <li><code>svn propset svn:keywords <em>property</em> <em>files</em></code> tells Subversion to start filling in property values.</li>
+ </ul>
+ </div>
- <h2>Summing Up</h2>
+ <div class="challenges">
+ <h3>Challenges</h3>
- <p>
- Correlation does not imply causality,
- but there is a very strong correlation between
- using version control
- and doing good computational science.
- There's an equally strong correlation
- between <em>not</em> using it and wasting effort,
- so today (the middle of 2012),
- I will not review a paper if the software used in it
- is not under version control.
- Its authors' work might be interesting,
- but without the kind of record-keeping that version control provides,
- there's no way to know exactly what they did and when.
- Just as importantly,
- if someone doesn't know enough about computing to use version control,
- the odds are good that they don't know enough
- to do the programming right either.
- </p>
+ <ol>
+
+ <li>
+ Add <code>$Id:$</code> to a file,
+ use <code>svn propset</code> to set the corresponding property,
+ and then commit a change to the file.
+ What value does Subversion fill in for this keyword?
+ When would you use this rather than <code>Revision</code> or <code>Author</code>?
+ </li>
- </section>
+ <li>
+ What does the <code>svn:ignore</code> property do when applied to a directory?
+ When would you use it?
+ </li>
+
+ </ol>
+
+ </div>
+
+</section>
+
+<section id="s:summary">
+ <h2>Summing Up</h2>
+
+ <p>
+ In 2006,
+ <a href="bib.html#mccullough-reproducibility">McCullough, McGeary, and Harrison</a>
+ analyzed several years of
+ the data and code archive of <cite>Journal of Money, Credit, and Banking</cite>,
+ a prestigious journal with a mandatory archiving policy.
+ Of 266 articles published during that time,
+ 193 were empirical and should have had data and code deposited in the archive.
+ Of those,
+      only 69 actually had anything in the archive.
+ Excluding eleven articles that only had data,
+ and seven that required software or other resources they did not have,
+ McCullough et al. were only able to replicate 14 of the remaining 186 articles.
+ This doesn't mean that the other 92% were wrong,
+ but it does mean there is no practical way to tell.
+ </p>
+
+ <p>
+ By itself,
+      version control doesn't make computational research reproducible.
+ It <em>does</em> help,
+ though,
+ and also eliminates the frustration and wasted time caused by
+ trying to figure out which emailed copy of a file,
+ or which of a dozen directories or USB drives,
+ is the most recent.
+ And while correlation doesn't imply causality,
+ there is certainly a strong correlation between
+ knowing enough about good computational practices to use version control
+ and knowing how to do other things right as well.
+ </p>
+
+</section>
{% endblock content %}