1 {% extends "templates/_base.html" %}
3 {% block file_metadata %}
4 <meta name="title" content="Version Control With Subversion" />
5 {% endblock file_metadata %}
9 <li><a href="#s:basics">Basic Use</a></li>
10 <li><a href="#s:merge">Merging Conflicts</a></li>
11 <li><a href="#s:rollback">Recovering Old Versions</a></li>
12 <li><a href="#s:setup">Setting up a Repository</a></li>
13 <li><a href="#s:provenance">Provenance</a></li>
14 <li><a href="#s:summary">Summing Up</a></li>
18 Wolfman and Dracula have been hired by Universal Missions
19 (a space services spinoff from Euphoric State University)
20 to figure out where the company should send its next planetary lander.
21 They want to be able to work on the plans at the same time,
22 but they have run into problems doing this in the past.
24 each one will spend a lot of time waiting for the other to finish.
26 if they work on their own copies and email changes back and forth
27 they know that things will be lost, overwritten, or duplicated.
31 The right solution is to use a
32 <a href="glossary.html#version-control-system">version control system</a>
34 Version control is better than mailing files back and forth because:
40 It's hard (but not impossible) to accidentally overlook or overwrite someone's changes,
41 because the version control system highlights them automatically.
45 It keeps a record of who made what changes when,
46 so that if people have questions later on,
52 Nothing that is committed to version control is ever lost.
53 This means it can be used like the "undo" feature in an editor,
54 and since all old versions of files are saved
55 it's always possible to go back in time to see exactly who wrote what on a particular day,
56 or what version of a program was used to generate a particular set of results.
62 <h3>Nothing's Perfekt</h3>
65 Version control systems do have one important shortcoming.
66 While it is easy for them to find, display, and merge differences in text files,
67 images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text—they
68 use specialized binary data formats.
69 Most version control systems don't know how to deal with these formats,
70 so all they can say is, "These files differ."
71 Reconciling those differences will probably require use of an auxiliary tool,
72 such as an audio editor
73 or Microsoft Word's "Compare and Merge" utility.
78 The rest of this chapter will explore how to use
79 a popular open source version control system called Subversion.
83 <h2>For Instructors</h2>
85 <p class="fixme">explain</p>
88 <h3>Prerequisites</h3>
89 <p class="fixme">prereq</p>
93 <h3>Teaching Notes</h3>
100 <section id="s:basics">
103 <div class="understand">
104 <h3>Learning Objectives:</h3>
106 <li>Where version control stores information.</li>
107 <li>How to check out a working copy of a repository.</li>
108 <li>How to view the history of changes to a project.</li>
109 <li>Why working copies of different projects should not overlap.</li>
110 <li>How to add files to a project.</li>
111 <li>How to submit changes made locally to a project's master copy.</li>
112 <li>How to update a working copy to get changes made to the master.</li>
113 <li>How to check the status of a working copy.</li>
118 A version control system keeps the master copy of a file
119 in a <a href="glossary.html#repository">repository</a>
120 located on a <a href="glossary.html#server">server</a>—a computer
121 that is never used directly by people,
122 but only by their programs
123 (<a href="#f:repository">Figure 1</a>).
No one ever edits the master copy directly.
126 Wolfman and Dracula each have a <a href="glossary.html#working-copy">working copy</a>
127 on their own machines.
128 They can each edit their working copies whenever and however they want.
131 <figure id="f:repository">
132 <img src="svn/repository.png" alt="Repositories and Working Copies" />
133 <figcaption>Figure 1: Repositories and Working Copies</figcaption>
137 When Wolfman is ready to share his changes with Dracula,
138 he <a href="glossary.html#commit">commits</a> his work to the repository
139 (<a href="#f:workflow">Figure 2</a>).
140 Dracula can then <a href="glossary.html#update">update</a> his working copy
141 to get those changes when he's ready for them.
143 when Dracula finishes working on something,
he can commit so that Wolfman can update.
147 <figure id="f:workflow">
148 <img src="svn/workflow.png" alt="Sharing Files Through Version Control" />
149 <figcaption>Figure 2: Sharing Files Through Version Control</figcaption>
153 If this is all there was to version control,
154 it would be no better than FTP or Dropbox.
155 But what if Dracula and Wolfman change their working copies at the same time?
156 If Wolfman commits first,
157 his changes are simply copied to the repository
158 (<a href="#f:merge_first_commit">Figure 3</a>):
161 <figure id="f:merge_first_commit">
162 <img src="svn/merge_first_commit.png" alt="Wolfman Commits First" />
163 <figcaption>Figure 3: Wolfman Commits First</figcaption>
167 If Dracula now tries to commit something that would overwrite Wolfman's changes
168 the version control system detects the <a href="glossary.html#conflict">conflict</a>,
170 and tells Dracula that there's a problem
171 (<a href="#f:merge_second_commit">Figure 4</a>):
174 <figure id="f:merge_second_commit">
175 <img src="svn/merge_second_commit.png" alt="Dracula Has a Conflict" />
176 <figcaption>Figure 4: Dracula Has a Conflict</figcaption>
180 Dracula must <a href="glossary.html#resolve">resolve</a> that conflict
181 before the version control system will allow him to commit his work.
182 He can accept what Wolfman did,
183 replace it with what he has done,
184 or write something new that combines the two—that's up to him
185 (<a href="#f:merge_resolve">Figure 5</a>).
186 Once he has cleaned things up, he can go ahead and try committing again.
187 If all of the conflicts have been resolved,
the version control system will accept his commit this time.
191 <figure id="f:merge_resolve">
192 <img src="svn/merge_resolve.png" alt="Resolving the Conflict" />
193 <figcaption>Figure 5: Resolving the Conflict</figcaption>
197 <h3>Forgiveness vs. Permission</h3>
200 Old-fashioned version control systems prevented conflicts from happening
201 by <a href="glossary.html#lock">locking</a> the master copy
202 whenever someone was working on it.
203 This <a href="glossary.html#pessimistic-concurrency">pessimistic</a> strategy
204 guaranteed that a second person (or monster)
205 could never make changes to the same file at the same time,
206 but it also meant that people had to take turns editing files.
210 Most of today's version control systems use
211 an <a href="glossary.html#optimistic-concurrency">optimistic</a> strategy instead:
212 people are always allowed to edit their working copies,
213 and if a conflict occurs,
214 the version control system helps them sort it out after the fact.
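<p>
The optimistic strategy can be sketched in a few lines of Python.
This is a toy model, not how Subversion is implemented:
the class <code>ToyRepository</code> and the exception <code>OutOfDateError</code> are hypothetical names,
the repository holds a single file,
and a commit is accepted only if it was based on the latest revision.
</p>

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on a revision older than the head."""


class ToyRepository:
    """Hypothetical in-memory stand-in for a repository holding one file."""

    def __init__(self, content):
        self.revisions = [content]  # index 0 holds revision 1

    @property
    def head(self):
        """Number of the most recent revision."""
        return len(self.revisions)

    def checkout(self):
        """Return (revision, content) for the latest revision."""
        return self.head, self.revisions[-1]

    def commit(self, base_revision, content):
        """Accept the change only if it was made against the head."""
        if base_revision != self.head:
            raise OutOfDateError(
                f"based on r{base_revision}, but head is r{self.head}"
            )
        self.revisions.append(content)
        return self.head


repo = ToyRepository("original plan")
wolfman_base, _ = repo.checkout()
dracula_base, _ = repo.checkout()

# Nobody is locked out: both monsters edit freely, and the first commit wins.
repo.commit(wolfman_base, "Wolfman's plan")

try:
    repo.commit(dracula_base, "Dracula's plan")
    dracula_rejected = False
except OutOfDateError:
    # Dracula has to update, merge, and then try again.
    dracula_rejected = True
```

<p>
Neither monster ever has to wait for a lock;
the cost is that the second committer has to reconcile his work with the first.
</p>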
219 To see how this actually works,
220 let's assume that the Mummy
221 (Dracula and Wolfman's boss)
222 has already put some notes in a version control repository
223 whose URL is <code>https://universal.software-carpentry.org/monsters</code>.
224 Every repository has an address like this that uniquely identifies the location of the master copy.
228 <h3>There's More Than One Way To Do It</h3>
231 We will drive Subversion from the command line in our examples,
232 but if you prefer using a GUI,
233 there are many for you to choose from.
234 Please see the <a href="ref.html#s:svn:gui">reference</a> for links.
240 and Dracula has just joined the project.
241 In order to get a working copy on his computer,
242 Dracula has to <a href="glossary.html#check-out">check out</a> a copy of the repository.
243 He only has to do this once per project:
244 once he has a working copy,
245 he can update it over and over again to get other people's work.
249 While in his home directory,
250 Dracula types the command:
254 $ <span class="in">svn checkout https://universal.software-carpentry.org/monsters</span>
258 This creates a new directory called <code>monsters</code>
259 and fills it with a copy of the repository's contents
260 (<a href="#f:example_repo">Figure 6</a>).
264 <span class="out">A monsters/jupiter
266 A monsters/mars/mons-olympus.txt
267 A monsters/mars/cydonia.txt
269 A monsters/earth/himalayas.txt
270 A monsters/earth/antarctica.txt
271 A monsters/earth/carlsbad.txt
272 Checked out revision 6.</span>
275 <figure id="f:example_repo">
276 <img src="svn/example_repo.png" alt="Example Repository" />
277 <figcaption>Figure 6: Example Repository</figcaption>
281 Dracula can then go into this directory
282 and use regular shell commands to view the files:
286 $ <span class="in">cd monsters</span>
287 $ <span class="in">ls</span>
288 <span class="out">earth jupiter mars</span>
289 $ <span class="in">ls *</span>
290 <span class="out">earth:
291 antarctica.txt carlsbad.txt himalayas.txt
296 cydonia.txt mons-olympus.txt</span>
300 <h3>Don't Let the Working Copies Overlap</h3>
It's very important that the working copies of different projects do not overlap;
305 we should never try to check out one project inside a working copy of another project.
The reason is that Subversion stores information about
307 the current state of a working copy
308 in special sub-directories called <code>.svn</code>:
312 $ <span class="in">pwd</span>
313 <span class="out">/home/vlad/monsters</span>
314 $ <span class="in">ls -a</span>
315 <span class="out">. .. .svn earth jupiter mars</span>
316 $ <span class="in">ls -F .svn</span>
317 <span class="out">entries prop-base/ props/ text-base/ tmp/</span>
321 If two working copies overlap,
322 the files in the <code>.svn</code> directories for one repository
323 will be clobbered by the other repository's <code>.svn</code> files,
324 and Subversion will become hopelessly confused.
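<p>
One way to guard against overlapping working copies is to look for a <code>.svn</code> directory
in the target directory and each of its ancestors before checking anything out.
The helper below is a minimal sketch of that check;
<code>enclosing_working_copy</code> is a hypothetical name, not part of Subversion.
</p>

```python
from pathlib import Path


def enclosing_working_copy(target):
    """Return the nearest directory at or above `target` that contains a
    .svn directory, or None if no such directory exists."""
    target = Path(target).resolve()
    for directory in [target, *target.parents]:
        if (directory / ".svn").is_dir():
            return directory
    return None
```

<p>
If the function returns a directory rather than <code>None</code>,
the target is already inside a working copy,
so a new project should be checked out somewhere else.
</p>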
329 Dracula can find out more about the history of the project
330 using Subversion's <code>log</code> command:
334 $ <span class="in">svn log</span>
335 <span class="out">------------------------------------------------------------------------
336 r6 | mummy | 2010-07-26 09:21:10 -0400 (Mon, 26 Jul 2010) | 1 line
338 Damn the budget---the Jovian moons would be a _perfect_ place to explore.
339 ------------------------------------------------------------------------
340 r5 | mummy | 2010-07-26 09:19:39 -0400 (Mon, 26 Jul 2010) | 1 line
342 The budget might not even stretch to the Arctic :-(
343 ------------------------------------------------------------------------
344 r4 | mummy | 2010-07-26 09:17:46 -0400 (Mon, 26 Jul 2010) | 1 line
346 Budget cuts may force us to do another dry run in the Arctic.
347 ------------------------------------------------------------------------
348 r3 | mummy | 2010-07-26 09:14:14 -0400 (Mon, 26 Jul 2010) | 1 line
350 Converting document to wiki-formatted text.
351 ------------------------------------------------------------------------
352 r2 | mummy | 2010-07-26 09:11:55 -0400 (Mon, 26 Jul 2010) | 1 line
354 Or put it down near the Face of Cydonia?
355 ------------------------------------------------------------------------
356 r1 | mummy | 2010-07-26 09:08:23 -0400 (Mon, 26 Jul 2010) | 1 line
358 Send the probe to Mons Olympus?
359 ------------------------------------------------------------------------</span>
363 Subversion displays a summary of all the changes made to the project so far.
364 This list includes the
365 <a href="glossary.html#revision-number">revision number</a>,
366 the name of the person who made the change,
367 the date the change was made,
368 and whatever comment the user provided when the change was submitted.
370 the <code>monsters</code> project is currently at revision 6,
371 and all changes so far have been made by the Mummy.
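<p>
Because each entry's header line has a regular layout,
it is easy to pull the fields apart with a script.
The following sketch assumes the header format shown in the log output above;
the function name <code>parse_log_header</code> is made up for this example.
</p>

```python
import re
from datetime import datetime

# Matches a header such as:
# r6 | mummy | 2010-07-26 09:21:10 -0400 (Mon, 26 Jul 2010) | 1 line
HEADER = re.compile(
    r"^r(?P<revision>\d+) \| (?P<author>[^|]+) \| "
    r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\([^)]*\) \| (?P<lines>\d+) lines?$"
)


def parse_log_header(line):
    """Split one log header line into revision, author, date, and line count."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a log header: {line!r}")
    return {
        "revision": int(match["revision"]),
        "author": match["author"].strip(),
        "date": datetime.strptime(match["date"], "%Y-%m-%d %H:%M:%S %z"),
        "lines": int(match["lines"]),
    }


entry = parse_log_header(
    "r6 | mummy | 2010-07-26 09:21:10 -0400 (Mon, 26 Jul 2010) | 1 line"
)
print(entry["revision"], entry["author"], entry["date"].isoformat())
```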
375 Notice how detailed the comments on the updates are.
376 Good comments are as important in version control as they are in coding.
377 Without them, it can be very difficult to figure out who did what, when, and why.
378 We can use comments like "Changed things" and "Fixed it" if we want,
379 or even no comments at all,
380 but we'll only be making more work for our future selves.
384 <h3>Numbering Versions</h3>
387 Another thing to notice is that the revision number applies to the whole repository,
388 not to a particular file.
389 When we talk about "version 61" we mean
390 "the state of all files and directories at that point."
391 Older version control systems like CVS gave each file a new version number when it was updated,
392 which meant that version 38 of one file could correspond in time to version 17 of another
393 (<a href="#f:version_numbering">Figure 7</a>).
394 Experience shows that
395 global version numbers that apply to everything in the repository
396 are easier to manage than
397 per-file version numbers,
398 so that's what Subversion uses.
401 <figure id="f:version_numbering">
402 <img src="svn/version_numbering.png" alt="Version Numbering Schemes" />
403 <figcaption>Figure 7: Version Numbering Schemes</figcaption>
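<p>
The idea of a global revision number can be modeled as a list of snapshots,
one per commit, each describing every file in the repository.
This is only an illustration:
<code>SnapshotRepository</code> is a hypothetical class,
the file contents are placeholders,
and a real system would store changes far more compactly than full copies.
</p>

```python
class SnapshotRepository:
    """Hypothetical model: every commit records the state of all files."""

    def __init__(self):
        self.snapshots = [{}]  # revision 0 is the empty repository

    def commit(self, changes):
        """Apply {path: content} changes and return the new revision number."""
        snapshot = dict(self.snapshots[-1])  # start from the previous state
        snapshot.update(changes)
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def files_at(self, revision):
        """Return every file as it was at the given revision."""
        return self.snapshots[revision]


repo = SnapshotRepository()
r1 = repo.commit({"mars/mons-olympus.txt": "draft 1"})
r2 = repo.commit({"mars/cydonia.txt": "draft 1"})
r3 = repo.commit({"mars/mons-olympus.txt": "draft 2"})

# Revision 2 names the state of *both* files, even though only
# mars/cydonia.txt changed in that commit.
print(repo.files_at(2))
```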
408 A couple of cubicles away,
409 Wolfman also runs <code>svn checkout</code>
410 to get a working copy of the repository.
411 He also gets version 6,
412 so the files on his machine are the same as the files on Dracula's.
While Wolfman is looking through the files,
414 Dracula decides to add some information to the repository about Jupiter's moons.
415 Using his favorite editor,
416 he creates a file in the <code>jupiter</code> directory called <code>moons.txt</code>,
417 and fills it with information about Io, Europa, Ganymede, and Callisto:
420 <pre src="svn/moons_initial.txt">
421 Name Orbital Radius Orbital Period Mass Radius
422 Io 421.6 1.769138 893.2 1821.6
423 Europa 670.9 3.551181 480.0 1560.8
424 Ganymede 1070.4 7.154553 1481.9 2631.2
425 Calisto 1882.7 16.689018 1075.9 2410.3
429 After double-checking his data,
430 he wants to commit the file to the repository so that everyone else on the project can see it.
431 The first step is to add the file to his working copy using <code>svn add</code>:
435 $ <span class="in">svn add jupiter/moons.txt</span>
436 <span class="out">A jupiter/moons.txt</span>
440 Adding a file is not the same as creating it—he has already done that.
442 the <code>svn add</code> command tells Subversion to add the file to
443 the list of things it's supposed to manage.
445 particularly in programming projects,
446 to have backup files or intermediate files in a directory
447 that aren't worth storing in the repository.
448 This is why version control requires us to explicitly tell it which files are to be managed.
452 Once he has told Subversion to add the file,
453 Dracula can go ahead and commit his changes to the repository.
454 He uses the <code>-m</code> flag to provide a one-line message explaining what he's doing;
456 Subversion would open his default editor
457 so that he could type in something longer.
461 $ <span class="in">svn commit -m "Some basic facts about the Galilean moons of Jupiter." jupiter/moons.txt</span>
462 <span class="out">Adding jupiter/moons.txt
463 Transmitting file data .
464 Committed revision 7.</span>
468 When Dracula runs the <code>svn commit</code> command,
469 Subversion establishes a connection to the server,
470 copies over his changes,
471 and updates the revision number from 6 to 7
472 (<a href="#f:updated_repo">Figure 8</a>).
475 <figure id="f:updated_repo">
476 <img src="svn/updated_repo.png" alt="Updated Repository" />
477 <figcaption>Figure 8: Updated Repository</figcaption>
480 <p id="a:define-head">
482 Wolfman uses <code>svn update</code> to update his working copy.
483 It tells him that a new file has been added
484 and brings his working copy up to date with version 7 of the repository,
485 because this is now the most recent revision
486 (also called the <a href="glossary.html#head">head</a>).
487 <code>svn update</code> updates an existing working copy,
488 rather than checking out a new one.
489 While <code>svn checkout</code> is usually only run once per project per machine,
490 <code>svn update</code> may be run many times a day.
494 Looking in the new file <code>jupiter/moons.txt</code>,
495 Wolfman notices that Dracula has misspelled "Callisto"
(it is supposed to have two L's).
497 Wolfman edits that line of the file:
500 <pre src="svn/moons_spelling.txt">
501 Name Orbital Radius Orbital Period Mass Radius
502 Io 421.6 1.769138 893.2 1821.6
503 Europa 670.9 3.551181 480.0 1560.8
504 Ganymede 1070.4 7.154553 1481.9 2631.2
505 <span class="highlight">Callisto 1882.7 16.689018 1075.9 2410.3</span>
509 He also adds a line about Amalthea,
510 which he thinks might be an interesting place to send a probe
511 despite its small size:
514 <pre src="svn/moons_amalthea.txt">
515 Name Orbital Radius Orbital Period Mass Radius
516 <span class="highlight">Amalthea 181.4 0.498179 0.075 125.0</span>
517 Io 421.6 1.769138 893.2 1821.6
518 Europa 670.9 3.551181 480.0 1560.8
519 Ganymede 1070.4 7.154553 1481.9 2631.2
520 Callisto 1882.7 16.689018 1075.9 2410.3
525 he uses the <code>svn status</code> command to check that he hasn't accidentally changed anything else:
529 $ <span class="in">svn status</span>
530 <span class="out">M jupiter/moons.txt</span>
534 and then runs <code>svn commit</code>.
Since he hasn't used the <code>-m</code> flag to provide a message on the command line,
536 Subversion launches his default editor and shows him:
541 --This line, and those below, will be ignored--
547 He changes this to be
551 1. Fixed typo in moon's name: 'Calisto' -> 'Callisto'.
552 2. Added information about Amalthea.
553 --This line, and those below, will be ignored--
559 When he saves this temporary file and exits the editor,
560 Subversion commits his changes:
564 <span class="out">Sending jupiter/moons.txt
565 Transmitting file data .
566 Committed revision 8.</span>
570 Note that since Wolfman didn't specify a particular file to commit,
571 Subversion commits <em>all</em> of his changes.
572 This is why he ran the <code>svn status</code> command first.
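<p>
Conceptually, a status check compares each file in the working copy
with the pristine copy saved at the last update.
The sketch below works on plain dictionaries rather than real files;
the function <code>status</code> and its arguments are assumptions made for this example,
though the one-letter codes are the ones Subversion prints.
</p>

```python
def status(pristine, working, added=()):
    """Compare a working copy with the pristine copies from the last update.

    `pristine` and `working` map paths to file contents; `added` lists
    paths that have been scheduled for addition. Returns (code, path)
    pairs sorted by path.
    """
    report = []
    for path, content in working.items():
        if path in pristine:
            if content != pristine[path]:
                report.append(("M", path))  # modified locally
        elif path in added:
            report.append(("A", path))      # scheduled for addition
        else:
            report.append(("?", path))      # not under version control
    for path in pristine:
        if path not in working:
            report.append(("!", path))      # managed, but missing on disk
    return sorted(report, key=lambda item: item[1])


pristine = {"jupiter/moons.txt": "Calisto"}
working = {"jupiter/moons.txt": "Callisto", "notes.bak": "scratch"}
for code, path in status(pristine, working):
    print(code, path)
```

<p>
Unchanged files produce no output at all,
which is why a short status report is a quick way to confirm that nothing unexpected will be committed.
</p>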
576 <h3>Which Editor?</h3>
578 If you don't have a default editor set up,
579 Subversion will probably open an editor called Vi.
581 type escape-colon-w-q-! to exit
582 and hope it never happens again.
586 <div class="box" id="b:basics:transaction">
587 <h3>Working With Multiple Files</h3>
590 Our example only includes one file,
591 but version control can work on any number of files at once.
593 if Wolfman noticed that a dozen data files had the same incorrect header,
594 he could change it in all 12 files,
595 then commit all those changes at once.
596 This is actually the best way to work:
597 every logical change to the project should be a single commit,
598 and every commit should include everything involved in one logical change.
605 Dracula wants to synchronize with Wolfman's work.
606 Before updating his working copy with <code>svn update</code>,
608 he checks to see if he has made any changes locally
609 by running <code>svn diff</code>.
611 it compares what's in his working copy to what he got the last time he updated.
612 There are no differences,
613 so there's no output:
617 $ <span class="in">svn diff</span>
622 To compare his working copy to the master,
623 Dracula uses <code>svn diff -r HEAD</code>.
624 The <code>-r</code> flag is used to specify a revision,
625 while <code>HEAD</code> means
626 "<a href="#a:define-head">the latest version of the master</a>".
630 $ <span class="in">svn diff -r HEAD</span>
631 <span class="out">--- moons.txt(revision 8)
632 +++ moons.txt(working copy)
634 Name Orbital Radius Orbital Period Mass Radius
635 +Amalthea 181.4 0.498179 0.075 125.0
636 Io 421.6 1.769138 893.2 1821.6
637 Europa 670.9 3.551181 480.0 1560.8
638 Ganymede 1070.4 7.154553 1481.9 2631.2
639 -Calisto 1882.7 16.689018 1075.9 2410.3
640 +Callisto 1882.7 16.689018 1075.9 2410.3
645 After looking over the changes,
646 Dracula goes ahead and does the update.
650 <h3>Reading a Diff</h3>
653 The output of <code>diff</code> is cryptic even by Unix standards.
--- moons.txt(revision 8)
+++ moons.txt(working copy)
signal that '-' will be used to show content from revision 8
664 and '+' to show content from the user's working copy.
665 The next line, with the '@' markers,
666 indicates where lines were inserted or removed.
667 This isn't really intended for human consumption:
668 editors and other tools can use this information
669 to replay a series of edits against a file.
673 The most important parts of what follows are the lines marked with '+' and '-',
674 which show insertions and deletions respectively.
676 we can see that the line for Amalthea was inserted,
677 and that the line for Callisto was changed
678 (which is indicated by an add and a delete right next to one another).
679 Many editors and other tools can display diffs like this in a two-column display,
680 highlighting changes.
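<p>
To get a feel for the format, it can help to generate a diff by hand.
Python's standard <code>difflib</code> module produces output in the same unified style;
the example below uses the two versions of the moons table from this chapter,
with labels for the old and new files chosen just for illustration.
</p>

```python
import difflib

old = [
    "Name Orbital Radius Orbital Period Mass Radius",
    "Io 421.6 1.769138 893.2 1821.6",
    "Europa 670.9 3.551181 480.0 1560.8",
    "Ganymede 1070.4 7.154553 1481.9 2631.2",
    "Calisto 1882.7 16.689018 1075.9 2410.3",
]
new = [
    old[0],
    "Amalthea 181.4 0.498179 0.075 125.0",  # inserted line
    *old[1:4],
    "Callisto 1882.7 16.689018 1075.9 2410.3",  # corrected spelling
]

# Lines starting with '-' come from `old`, lines starting with '+' from `new`.
diff = list(
    difflib.unified_diff(
        old, new, fromfile="moons.txt (old)", tofile="moons.txt (new)", lineterm=""
    )
)
print("\n".join(diff))
```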
686 This is a very common workflow,
687 and is the basic heartbeat of most developers' days.
694 Update our working copy
695 so that we have any changes other people have committed.
703 Commit our changes to the repository
704 so that other people can get them.
710 It's worth noticing here how important Wolfman's comments about his changes were.
711 It's hard to see the difference between "Calisto" with one 'L' and "Callisto" with two,
712 even if the line containing the difference has been highlighted.
713 Without Wolfman's comments,
714 Dracula might have wasted time wondering what the difference was.
719 Wolfman should probably have committed his two changes separately,
720 since there's no logical connection between
721 fixing a typo in Callisto's name
722 and adding information about Amalthea to the same file.
723 Just as a function or program should do one job and one job only,
724 a single commit to version control should have a single logical purpose so that it's easier to find,
726 and if necessary undo later on.
729 <div class="keypoints">
732 <li>Version control is a better way to manage shared files than email or shared folders.</li>
733 <li>The master copy is stored in a repository.</li>
<li>Nobody ever edits the master copy: instead, each person edits a local working copy.</li>
735 <li>People share changes by committing them to the master or updating their local copy from the master.</li>
736 <li>The version control system prevents people from overwriting each other's work by forcing them to merge concurrent changes before committing.</li>
737 <li>It also keeps a complete history of changes made to the master so that old versions can be recovered reliably.</li>
738 <li>Version control systems work best with text files, but can also handle binary files such as images and Word documents.</li>
739 <li>Every repository is identified by a URL.</li>
<li>Working copies of different repositories must not overlap.</li>
<li>Each change to the master copy is identified by a unique revision number.</li>
742 <li>Revisions identify snapshots of the entire repository, not changes to individual files.</li>
743 <li>Each change should be commented to make the history more readable.</li>
744 <li>Commits are transactions: either all changes are successfully committed, or none are.</li>
745 <li>The basic workflow for version control is update-change-commit.</li>
746 <li><code>svn add <em>things</em></code> tells Subversion to start managing particular files or directories.</li>
747 <li><code>svn checkout <em>url</em></code> checks out a working copy of a repository.</li>
748 <li><code>svn commit -m "<em>message</em>" <em>things</em></code> sends changes to the repository.</li>
749 <li><code>svn diff</code> compares the current state of a working copy to the state after the most recent update.</li>
750 <li><code>svn diff -r HEAD</code> compares the current state of a working copy to the state of the master copy.</li>
<li><code>svn log</code> shows the history of a working copy.</li>
752 <li><code>svn status</code> shows the status of a working copy.</li>
753 <li><code>svn update</code> updates a working copy from the repository.</li>
757 <div class="challenges">
775 <section id="s:merge">
777 <h2>Merging Conflicts</h2>
779 <div class="understand" id="u:merge">
782 <li>What a conflict in an update is.</li>
783 <li>How to resolve conflicts when updating.</li>
788 Dracula and Wolfman have both synchronized their working copies of <code>monsters</code>
789 with version 8 of the repository.
790 Dracula now edits his copy to change Amalthea's radius
791 from a single number to a triple to reflect its irregular shape:
794 <pre src="svn/moons_dracula_triple.txt">
795 Name Orbital Radius Orbital Period Mass Radius
796 <span class="highlight">Amalthea 181.4 0.498179 0.075 131 x 73 x 67</span>
797 Io 421.6 1.769138 893.2 1821.6
798 Europa 670.9 3.551181 480.0 1560.8
799 Ganymede 1070.4 7.154553 1481.9 2631.2
800 Callisto 1882.7 16.689018 1075.9 2410.3
804 He then commits his work,
805 creating revision 9 of the repository
806 (<a href="#f:after_dracula_commits">Figure XXX</a>).
809 <figure id="f:after_dracula_commits">
810 <img src="svn/after_dracula_commits.png" alt="After Dracula Commits" />
814 But while he is doing this,
815 Wolfman is editing <em>his</em> copy
816 to add information about two other minor moons,
820 <pre src="svn/moons_wolfman_extras.txt">
821 Name Orbital Radius Orbital Period Mass Radius
Amalthea 181.4 0.498179 0.075 125.0
823 Io 421.6 1.769138 893.2 1821.6
824 Europa 670.9 3.551181 480.0 1560.8
825 Ganymede 1070.4 7.154553 1481.9 2631.2
826 Callisto 1882.7 16.689018 1075.9 2410.3
827 <span class="highlight">Himalia 11460 250.5662 0.095 85.0
828 Elara 11740 259.6528 0.008 40.0</span>
832 When Wolfman tries to commit his changes to the repository,
833 Subversion won't let him:
837 $ <span class="in">svn commit -m "Added data for Himalia, Elara"</span>
838 <span class="out">Sending jupiter/moons.txt
839 svn: Commit failed (details follow):
840 svn: File or directory 'moons.txt' is out of date; try updating
841 svn: resource out of date; try updating</span>
846 Wolfman's changes were based on revision 8,
847 but the repository is now at revision 9,
848 and the file that Wolfman is trying to overwrite
849 is different in the later revision.
851 one of version control's main jobs is to make sure that
852 people don't trample on each other's work.)
853 Wolfman has to update his working copy to get Dracula's changes before he can commit.
855 Dracula edited a line that Wolfman didn't change,
856 so Subversion can merge the differences automatically.
860 This does <em>not</em> mean that Wolfman's changes have been committed to the repository:
861 Subversion only does that when it's ordered to.
862 Wolfman's changes are still in his working copy,
863 and <em>only</em> in his working copy.
864 But since Wolfman's version of the file now includes
the change that Dracula made,
866 Wolfman can go ahead and commit them as usual to create revision 10.
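<p>
The automatic merge described above is a three-way merge:
the tool compares each person's file with their common ancestor
and keeps whichever side changed a given region.
The function below is a simplified sketch of that idea built on Python's <code>difflib</code>;
it is not Subversion's actual algorithm,
and the name <code>merge3</code> and the <code>.theirs</code> marker label are assumptions made for this example.
</p>

```python
from difflib import SequenceMatcher


def _match_map(base, other):
    """Map each base index to the index of the matching line in `other`."""
    mapping = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def merge3(base, mine, theirs):
    """Merge two edited copies of `base`; return (merged_lines, conflict_count)."""
    in_mine = _match_map(base, mine)
    in_theirs = _match_map(base, theirs)
    # Sync points are base lines left untouched by both sides, plus the end.
    sync = [(i, in_mine[i], in_theirs[i])
            for i in range(len(base)) if i in in_mine and i in in_theirs]
    sync.append((len(base), len(mine), len(theirs)))

    merged, conflicts = [], 0
    b = m = t = 0  # cursors into base, mine, theirs
    for bi, mi, ti in sync:
        base_chunk, mine_chunk, theirs_chunk = base[b:bi], mine[m:mi], theirs[t:ti]
        if mine_chunk == base_chunk:
            merged += theirs_chunk          # only they changed this region
        elif theirs_chunk == base_chunk or mine_chunk == theirs_chunk:
            merged += mine_chunk            # only I changed it, or we agree
        else:
            conflicts += 1                  # both changed it differently
            merged += ["<<<<<<< .mine", *mine_chunk,
                       "=======", *theirs_chunk, ">>>>>>> .theirs"]
        if bi < len(base):
            merged.append(base[bi])         # the shared, unchanged line
        b, m, t = bi + 1, mi + 1, ti + 1
    return merged, conflicts


base = [
    "Name Orbital Radius Orbital Period Mass Radius",
    "Amalthea 181.4 0.498179 0.075 125.0",
    "Io 421.6 1.769138 893.2 1821.6",
    "Europa 670.9 3.551181 480.0 1560.8",
    "Ganymede 1070.4 7.154553 1481.9 2631.2",
    "Callisto 1882.7 16.689018 1075.9 2410.3",
]
# Dracula changed Amalthea's radius; Wolfman appended two moons.
dracula = [base[0], "Amalthea 181.4 0.498179 0.075 131 x 73 x 67", *base[2:]]
wolfman = [*base, "Himalia 11460 250.5662 0.095 85.0", "Elara 11740 259.6528 0.008 40.0"]

merged, conflicts = merge3(base, wolfman, dracula)
print("\n".join(merged))
```

<p>
Because the two edits touch different regions of the file,
the merged result contains both of them and no conflict is reported.
</p>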
870 Wolfman's working copy is now in sync with the master,
871 but Dracula's is one behind at revision 9.
873 they independently decide to add measurement units
874 to the columns in <code>moons.txt</code>.
875 Wolfman is quicker off the mark this time;
876 he adds a line to the file:
879 <pre src="svn/moons_wolfman_units.txt">
880 Name Orbital Radius Orbital Period Mass Radius
881 <span class="highlight"> (10**3 km) (days) (10**20 kg) (km)</span>
882 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
883 Io 421.6 1.769138 893.2 1821.6
884 Europa 670.9 3.551181 480.0 1560.8
885 Ganymede 1070.4 7.154553 1481.9 2631.2
886 Callisto 1882.7 16.689018 1075.9 2410.3
887 Himalia 11460 250.5662 0.095 85.0
888 Elara 11740 259.6528 0.008 40.0
892 and commits it to create revision 11.
893 While he is doing this,
895 Dracula inserts a different line at the top of the file:
898 <pre src="svn/moons_dracula_units.txt">
899 Name Orbital Radius Orbital Period Mass Radius
900 <span class="highlight"> * 10^3 km * days * 10^20 kg * km</span>
901 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
902 Io 421.6 1.769138 893.2 1821.6
903 Europa 670.9 3.551181 480.0 1560.8
904 Ganymede 1070.4 7.154553 1481.9 2631.2
905 Callisto 1882.7 16.689018 1075.9 2410.3
906 Himalia 11460 250.5662 0.095 85.0
907 Elara 11740 259.6528 0.008 40.0
912 when Dracula tries to commit,
913 Subversion tells him he can't.
when Dracula updates his working copy,
916 he doesn't just get the line Wolfman added to create revision 11.
917 There is an actual conflict in the file,
918 so Subversion asks Dracula what he wants to do:
921 <pre src="svn/moons_dracula_conflict.txt">
922 $ <span class="in">svn update</span>
923 <span class="out">Conflict discovered in 'jupiter/moons.txt'.
924 Select: (p) postpone, (df) diff-full, (e) edit,
925 (mc) mine-conflict, (tc) theirs-conflict,
926 (s) show all options:</span>
Dracula chooses <code>p</code> for "postpone",
931 which tells Subversion that he'll deal with the problem later.
932 Once the update is finished,
933 he opens <code>moons.txt</code> in his editor and sees:
937 Name Orbital Radius Orbital Period Mass
+&lt;&lt;&lt;&lt;&lt;&lt;&lt; .mine
+ * 10^3 km * days * 10^20 kg
+=======
+ (10**3 km) (days) (10**20 kg)
+&gt;&gt;&gt;&gt;&gt;&gt;&gt; .r11
943 Amalthea 181.4 0.498179 0.074
944 Io 421.6 1.769138 893.2
945 Europa 670.9 3.551181 480.0
946 Ganymede 1070.4 7.154553 1481.9
947 Callisto 1882.7 16.689018 1075.9
952 Subversion has inserted
953 <a href="glossary.html#conflict-marker">conflict markers</a>
954 in <code>moons.txt</code>
955 wherever there is a conflict.
The line <code>&lt;&lt;&lt;&lt;&lt;&lt;&lt; .mine</code> shows the start of the conflict,
and is followed by the lines from the local copy of the file.
The separator <code>=======</code> is then
followed by the lines from the repository's file that are in conflict with that section,
while <code>&gt;&gt;&gt;&gt;&gt;&gt;&gt; .r11</code> marks the end of the conflict.
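<p>
Since the markers follow a fixed pattern,
a short script can separate the two sides of a conflicted file.
The sketch below assumes the three-marker layout just described;
<code>split_conflicts</code> is a hypothetical helper, not a Subversion command,
and the sample lines are abbreviated from the example above.
</p>

```python
def split_conflicts(lines):
    """Return (mine, theirs): the file as each side would have it."""
    mine, theirs = [], []
    state = "both"  # "both", "mine", or "theirs"
    for line in lines:
        if line.startswith("<<<<<<< "):
            state = "mine"              # start of the local side
        elif line.startswith("=======") and state == "mine":
            state = "theirs"            # switch to the repository side
        elif line.startswith(">>>>>>> ") and state == "theirs":
            state = "both"              # end of the conflict
        elif state == "mine":
            mine.append(line)
        elif state == "theirs":
            theirs.append(line)
        else:
            mine.append(line)           # outside a conflict: shared line
            theirs.append(line)
    return mine, theirs


conflicted = [
    "Name Orbital Radius Orbital Period Mass",
    "<<<<<<< .mine",
    " * 10^3 km * days * 10^20 kg",
    "=======",
    " (10**3 km) (days) (10**20 kg)",
    ">>>>>>> .r11",
    "Io 421.6 1.769138 893.2",
]
mine, theirs = split_conflicts(conflicted)
print("\n".join(mine))
print("\n".join(theirs))
```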
964 Before he can commit,
965 Dracula has to edit his copy of the file to get rid of those markers.
969 <pre src="svn/moons_dracula_resolved.txt">
970 Name Orbital Radius Orbital Period Mass Radius
971 <span class="highlight"> (10^3 km) (days) (10^20 kg) (km)</span>
972 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
973 Io 421.6 1.769138 893.2 1821.6
974 Europa 670.9 3.551181 480.0 1560.8
975 Ganymede 1070.4 7.154553 1481.9 2631.2
976 Callisto 1882.7 16.689018 1075.9 2410.3
977 Himalia 11460 250.5662 0.095 85.0
978 Elara 11740 259.6528 0.008 40.0
982 then uses the <code>svn resolved</code> command to tell Subversion that
983 he has fixed the problem.
984 Subversion will now let him commit to create revision 12.
989 <h3>Auxiliary Files</h3>
992 When Dracula did his update and Subversion detected the conflict in <code>moons.txt</code>,
993 it created three temporary files to help Dracula resolve it.
994 The first is called <code>moons.txt.r9</code>;
995 it is the file as it was in Dracula's local copy
996 before he started making changes,
997 i.e., the common ancestor for his work
998 and whatever he is in conflict with.
1002 The second file is <code>moons.txt.r11</code>.
1003 This is the most up-to-date revision from the repository—the
1004 file as it is including Wolfman's changes.
1005 The third temporary file, <code>moons.txt.mine</code>,
1006 is the file as it was in Dracula's working copy before he did the Subversion update.
1010 Subversion creates these auxiliary files primarily
1011 to help people merge conflicts in binary files.
It wouldn't make sense to insert <code>&lt;&lt;&lt;&lt;&lt;&lt;&lt;</code>
and <code>&gt;&gt;&gt;&gt;&gt;&gt;&gt;</code> characters into an image file
1014 (it would almost certainly result in a corrupted image).
1015 The <code>svn resolved</code> command deletes these three extra files
1016 as well as telling Subversion that the conflict has been taken care of.
1022 Some power users prefer to work with interpolated conflict markers directly,
1023 but for the rest of us,
1024 there are several tools for displaying differences and helping to merge them,
1025 including <a href="http://diffuse.sourceforge.net/">Diffuse</a> and <a href="http://winmerge.org/">WinMerge</a>.
1026 If Dracula launches Diffuse,
1027 it displays his file,
1028 the common base that he and Wolfman were working from,
1029 and Wolfman's file in a three-pane view
1030 (<a href="#f:diff_viewer">Figure XXX</a>):
1033 <figure id="f:diff_viewer">
1034 <img src="svn/diff_viewer.png" alt="A Difference Viewer" />
1037 <p class="continue">
1038 Dracula can use the buttons to merge changes from either of the edited versions
1039 into the common ancestor,
1040 or edit the central pane directly.
Once he has finished, he uses <code>svn resolved</code> and <code>svn commit</code>
1044 to create revision 12 of the repository.
1048 In this case, the conflict was small and easy to fix.
1049 However, if two or more people on a team are repeatedly creating conflicts for one another,
1050 it's usually a signal of deeper communication problems:
1051 either they aren't talking as often as they should, or their responsibilities overlap.
Either way, the version control system can help the team find and fix these issues
1054 so that it will be more productive in future.
1059 <h3>Working With Multiple Files</h3>
1062 As mentioned <a href="#a:transaction">earlier</a>,
1063 every logical change to a project should result in a single commit,
1064 and every commit should represent one logical change.
1065 This is especially true when resolving conflicts:
the work done to reconcile one person's changes with another's is often complicated,
1067 so it should be a single entry in the project's history,
1068 with other, later, changes coming after it.
1073 <div class="keypoints" id="k:merge">
1076 <li>Conflicts must be resolved before a commit can be completed.</li>
1077 <li>Subversion puts markers in text files to show regions of conflict.</li>
1078 <li>For each conflicted file, Subversion creates auxiliary files containing the common parent, the master version, and the local version.</li>
<li><code>svn resolved <em>files</em></code> tells Subversion that conflicts have been resolved.</li>
1085 <section id="s:rollback">
1087 <h2>Recovering Old Versions</h2>
1089 <div class="understand" id="u:rollback">
1090 <h3>Understand:</h3>
1092 <li>How to undo changes to a working copy.</li>
1093 <li>How to recover old versions of files.</li>
1094 <li>What a branch is.</li>
1099 Now that we have seen how to merge files and resolve conflicts,
1100 we can look at how to use version control as an "infinite undo".
1101 Suppose that when Wolfman starts work late one night,
1102 his copy of <code>monsters</code> is in sync with the head at revision 12.
1103 He decides to edit the file <code>moons.txt</code>;
1104 unfortunately, he forgot that there was a full moon,
1105 so his changes don't make a lot of sense:
1108 <pre src="svn/poetry.txt">
1109 Just one moon can make me growl
1110 Four would make me want to howl
1115 When he's back in human form the next day,
1116 he wants to undo his changes.
1117 Without version control, his choices would be grim:
1118 he could try to edit them back into their original state by hand
1119 (which for some reason hardly ever seems to work),
1120 or ask his colleagues to send him their copies of the files
1121 (which is almost as embarrassing as chasing the neighbor's cat when in wolf form).
1125 Since he's using Subversion, though,
1126 and hasn't committed his work to the repository,
1127 all he has to do is <a href="glossary.html#revert">revert</a> his local changes.
1128 <code>svn revert</code> simply throws away local changes to files
1129 and puts things back the way they were before those changes were made.
1130 This is a purely local operation:
since Subversion keeps an unmodified copy of every file inside the working copy,
1132 Wolfman doesn't need to be connected to the network to do this.
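<p>
One way to picture why reverting needs no network connection is a toy model
in which the working copy holds an unmodified copy of each file next to the one being edited.
The class below is only an illustration under that assumption;
<code>ToyWorkingCopy</code> and its methods are hypothetical names, not part of Subversion.
</p>

```python
class ToyWorkingCopy:
    """A toy working copy: each file has an unmodified copy and an editable copy."""

    def __init__(self, files):
        self.pristine = dict(files)   # as last received from the repository
        self.working = dict(files)    # what the user edits

    def is_modified(self, name):
        """Return True if the editable file differs from its unmodified copy."""
        return self.working[name] != self.pristine[name]

    def revert(self, name):
        """Throw away local edits by restoring the unmodified copy."""
        self.working[name] = self.pristine[name]

copy = ToyWorkingCopy({"moons.txt": "Name Orbital Radius\n"})
copy.working["moons.txt"] = "Just one moon can make me growl\n"
print(copy.is_modified("moons.txt"))
copy.revert("moons.txt")
print(copy.is_modified("moons.txt"))
```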
1137 Wolfman uses <code>svn diff</code> <em>without</em> the <code>-r HEAD</code> flag
1138 to take a look at the differences between his file
and the unmodified copy he last received from the repository.
1140 Since he doesn't want to keep his changes,
1141 his next command is <code>svn revert moons.txt</code>.
1145 $ <span class="in">cd jupiter</span>
1146 $ <span class="in">svn revert moons.txt</span>
1147 <span class="out">Reverted moons.txt</span>
1151 What if someone <em>has</em> committed their changes,
1152 but still wants to undo them?
For example, suppose Dracula decides that the numbers in <code>moons.txt</code> would look better with commas.
1155 He edits the file to put them in:
1158 <pre src="svn/moons_commas.txt">
1159 Name Orbital Radius Orbital Period Mass Radius
1160 (10^3 km) (days) (10^20 kg) (km)
1161 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
1162 Io 421.6 1.769138 893.2 1<span class="highlight">,</span>821.6
1163 Europa 670.9 3.551181 480.0 1<span class="highlight">,</span>560.8
1164 Ganymede 1<span class="highlight">,</span>070.4 7.154553 1<span class="highlight">,</span>481.9 2<span class="highlight">,</span>631.2
1165 Callisto 1<span class="highlight">,</span>882.7 16.689018 1<span class="highlight">,</span>075.9 2<span class="highlight">,</span>410.3
1166 Himalia 11<span class="highlight">,</span>460 250.5662 0.095 85.0
1167 Elara 11<span class="highlight">,</span>740 259.6528 0.008 40.0
1170 <p class="continue">
1171 then commits his changes to create revision 13.
1172 A little while later,
1173 the Mummy sees the change and orders Dracula to put things back the way they were.
1174 What should Dracula do?
1178 We can draw the sequence of events leading up to revision 13
as shown in <a href="#f:before_undoing">Figure XXX</a>:
1182 <figure id="f:before_undoing">
1183 <img src="svn/before_undoing.png" alt="Before Undoing" />
1186 <p class="continue">
1187 Dracula wants to erase revision 13 from the repository,
1188 but he can't actually do that:
once a change is in the repository, it is there for good.
1191 What he can do instead is merge the old revision with the current revision
1192 to create a new revision
(<a href="#f:merging_history">Figure XXX</a>).
1196 <figure id="f:merging_history">
1197 <img src="svn/merging_history.png" alt="Merging History" />
1200 <p class="continue">
1201 This is exactly like merging changes made by two different people;
1202 the only difference is that the "other person" is his past self.
1207 Dracula must merge revision 12 (the one before his change)
1208 with revision 13 (the current head revision)
1209 using <code>svn merge</code>:
1213 $ <span class="in">svn merge -r HEAD:12 moons.txt</span>
1214 <span class="out">-- Reverse-merging r13 into 'moons.txt'
1218 <p class="continue">
1219 The <code>-r</code> flag specifies the range of revisions to merge:
1220 to undo the changes from revision 12 to revision 13,
1221 he uses either <code>13:12</code> or <code>HEAD:12</code>
1222 (since he is going backward in time from the most recent revision to revision 12).
1223 This is called a <a href="glossary.html#reverse-merge">reverse</a> merge
1224 because he's going backward in time.
1228 After he runs this command,
1229 he must run <code>svn commit</code> to save the changes to the repository.
1230 This creates a new revision, number 14,
1231 rather than erasing revision 13.
This means that the changes he made to create revision 13 are still there
1234 if he can ever convince the Mummy that numbers should have commas.
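<p>
The idea that undoing a change adds a revision rather than removing one can be sketched with a toy history in Python.
This is not how <code>svn merge</code> is implemented:
the function <code>reverse_merge</code> is hypothetical,
it only handles versions with the same number of lines,
and the two-line files are cut down from the table above.
</p>

```python
def reverse_merge(history, current, undo_from, undo_to):
    """Undo the line changes made between revisions undo_to and undo_from.

    history maps revision numbers to lists of lines; current is the working copy.
    A line is put back only if it changed between the two revisions and the
    working copy still holds the newer version of it.
    """
    old, new = history[undo_to], history[undo_from]
    if not (len(old) == len(new) == len(current)):
        raise ValueError("toy model: all versions must have the same number of lines")
    result = []
    for before, after, line in zip(old, new, current):
        if before != after and line == after:
            result.append(before)
        else:
            result.append(line)
    return result

history = {
    12: ["Io 1821.6", "Europa 1560.8"],
    13: ["Io 1,821.6", "Europa 1,560.8"],
}
working = list(history[13])
working = reverse_merge(history, working, undo_from=13, undo_to=12)
history[14] = working   # committing adds revision 14; revision 13 is untouched
print(sorted(history))
```

<p>
After the commit, revisions 12 and 14 have the same content,
and revision 13 is still available if the commas are ever wanted again.
</p>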
1238 Merging can be used to recover older revisions of files,
1239 not just the most recent,
1240 and to recover many files or directories at a time.
1241 The most frequent use, though,
1242 is to manage parallel streams of development in large projects.
1243 This is outside the scope of this chapter,
1244 but the basic idea is simple.
Suppose that Universal Missions has just released a new program for designing secret lairs.
1249 Dracula and Wolfman are supposed to start adding a few features
1250 that had to be left out of the first release because time ran short.
1252 Frankenstein and the Mummy are doing technical support:
1253 their job is to fix any bugs that users find.
1254 All sorts of things could go wrong if both teams tried to work on the same code at the same time.
For example, if Frankenstein fixed a bug and sent a new copy of the program to a user in Greenland,
1257 it would be all too easy for him to accidentally include
1258 the half-completed shark tank control feature that Wolfman was working on.
1262 The usual way to handle this situation is
1263 to create a <a href="glossary.html#branch">branch</a>
1264 in the repository for each major sub-project
1265 (<a href="#f:branch_merge">Figure XXX</a>).
1266 While Wolfman and Dracula work on
1267 the <a href="glossary.html#main-line">main line</a>,
1268 Frankenstein and the Mummy create a branch,
1269 which is just another copy of the repository's files and directories
1270 that is also under version control.
1271 They can work in their branch without disturbing Wolfman and Dracula and vice versa:
1274 <figure id="f:branch_merge">
1275 <img src="svn/branch_merge.png" alt="Branching and Merging" />
1279 Branches in version control repositories are often described as "parallel universes".
1280 Each branch starts off as a clone of the project at some moment in time
1281 (typically each time the software is released,
1282 or whenever work starts on a major new feature).
1283 Changes made to a branch only affect that branch,
1284 just as changes made to the files in one directory don't affect files in other directories.
However, the branch and the main line are both stored in the same repository,
1287 so their revision numbers are always in step.
1291 If someone decides that a bug fix in one branch should also be made in another,
1292 all they have to do is merge the files in question.
1293 This is exactly like merging an old version of a file with the current one,
1294 but instead of going backward in time,
1295 the change is brought sideways from one branch to another.
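<p>
The following toy model in Python may help make two of these points concrete:
a branch is a copy of files inside the same repository,
and every commit, on any branch, advances one shared revision counter.
Everything here is an assumption made for illustration,
including the class <code>ToyRepository</code>,
the directory names <code>trunk</code> and <code>branches/support</code>,
and the file <code>lair.py</code>.
</p>

```python
class ToyRepository:
    """A toy repository in which branches are directories sharing one revision counter."""

    def __init__(self):
        self.revision = 0
        self.files = {}          # path -> content at the head revision

    def commit(self, path, content):
        """Store new content for path and return the new repository-wide revision."""
        self.files[path] = content
        self.revision += 1
        return self.revision

    def branch(self, source, target):
        """Copy every file under source/ to target/ as a single commit."""
        prefix = source + "/"
        copies = {target + "/" + path[len(prefix):]: content
                  for path, content in self.files.items()
                  if path.startswith(prefix)}
        self.files.update(copies)
        self.revision += 1
        return self.revision

repo = ToyRepository()
repo.commit("trunk/lair.py", "v1")                        # revision 1
repo.branch("trunk", "branches/support")                  # revision 2
repo.commit("branches/support/lair.py", "v1 + bug fix")   # revision 3
repo.commit("trunk/lair.py", "v1 + shark tank")           # revision 4
print(repo.revision, repo.files)
```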
1299 Branching helps projects scale up by letting sub-teams work independently,
1300 but too many branches can cause as many problems as they solve.
1301 Karl Fogel's excellent book
1302 <a href="bib.html#fogel-producing-oss"><cite>Producing Open Source Software</cite></a>,
1303 and Laura Wingerd and Christopher Seiwald's paper
1304 "<a href="bib.html#wingerd-seiwald-scm">High-level Best Practices in Software Configuration Management</a>",
1305 talk about branches in much more detail.
1306 Projects usually don't need to do this until they have a dozen or more developers,
1307 or until several versions of their software are in simultaneous use,
1308 but using branches is a key part of switching from software carpentry to software engineering.
1311 <div class="keypoints" id="k:rollback">
1314 <li>Old versions of files can be recovered by merging their old state with their current state.</li>
1315 <li>Recovering an old version of a file does not erase the intervening changes.</li>
1316 <li>Use branches to support parallel independent development.</li>
1317 <li><code>svn merge</code> merges two revisions of a file.</li>
1318 <li><code>svn revert</code> undoes local changes to files.</li>
1324 <section id="s:setup">
1326 <h2>Setting up a Repository</h2>
1328 <div class="understand" id="u:setup">
1329 <h3>Understand:</h3>
1331 <li>How to create a repository.</li>
1336 It is finally time to see how to create a repository.
Normally, we will keep the master copy of our work in a repository
1339 on a server that we can access from other machines on the internet.
1340 That master copy consists of files and directories that no-one ever edits directly.
1341 Instead, a copy of Subversion running on that machine
1342 manages updates for us and watches for conflicts.
1343 Our working copy is a mirror image of the master sitting on our computer.
1344 When our Subversion client needs to communicate with the master,
1345 it exchanges data with the copy of Subversion running on the server.
1348 <figure id="f:repo_four_things">
1349 <img src="svn/repo_four_things.png" alt="What's Needed for a Repository" />
To make this work, we need four things
1354 (<a href="#f:repo_four_things">Figure XXX</a>):
1360 The repository itself.
1361 It's not enough to create an empty directory and start filling it with files:
1362 Subversion needs to create a lot of other structure
1363 in order to keep track of old revisions, who made what changes, and so on.
1367 The full URL of the repository.
1368 This includes the URL of the server
1369 and the path to the repository on that machine.
(The second part is needed because a single server can
host many repositories.)
1376 Permission to read or write the master copy.
1377 Many open source projects give the whole world permission to read from their repository,
1378 but very few allow strangers to write to it:
1379 there are just too many possibilities for abuse.
1380 Somehow, we have to set up a password or something like it
1381 so that users can prove who they are.
1385 A working copy of the repository on our computer.
1386 Once the first three things are in place,
1387 this just means running the <code>checkout</code> command.
1393 To keep things simple,
1394 we will start by creating a repository on the machine that we're working on.
1395 This won't let us share our work with other people,
1396 but it <em>will</em> allow us to save the history of our work as we go along.
1400 The command to create a repository is <code>svnadmin create</code>,
1401 followed by the path to the repository.
1402 If we want to create a repository called <code>lair_repo</code>
1403 directly under our home directory,
1404 we just <code>cd</code> to get home
1405 and run <code>svnadmin create lair_repo</code>.
1406 This command creates a directory called <code>lair_repo</code> to hold our repository,
1407 and fills it with various files that Subversion uses
1408 to keep track of the project's history:
1412 $ <span class="in">cd</span>
1413 $ <span class="in">svnadmin create lair_repo</span>
1414 $ <span class="in">ls -F lair_repo</span>
1415 <span class="out">README.txt conf/ db/ format hooks/ locks/</span>
1418 <p class="continue">
1419 We should <em>never</em> edit anything in this repository directly.
1420 Doing so probably won't shred our sanity and leave us gibbering in mindless horror,
1421 but it will almost certainly make the repository unusable.
1425 To get a working copy of this repository,
1426 we use Subversion's <code>checkout</code> command.
1427 If our home directory is <code>/users/mummy</code>,
1428 then the full path to the repository we just created is <code>/users/mummy/lair_repo</code>,
so we run <code>svn checkout file:///users/mummy/lair_repo lair_working</code>.
In this command, the second argument,
1435 <code>lair_working</code>,
1436 specifies where the working copy is to be put.
1437 The first argument is the URL of our repository,
1438 and it has two parts.
<code>/users/mummy/lair_repo</code> is the path to the repository directory.
1440 <code>file://</code> specifies the <a href="glossary.html#protocol">protocol</a>
1441 that Subversion will use to communicate with the repository—in this case,
1442 it says that the repository is part of the local machine's filesystem.
1443 Notice that the protocol ends in two slashes,
1444 while the absolute path to the repository starts with a slash,
1445 making three in total.
1446 A very common mistake is to type only two, since that's what web URLs normally have.
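<p>
A quick way to see what goes wrong with only two slashes is to pull the URL apart with Python's standard <code>urllib.parse</code> module.
This is only an illustration of how such URLs are structured, not of what Subversion does internally.
</p>

```python
from urllib.parse import urlparse

repo_path = "/users/mummy/lair_repo"
url = "file://" + repo_path          # two slashes from the protocol, one from the path
print(url)

good = urlparse("file:///users/mummy/lair_repo")
bad = urlparse("file://users/mummy/lair_repo")

# With three slashes the host part is empty and the path is intact.
print(repr(good.netloc), good.path)
# With two slashes "users" is read as a host name and the path loses its first component.
print(repr(bad.netloc), bad.path)
```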
1450 When we're doing a checkout,
1451 it is <em>very</em> important that we provide the second argument,
1452 which specifies the name of the directory we want the working copy to be put in.
If we don't, Subversion will try to use the name of the repository,
1455 <code>lair_repo</code>,
1456 as the name of the working copy.
1457 Since we're in the directory that contains the repository,
1458 this means that Subversion will try to overwrite the repository with a working copy.
Again, there isn't much risk of our sanity being torn to shreds,
1461 but this could ruin our repository.
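<p>
The default name described above is simply the last component of the repository URL.
A small Python sketch of that rule follows;
<code>default_checkout_dir</code> is a hypothetical helper written for this example, not a Subversion function.
</p>

```python
import posixpath
from urllib.parse import urlparse

def default_checkout_dir(url):
    """Return the last path component of a repository URL."""
    path = urlparse(url).path.rstrip("/")
    return posixpath.basename(path)

print(default_checkout_dir("file:///users/mummy/lair_repo"))
```

<p>
Running a check like this before a checkout makes the name clash easy to spot:
if the result matches a directory that already exists where we are working,
we should give the working copy an explicit name.
</p>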
1465 To avoid this problem,
1466 most people create a sub-directory in their account called something like <code>repos</code>,
1467 and then create their repositories in that.
For example, we could create our repository in <code>/users/mummy/repos/lair</code>,
1470 then check out a working copy as <code>/users/mummy/lair</code>.
1471 This practice makes both names easier to read.
1475 The obvious next steps are
1476 to put our repository on a server,
1477 rather than on our personal machine,
1478 and to give other people access to the repository we have just created
1479 so that they can work with us.
1480 We'll discuss the first in <a href="web.html#s:svn">a later chapter</a>,
but the second really does require things that we are not going to cover in this course.
1483 If you want to do this, you can:
1489 ask your system administrator to set it up for you;
1493 use an open source hosting service like <a href="http://www.sf.net">SourceForge</a>,
1494 <a href="http://code.google.com">Google Code</a>,
1495 <a href="https://github.com/">GitHub</a>,
1496 or <a href="https://bitbucket.org/">BitBucket</a>; or
1500 spend a few dollars a month on a commercial hosting service like <a href="http://dreamhost.com">DreamHost</a>
1501 that provides web-based GUIs for creating and managing repositories.
1507 If you choose the second or third option,
1508 please check with whoever handles intellectual property at your institution
1509 to make sure that putting your work on a commercially-operated machine
1510 that is probably in some other legal jurisdiction
1511 isn't going to cause trouble.
1512 Many people assume that it's "just OK",
1513 while others act as if not having asked will be an acceptable defence later on.
Unfortunately, neither is true…
1518 <div class="keypoints" id="k:setup">
1521 <li>Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.</li>
1522 <li><code>svnadmin create <em>name</em></code> creates a new repository.</li>
1528 <section id="s:provenance">
1532 <div class="understand" id="u:provenance">
1533 <h3>Understand:</h3>
1535 <li>What data provenance is.</li>
1536 <li>How to embed version numbers and other information in files managed by version control.</li>
1537 <li>How to record version information about a program in its output.</li>
In art, the <a href="glossary.html#provenance">provenance</a> of a work
is the history of who owned it, when, and where.
In science, it's the record of how a particular result came to be:
what raw data was processed by what version of what program to create which intermediate files,
what was used to turn those files into which figures of which papers,
and so on.
1553 One of the central ideas of this course is that
we can automatically track the provenance of scientific data.
For example, suppose we have a text file <code>combustion.dat</code> in a Subversion repository.
1557 Run the following two commands:
1561 $ svn propset svn:keywords Revision combustion.dat
1562 $ svn commit -m "Turning on the 'Revision' keyword" combustion.dat
1566 Now open the file in an editor
1567 and add the following line somewhere near the top:
1575 The '#' sign isn't important:
1576 it's just what <code>.dat</code> files use to show comments.
The <code>$Revision:$</code> string, on the other hand,
means something special to Subversion.
1580 Save the file, and commit the change:
1584 $ svn commit -m "Inserting the 'Revision' keyword" combustion.dat
1588 When we open the file again,
1589 we'll see that Subversion has changed that line to something like:
1596 <p class="continue">
1597 i.e., Subversion has inserted the version number
1598 after the colon and before the closing <code>$</code>.
1602 Here's what just happened.
1603 First, Subversion allows you to set
1604 <a href="glossary.html#property-subversion">properties</a>
for files and directories.
1606 These properties aren't in the files or directories themselves,
1607 but live in Subversion's database.
1608 One of those properties,
1609 <code>svn:keywords</code>,
1610 tells Subversion to look in files that are being changed
1611 for strings of the form <code>$propertyname: …$</code>,
1612 where <code>propertyname</code> is a string like <code>Revision</code> or <code>Author</code>.
1613 (About half a dozen such strings are supported.)
1617 If it sees such a string,
1618 Subversion rewrites it as the commit is taking place to replace <code>…</code>
1619 with the current version number,
1620 the name of the person making the change,
1621 or whatever else the property's name tells it to do.
1622 You only have to add the string to the file once;
1624 Subversion updates it for you every time the file changes.
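<p>
The rewriting step can be imitated in a few lines of Python.
This is a rough sketch of the idea, not Subversion's code:
the function <code>expand_keywords</code>, the revision numbers 143 and 144,
and the exact spacing of the filled-in string are all choices made for this example,
and only two keywords are handled.
</p>

```python
import re

# Matches $Revision:$, $Author:$, or an already filled-in form such as $Revision: 143 $.
KEYWORD = re.compile(r"\$(Revision|Author)(:[^$\n]*)?\$")

def expand_keywords(text, values):
    """Rewrite each known keyword string so that it carries the supplied value."""
    def fill(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)        # leave keywords we have no value for
        return "${}: {} $".format(name, values[name])
    return KEYWORD.sub(fill, text)

line = "# $Revision:$"
once = expand_keywords(line, {"Revision": 143})
twice = expand_keywords(once, {"Revision": 144})
print(once)
print(twice)
```

<p>
Because the pattern also matches a string that has already been filled in,
the same line can be updated again on every later commit.
</p>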
1628 Putting the version number in the file this way can be pretty handy.
1629 If you copy the file to another machine,
1631 it carries its version number with it,
1632 so you can tell which version you have even if it's outside version control.
1633 We'll see some more useful things we can do with this information in
1634 <a href="python.html">the next chapter</a>.
1639 <h3>When <em>Not</em> to Use Version Control</h3>
1642 Despite the rapidly decreasing cost of storage,
1643 it is still possible to run out of disk space.
People can easily go through 2 TB/month if they're not careful.
Version control tools usually store revisions in terms of changed lines,
so with binary data files
they end up essentially storing every revision separately.
This isn't necessarily bad
(it's what we'd be doing anyway),
but it means version control isn't doing what it likes to do,
and the repository can get very large very quickly.
1653 Another concern is that if very old data will no longer be used,
1654 it can be nice to archive or delete old data files.
1655 This is not possible if our data is version controlled:
1656 information can only be added to a repository,
1657 so it can only ever increase in size.
1663 We can use this trick with shell scripts too,
1664 or with almost any other kind of program.
1665 Going back to Nelle Nemo's data processing from the previous chapter,
1667 suppose she writes a shell script that uses <code>gooclean</code>
1668 to tidy up data files.
1669 Her first version looks like this:
1675 gooclean -b 0 100 < $filename > cleaned-$filename
1679 <p class="continue">
1680 i.e., it runs <code>gooclean</code> with bounding values of 0 and 100
1681 for each specified file,
1682 putting the result in a temporary file with a well-defined name.
1683 Assuming that '#' is the comment character for those kinds of data files,
1684 she could instead write:
<span class="highlight">echo '# gooclean $Revision: 901$ -b 0 100' > cleaned-$filename</span>
1691 gooclean -b 0 100 < $filename <span class="highlight">>></span> cleaned-$filename
1696 The first change puts a line in the output file
1697 that describes how that file was created.
1698 The second change is to use <code>>></code> instead of <code>></code>
1699 to redirect <code>gooclean</code>'s output to the file.
1700 <code>>></code> means "append to":
1701 instead of overwriting whatever is in the file,
1702 it adds more content to it.
1703 This ensures that the first line of the file is the provenance record,
1704 with the actual output of <code>gooclean</code> after it.
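<p>
The same pattern carries over to other languages.
As a hedged sketch, here is a Python version of the idea:
write the provenance line first, then append the program's output.
The helper names <code>revision_number</code> and <code>write_with_provenance</code> are hypothetical,
and the list of output lines stands in for whatever <code>gooclean</code> would actually produce.
</p>

```python
import os
import re
import tempfile

REVISION = "$Revision: 901$"   # as it would appear in the script after a commit

def revision_number(keyword):
    """Pull the digits out of a filled-in $Revision: N$ string, or return None."""
    match = re.search(r"\$Revision:\s*(\d+)\s*\$", keyword)
    return int(match.group(1)) if match else None

def write_with_provenance(path, command, output_lines):
    """Write a provenance comment first, then append the program's output."""
    with open(path, "w") as handle:      # like '>': start the file afresh
        handle.write("# {} r{}\n".format(command, revision_number(REVISION)))
    with open(path, "a") as handle:      # like '>>': add to what is already there
        handle.writelines(line + "\n" for line in output_lines)

# Stand-in output; a real script would capture it from the program being run.
target = os.path.join(tempfile.mkdtemp(), "cleaned-sample.dat")
write_with_provenance(target, "gooclean -b 0 100", ["1.0", "2.0"])
with open(target) as handle:
    print(handle.read())
```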
1707 <div class="keypoints" id="k:provenance">
1710 <li><code>$Keyword:$</code> in a file can be filled in with a property value each time the file is committed.</li>
1711 <li idea="paranoia">Put version numbers in programs' output to establish provenance for data.</li>
1712 <li><code>svn propset svn:keywords <em>property</em> <em>files</em></code> tells Subversion to start filling in property values.</li>
1718 <section id="s:summary">
1723 Correlation does not imply causality,
1724 but there is a very strong correlation between
1725 using version control
1726 and doing good computational science.
1727 There's an equally strong correlation
1728 between <em>not</em> using it and wasting effort,
1729 so today (the middle of 2012),
1730 I will not review a paper if the software used in it
1731 is not under version control.
1732 Its authors' work might be interesting,
1733 but without the kind of record-keeping that version control provides,
1734 there's no way to know exactly what they did and when.
1735 Just as importantly,
1736 if someone doesn't know enough about computing to use version control,
1737 the odds are good that they don't know enough
1738 to do the programming right either.
1742 {% endblock content %}