1 {% extends "templates/_base.html" %}
3 {% block file_metadata %}
4 <meta name="title" content="Version Control With Subversion" />
5 {% endblock file_metadata %}
9 <li><a href="#s:basics">Basic Use</a></li>
10 <li><a href="#s:merge">Merging Conflicts</a></li>
11 <li><a href="#s:rollback">Recovering Old Versions</a></li>
12 <li><a href="#s:setup">Setting up a Repository</a></li>
13 <li><a href="#s:provenance">Provenance</a></li>
14 <li><a href="#s:summary">Summing Up</a></li>
18 Wolfman and Dracula have been hired by Universal Missions
19 (a space services spinoff from Euphoric State University)
20 to figure out where the company should send its next planetary lander.
21 They want to be able to work on the plans at the same time,
22 but they have run into problems doing this in the past.
24 each one will spend a lot of time waiting for the other to finish.
if they work on their own copies and email changes back and forth,
27 they know that things will be lost, overwritten, or duplicated.
31 The right solution is to use a
32 <a href="glossary.html#version-control-system">version control system</a>
34 Version control is better than mailing files back and forth because:
40 It's hard (but not impossible) to accidentally overlook or overwrite someone's changes,
41 because the version control system highlights them automatically.
45 It keeps a record of who made what changes when,
46 so that if people have questions later on,
52 Nothing that is committed to version control is ever lost.
53 This means it can be used like the "undo" feature in an editor,
54 and since all old versions of files are saved
55 it's always possible to go back in time to see exactly who wrote what on a particular day,
56 or what version of a program was used to generate a particular set of results.
62 <h3>Nothing's Perfekt</h3>
65 Version control systems do have one important shortcoming.
66 While it is easy for them to find, display, and merge differences in text files,
67 images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text—they
68 use specialized binary data formats.
69 Most version control systems don't know how to deal with these formats,
70 so all they can say is, "These files differ."
71 Reconciling those differences will probably require use of an auxiliary tool,
72 such as an audio editor
73 or Microsoft Word's "Compare and Merge" utility.
78 The rest of this chapter will explore how to use
79 a popular open source version control system called Subversion.
83 <h2>For Instructors</h2>
85 <p class="fixme">explain</p>
88 <h3>Prerequisites</h3>
89 <p class="fixme">prereq</p>
93 <h3>Teaching Notes</h3>
100 <section id="s:basics">
103 <div class="understand">
104 <h3>Learning Objectives</h3>
106 <li>Draw a diagram showing the places version control stores information.</li>
107 <li>Check out a working copy of a repository.</li>
108 <li>View the history of changes to a project.</li>
109 <li>Explain why working copies of different projects should not overlap.</li>
110 <li>Add files to a project.</li>
111 <li>Commit changes made to a working copy to a repository.</li>
112 <li>Update a working copy to get changes from the repository.</li>
113 <li>Compare the current state of a working copy to the last update from the repository, and to the current state of the repository.</li>
114 <li>Explain what "version 123 of <code>xyz.txt</code>" actually means.</li>
119 A version control system keeps the master copy of a file
120 in a <a href="glossary.html#repository">repository</a>
121 located on a <a href="glossary.html#server">server</a>—a computer
122 that is never used directly by people,
123 but only by their programs
124 (<a href="#f:repository">Figure 1</a>).
125 No-one ever edits the master copy directly.
127 Wolfman and Dracula each have a <a href="glossary.html#working-copy">working copy</a>
128 on their own machines.
129 They can each edit their working copies whenever and however they want.
132 <figure id="f:repository">
133 <img src="svn/repository.png" alt="Repositories and Working Copies" />
134 <figcaption>Figure 1: Repositories and Working Copies</figcaption>
138 When Wolfman is ready to share his changes with Dracula,
139 he <a href="glossary.html#commit">commits</a> his work to the repository
140 (<a href="#f:workflow">Figure 2</a>).
141 Dracula can then <a href="glossary.html#update">update</a> his working copy
142 to get those changes when he's ready for them.
144 when Dracula finishes working on something,
he can commit so that Wolfman can update.
148 <figure id="f:workflow">
149 <img src="svn/workflow.png" alt="Sharing Files Through Version Control" />
150 <figcaption>Figure 2: Sharing Files Through Version Control</figcaption>
154 If this is all there was to version control,
155 it would be no better than FTP or Dropbox.
156 But what if Dracula and Wolfman change their working copies at the same time?
157 If Wolfman commits first,
158 his changes are simply copied to the repository
159 (<a href="#f:merge_first_commit">Figure 3</a>):
162 <figure id="f:merge_first_commit">
163 <img src="svn/merge_first_commit.png" alt="Wolfman Commits First" />
164 <figcaption>Figure 3: Wolfman Commits First</figcaption>
If Dracula now tries to commit something that would overwrite Wolfman's changes,
169 the version control system detects the <a href="glossary.html#conflict">conflict</a>,
171 and tells Dracula that there's a problem
172 (<a href="#f:merge_second_commit">Figure 4</a>):
175 <figure id="f:merge_second_commit">
176 <img src="svn/merge_second_commit.png" alt="Dracula Has a Conflict" />
177 <figcaption>Figure 4: Dracula Has a Conflict</figcaption>
181 Dracula must <a href="glossary.html#resolve">resolve</a> that conflict
182 before the version control system will allow him to commit his work.
183 He can accept what Wolfman did,
184 replace it with what he has done,
185 or write something new that combines the two—that's up to him
186 (<a href="#f:merge_resolve">Figure 5</a>).
187 Once he has cleaned things up, he can go ahead and try committing again.
188 If all of the conflicts have been resolved,
the version control system will accept it this time.
192 <figure id="f:merge_resolve">
193 <img src="svn/merge_resolve.png" alt="Resolving the Conflict" />
194 <figcaption>Figure 5: Resolving the Conflict</figcaption>
198 <h3>Forgiveness vs. Permission</h3>
201 Old-fashioned version control systems prevented conflicts from happening
202 by <a href="glossary.html#lock">locking</a> the master copy
203 whenever someone was working on it.
204 This <a href="glossary.html#pessimistic-concurrency">pessimistic</a> strategy
205 guaranteed that a second person (or monster)
206 could never make changes to the same file at the same time,
207 but it also meant that people had to take turns editing files.
211 Most of today's version control systems use
212 an <a href="glossary.html#optimistic-concurrency">optimistic</a> strategy instead:
213 people are always allowed to edit their working copies,
214 and if a conflict occurs,
215 the version control system helps them sort it out after the fact.
220 To see how this actually works,
221 let's assume that the Mummy
222 (Dracula and Wolfman's boss)
223 has already put some notes in a version control repository
224 whose URL is <code>https://universal.software-carpentry.org/explore</code>.
225 Every repository has an address like this that uniquely identifies the location of the master copy.
229 <h3>There's More Than One Way To Do It</h3>
232 We will drive Subversion from the command line in our examples,
233 but if you prefer using a GUI,
234 there are many for you to choose from.
235 Please see the <a href="ref.html#s:svn:gui">reference</a> for links.
241 and Dracula has just joined the project.
242 In order to get a working copy on his computer,
243 Dracula has to <a href="glossary.html#check-out">check out</a> a copy of the repository.
244 He only has to do this once per project:
245 once he has a working copy,
246 he can update it over and over again to get other people's work.
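<p>
To try checkout without access to the lesson's server,
a scratch repository on the local disk behaves the same way.
The sketch below is a minimal illustration that assumes the Subversion command-line tools
(<code>svn</code>, <code>svnadmin</code>, and <code>svnlook</code>) are installed;
the <code>file://</code> URL stands in for <code>https://universal.software-carpentry.org/explore</code>,
and the directory names are hypothetical.
</p>

```shell
# Sketch: a throwaway repository standing in for the lesson's server.
# Assumes svn and svnadmin are on the PATH; all paths are hypothetical.
set -e
WORK=$(mktemp -d)                 # scratch area for this demo
REPO="$WORK/repo"                 # the repository (master copy)
svnadmin create "$REPO"           # new, empty repository at revision 0

# Check out a working copy, just as Dracula does with the https:// URL.
svn checkout "file://$REPO" "$WORK/explore"

# The working copy is an ordinary directory plus Subversion's hidden .svn data.
ls -a "$WORK/explore"
```

<p>
After this, Dracula would <code>cd</code> into the working copy and use <code>svn update</code> there;
he never needs to run <code>svn checkout</code> for this project again on this machine.
</p>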
250 While in his home directory,
251 Dracula types the command:
255 $ <span class="in">svn checkout https://universal.software-carpentry.org/explore</span>
259 This creates a new directory called <code>explore</code>
260 and fills it with a copy of the repository's contents
261 (<a href="#f:example_repo">Figure 6</a>).
265 <span class="out">A explore/jupiter
267 A explore/mars/mons-olympus.txt
268 A explore/mars/cydonia.txt
270 A explore/earth/himalayas.txt
271 A explore/earth/antarctica.txt
272 A explore/earth/carlsbad.txt
273 Checked out revision 6.</span>
276 <figure id="f:example_repo">
277 <img src="svn/example_repo.png" alt="Example Repository" />
278 <figcaption>Figure 6: Example Repository</figcaption>
282 Dracula can then go into this directory
283 and use regular shell commands to view the files:
287 $ <span class="in">cd explore</span>
288 $ <span class="in">ls</span>
289 <span class="out">earth jupiter mars</span>
290 $ <span class="in">ls *</span>
291 <span class="out">earth:
292 antarctica.txt carlsbad.txt himalayas.txt
297 cydonia.txt mons-olympus.txt</span>
301 <h3>Don't Let the Working Copies Overlap</h3>
It's very important that the working copies of different projects do not overlap;
we should never try to check out one project inside a working copy of another project.
The reason is that Subversion stores information about
the current state of a working copy
in special sub-directories called <code>.svn</code>
(Subversion 1.7 and later keep a single <code>.svn</code> directory at the top of each working copy,
while older versions, like the one shown here, put one in every directory):
313 $ <span class="in">pwd</span>
314 <span class="out">/home/dracula/explore</span>
315 $ <span class="in">ls -a</span>
316 <span class="out">. .. .svn earth jupiter mars</span>
317 $ <span class="in">ls -F .svn</span>
318 <span class="out">entries prop-base/ props/ text-base/ tmp/</span>
322 If two working copies overlap,
323 the files in the <code>.svn</code> directories for one repository
324 will be clobbered by the other repository's <code>.svn</code> files,
325 and Subversion will become hopelessly confused.
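<p>
One way to avoid overlapping working copies is to check before running <code>svn checkout</code>.
This sketch defines a hypothetical shell function, <code>inside_working_copy</code>
(it is not part of Subversion),
that looks for a <code>.svn</code> directory in the target directory and in every directory above it.
It assumes a scratch repository and hypothetical paths.
</p>

```shell
# Sketch: refuse to check out into a directory that is already inside a
# working copy. inside_working_copy is a hypothetical helper, not part of svn.
set -e

# Succeeds if $1, or any directory above it, contains a .svn directory.
# Subversion 1.7+ keeps one .svn at the top of a working copy; older
# versions put one in every directory, so walking upward covers both.
inside_working_copy() {
    dir=$(cd "$1" && pwd)
    while :; do
        [ -d "$dir/.svn" ] && return 0
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

WORK=$(mktemp -d)
svnadmin create "$WORK/repo"
svn checkout -q "file://$WORK/repo" "$WORK/explore"
mkdir "$WORK/explore/notes" "$WORK/elsewhere"

for target in "$WORK/explore/notes" "$WORK/elsewhere"; do
    if inside_working_copy "$target"; then
        echo "unsafe: $target is inside a working copy"
    else
        echo "safe:   $target"
    fi
done
```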
330 Dracula can find out more about the history of the project
331 using Subversion's <code>log</code> command:
335 $ <span class="in">svn log</span>
336 <span class="out">------------------------------------------------------------------------
337 r6 | mummy | 2010-07-26 09:21:10 -0400 (Mon, 26 Jul 2010) | 1 line
339 Damn the budget---the Jovian moons would be a _perfect_ place to explore.
340 ------------------------------------------------------------------------
341 r5 | mummy | 2010-07-26 09:19:39 -0400 (Mon, 26 Jul 2010) | 1 line
343 The budget might not even stretch to the Arctic :-(
344 ------------------------------------------------------------------------
345 r4 | mummy | 2010-07-26 09:17:46 -0400 (Mon, 26 Jul 2010) | 1 line
347 Budget cuts may force us to do another dry run in the Arctic.
348 ------------------------------------------------------------------------
349 r3 | mummy | 2010-07-26 09:14:14 -0400 (Mon, 26 Jul 2010) | 1 line
351 Converting document to wiki-formatted text.
352 ------------------------------------------------------------------------
353 r2 | mummy | 2010-07-26 09:11:55 -0400 (Mon, 26 Jul 2010) | 1 line
355 Or put it down near the Face of Cydonia?
356 ------------------------------------------------------------------------
357 r1 | mummy | 2010-07-26 09:08:23 -0400 (Mon, 26 Jul 2010) | 1 line
359 Send the probe to Mons Olympus?
360 ------------------------------------------------------------------------</span>
364 Subversion displays a summary of all the changes made to the project so far.
365 This list includes the
366 <a href="glossary.html#revision-number">revision number</a>,
367 the name of the person who made the change,
368 the date the change was made,
369 and whatever comment the user provided when the change was submitted.
371 the <code>explore</code> project is currently at revision 6,
372 and all changes so far have been made by the Mummy.
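<p>
The sketch below builds a two-revision history in a scratch repository and reads it back.
The <code>-l</code> (limit) and <code>-v</code> (verbose) options of <code>svn log</code> are standard;
the file name and commit messages are made up for illustration.
</p>

```shell
# Sketch: create a tiny history, then read it back with svn log.
# Assumes svn/svnadmin are installed; names and messages are hypothetical.
set -e
WORK=$(mktemp -d)
REPO="$WORK/repo"
svnadmin create "$REPO"
svn checkout -q "file://$REPO" "$WORK/explore"
cd "$WORK/explore"

echo "Send the probe to Mons Olympus?" > mars.txt
svn add -q mars.txt
svn commit -q -m "Send the probe to Mons Olympus?"           # revision 1

echo "Or put it down near the Face of Cydonia?" >> mars.txt
svn commit -q -m "Or put it down near the Face of Cydonia?"  # revision 2

# Committing does not move the working copy's top directory forward,
# so update first; otherwise svn log may stop at an older revision.
svn update -q

svn log            # full history, newest first
svn log -l 1       # only the most recent entry
svn log -v -r 1    # revision 1, also listing the paths it changed
```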
Notice how detailed the comments on the commits are.
377 Good comments are as important in version control as they are in coding.
378 Without them, it can be very difficult to figure out who did what, when, and why.
379 We can use comments like "Changed things" and "Fixed it" if we want,
380 or even no comments at all,
381 but we'll only be making more work for our future selves.
385 <h3>Numbering Versions</h3>
388 Another thing to notice is that the revision number applies to the whole repository,
389 not to a particular file.
390 When we talk about "version 61" we mean
391 "the state of all files and directories at that point."
392 Older version control systems like CVS gave each file a new version number when it was updated,
393 which meant that version 38 of one file could correspond in time to version 17 of another
394 (<a href="#f:version_numbering">Figure 7</a>).
395 Experience shows that
396 global version numbers that apply to everything in the repository
397 are easier to manage than
398 per-file version numbers,
399 so that's what Subversion uses.
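<p>
A small experiment shows that a revision number names a snapshot of the whole repository.
In this sketch (scratch repository, hypothetical file names),
only <code>mars.txt</code> changes in revision 2,
yet <code>svn cat -r 2</code> can still retrieve <code>earth.txt</code> as it was in revision 2.
</p>

```shell
# Sketch: one commit creates one new revision of the whole repository.
set -e
WORK=$(mktemp -d)
REPO="$WORK/repo"
svnadmin create "$REPO"
svn checkout -q "file://$REPO" "$WORK/explore"
cd "$WORK/explore"

echo "Olympus Mons" > mars.txt
echo "Himalayas" > earth.txt
svn add -q mars.txt earth.txt
svn commit -q -m "Revision 1: two files"

echo "Cydonia" >> mars.txt                 # change only mars.txt
svn commit -q -m "Revision 2: touches mars.txt only"

svnlook youngest "$REPO"                   # the repository is now at revision 2
# earth.txt did not change in revision 2, but "revision 2" still names a
# complete snapshot, so this prints earth.txt's contents as of that revision.
svn cat -r 2 "file://$REPO/earth.txt"
```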
402 <figure id="f:version_numbering">
403 <img src="svn/version_numbering.png" alt="Version Numbering Schemes" />
404 <figcaption>Figure 7: Version Numbering Schemes</figcaption>
409 A couple of cubicles away,
410 Wolfman also runs <code>svn checkout</code>
411 to get a working copy of the repository.
412 He also gets version 6,
413 so the files on his machine are the same as the files on Dracula's.
414 While he is looking through the files,
415 Dracula decides to add some information to the repository about Jupiter's moons.
416 Using his favorite editor,
417 he creates a file in the <code>jupiter</code> directory called <code>moons.txt</code>,
418 and fills it with information about Io, Europa, Ganymede, and Callisto:
421 <pre src="svn/moons_initial.txt">
422 Name Orbital Radius Orbital Period Mass Radius
423 Io 421.6 1.769138 893.2 1821.6
424 Europa 670.9 3.551181 480.0 1560.8
425 Ganymede 1070.4 7.154553 1481.9 2631.2
426 Calisto 1882.7 16.689018 1075.9 2410.3
430 After double-checking his data,
431 he wants to commit the file to the repository so that everyone else on the project can see it.
432 The first step is to add the file to his working copy using <code>svn add</code>:
436 $ <span class="in">svn add jupiter/moons.txt</span>
437 <span class="out">A jupiter/moons.txt</span>
441 Adding a file is not the same as creating it—he has already done that.
443 the <code>svn add</code> command tells Subversion to add the file to
444 the list of things it's supposed to manage.
446 particularly in programming projects,
447 to have backup files or intermediate files in a directory
448 that aren't worth storing in the repository.
449 This is why version control requires us to explicitly tell it which files are to be managed.
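<p>
The difference between creating a file and adding it shows up in <code>svn status</code>.
This sketch assumes a scratch repository and hypothetical file names;
it also uses the standard <code>svn:ignore</code> property
to tell Subversion to stop reporting backup files,
which is one common way of handling files that aren't worth storing.
</p>

```shell
# Sketch: Subversion only manages the files it has been told about.
set -e
WORK=$(mktemp -d)
svnadmin create "$WORK/repo"
svn checkout -q "file://$WORK/repo" "$WORK/explore"
cd "$WORK/explore"

mkdir jupiter
printf 'Name Orbital Radius\nIo 421.6\n' > jupiter/moons.txt
cp jupiter/moons.txt jupiter/moons.txt.bak   # e.g. an editor's backup copy

svn status      # '?': jupiter exists, but Subversion isn't managing it

svn add -q --depth=empty jupiter             # the directory, not its contents
svn propset -q svn:ignore '*.bak' jupiter    # stop reporting backup files
svn add -q jupiter/moons.txt                 # the file we do want managed

svn status      # 'A': scheduled for addition at the next commit
```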
453 Once he has told Subversion to add the file,
454 Dracula can go ahead and commit his changes to the repository.
455 He uses the <code>-m</code> flag to provide a one-line message explaining what he's doing;
457 Subversion would open his default editor
458 so that he could type in something longer.
462 $ <span class="in">svn commit -m "Some basic facts about the Galilean moons of Jupiter." jupiter/moons.txt</span>
463 <span class="out">Adding jupiter/moons.txt
464 Transmitting file data .
465 Committed revision 7.</span>
469 When Dracula runs the <code>svn commit</code> command,
470 Subversion establishes a connection to the server,
471 copies over his changes,
472 and updates the revision number from 6 to 7
473 (<a href="#f:updated_repo">Figure 8</a>).
476 <figure id="f:updated_repo">
477 <img src="svn/updated_repo.png" alt="Updated Repository" />
478 <figcaption>Figure 8: Updated Repository</figcaption>
481 <p id="a:define-head">
483 Wolfman uses <code>svn update</code> to update his working copy.
484 It tells him that a new file has been added
485 and brings his working copy up to date with version 7 of the repository,
486 because this is now the most recent revision
487 (also called the <a href="glossary.html#head">head</a>).
488 <code>svn update</code> updates an existing working copy,
489 rather than checking out a new one.
490 While <code>svn checkout</code> is usually only run once per project per machine,
491 <code>svn update</code> may be run many times a day.
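<p>
The checkout-once, update-often pattern can be reproduced on one machine
by giving each monster his own working copy of a scratch repository.
The directory names below are hypothetical stand-ins for Dracula's and Wolfman's computers.
</p>

```shell
# Sketch: two working copies of one repository, one per person.
set -e
WORK=$(mktemp -d)
REPO="$WORK/repo"
svnadmin create "$REPO"

# checkout: once per working copy.
svn checkout -q "file://$REPO" "$WORK/dracula"
svn checkout -q "file://$REPO" "$WORK/wolfman"

# Dracula adds a file and commits it.
cd "$WORK/dracula"
printf 'Io 421.6\n' > moons.txt
svn add -q moons.txt
svn commit -q -m "Some basic facts about the moons of Jupiter."

# Wolfman's copy does not have the file until he runs svn update,
# which reports what arrived and which revision he now has.
cd "$WORK/wolfman"
[ -e moons.txt ] || echo "no moons.txt yet"
svn update
cat moons.txt
```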
495 Looking in the new file <code>jupiter/moons.txt</code>,
496 Wolfman notices that Dracula has misspelled "Callisto"
(it is supposed to have two L's).
498 Wolfman edits that line of the file:
501 <pre src="svn/moons_spelling.txt">
502 Name Orbital Radius Orbital Period Mass Radius
503 Io 421.6 1.769138 893.2 1821.6
504 Europa 670.9 3.551181 480.0 1560.8
505 Ganymede 1070.4 7.154553 1481.9 2631.2
506 <span class="highlight">Callisto 1882.7 16.689018 1075.9 2410.3</span>
510 He also adds a line about Amalthea,
511 which he thinks might be an interesting place to send a probe
512 despite its small size:
515 <pre src="svn/moons_amalthea.txt">
516 Name Orbital Radius Orbital Period Mass Radius
517 <span class="highlight">Amalthea 181.4 0.498179 0.075 125.0</span>
518 Io 421.6 1.769138 893.2 1821.6
519 Europa 670.9 3.551181 480.0 1560.8
520 Ganymede 1070.4 7.154553 1481.9 2631.2
521 Callisto 1882.7 16.689018 1075.9 2410.3
526 he uses the <code>svn status</code> command to check that he hasn't accidentally changed anything else:
530 $ <span class="in">svn status</span>
531 <span class="out">M jupiter/moons.txt</span>
535 and then runs <code>svn commit</code>.
Since he hasn't used the <code>-m</code> flag to provide a message on the command line,
537 Subversion launches his default editor and shows him:
542 --This line, and those below, will be ignored--
548 He changes this to be
552 1. Fixed typo in moon's name: 'Calisto' -> 'Callisto'.
553 2. Added information about Amalthea.
554 --This line, and those below, will be ignored--
560 When he saves this temporary file and exits the editor,
561 Subversion commits his changes:
565 <span class="out">Sending jupiter/moons.txt
566 Transmitting file data .
567 Committed revision 8.</span>
571 Note that since Wolfman didn't specify a particular file to commit,
572 Subversion commits <em>all</em> of his changes.
573 This is why he ran the <code>svn status</code> command first.
577 <h3>Which Editor?</h3>
579 If you don't have a default editor set up,
580 Subversion will probably open an editor called Vi.
press Escape, type <code>:wq!</code>, and press Enter to save your message and exit,
583 and hope it never happens again.
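<p>
To avoid Vi in the first place, tell Subversion which editor to use.
As far as we know, Subversion checks the <code>--editor-cmd</code> option,
then the <code>SVN_EDITOR</code> environment variable,
then the <code>editor-cmd</code> setting in its configuration file,
then <code>VISUAL</code> and <code>EDITOR</code>.
This sketch assumes a Bourne-style shell; <code>nano</code> is only an example.
</p>

```shell
# Use nano for commit messages in this shell session (add the line to
# ~/.profile or similar to make it permanent). "nano" is just an example.
export SVN_EDITOR=nano

# Or choose an editor for a single command only:
#   svn commit --editor-cmd nano
```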
587 <div class="box" id="b:basics:transaction">
588 <h3>Working With Multiple Files</h3>
591 Our example only includes one file,
592 but version control can work on any number of files at once.
594 if Wolfman noticed that a dozen data files had the same incorrect header,
595 he could change it in all 12 files,
596 then commit all those changes at once.
597 This is actually the best way to work:
598 every logical change to the project should be a single commit,
599 and every commit should include everything involved in one logical change.
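<p>
Here is a sketch of the dozen-files scenario, scaled down to three files:
a hypothetical typo ("Nmae") in a shared header is fixed in every file,
and all the fixes go into one commit.
<code>svnlook changed</code> then lists every file touched by that single revision.
</p>

```shell
# Sketch: one logical change spread over several files, committed at once.
set -e
WORK=$(mktemp -d)
REPO="$WORK/repo"
svnadmin create "$REPO"
svn checkout -q "file://$REPO" "$WORK/explore"
cd "$WORK/explore"

# Three data files sharing the same (hypothetical) header typo: revision 1.
for f in io.txt europa.txt ganymede.txt; do
    echo "Nmae Radius" > "$f"
done
svn add -q io.txt europa.txt ganymede.txt
svn commit -q -m "Add data files"

# One logical change: fix the header everywhere, then commit once (revision 2).
for f in io.txt europa.txt ganymede.txt; do
    sed 's/Nmae/Name/' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
svn commit -q -m "Fix misspelled 'Name' header in all data files"

svnlook changed -r 2 "$REPO"    # every file modified in revision 2
```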
606 Dracula wants to synchronize with Wolfman's work.
607 Before updating his working copy with <code>svn update</code>,
609 he checks to see if he has made any changes locally
610 by running <code>svn diff</code>.
612 it compares what's in his working copy to what he got the last time he updated.
613 There are no differences,
614 so there's no output:
618 $ <span class="in">svn diff</span>
623 To compare his working copy to the master,
624 Dracula uses <code>svn diff -r HEAD</code>.
625 The <code>-r</code> flag is used to specify a revision,
626 while <code>HEAD</code> means
627 "<a href="#a:define-head">the latest version of the master</a>".
631 $ <span class="in">svn diff -r HEAD</span>
632 <span class="out">--- moons.txt(revision 8)
633 +++ moons.txt(working copy)
635 Name Orbital Radius Orbital Period Mass Radius
636 +Amalthea 181.4 0.498179 0.075 125.0
637 Io 421.6 1.769138 893.2 1821.6
638 Europa 670.9 3.551181 480.0 1560.8
639 Ganymede 1070.4 7.154553 1481.9 2631.2
640 -Calisto 1882.7 16.689018 1075.9 2410.3
641 +Callisto 1882.7 16.689018 1075.9 2410.3
646 After looking over the changes,
647 Dracula goes ahead and does the update.
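<p>
The two kinds of comparison can be reproduced with two working copies of a scratch repository.
In this sketch (hypothetical paths and contents),
Wolfman commits a new line that Dracula hasn't fetched yet,
so plain <code>svn diff</code> in Dracula's copy is silent
while <code>svn diff -r HEAD</code> reports the difference.
</p>

```shell
# Sketch: svn diff versus svn diff -r HEAD.
set -e
WORK=$(mktemp -d)
REPO="$WORK/repo"
svnadmin create "$REPO"

# Dracula creates moons.txt (revision 1).
svn checkout -q "file://$REPO" "$WORK/dracula"
cd "$WORK/dracula"
printf 'Io\nEuropa\n' > moons.txt
svn add -q moons.txt
svn commit -q -m "Two moons"

# Wolfman checks out, adds a line, and commits (revision 2).
svn checkout -q "file://$REPO" "$WORK/wolfman"
cd "$WORK/wolfman"
printf 'Amalthea\nIo\nEuropa\n' > moons.txt
svn commit -q -m "Add Amalthea"

# Back in Dracula's copy, which has not been updated since revision 1.
cd "$WORK/dracula"
svn diff                      # no local edits since the last update: no output
svn diff -r HEAD moons.txt    # latest repository version vs. the working file
```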
651 <h3>Reading a Diff</h3>
654 The output of <code>diff</code> is cryptic even by Unix standards.
--- moons.txt(revision 8)
+++ moons.txt(working copy)
signal that '-' will be used to show content from revision 8
and '+' to show content from the user's working copy.
The next line, with the '@@' markers,
gives the starting line number and length of this block of changes (called a hunk) in each version of the file.
668 This isn't really intended for human consumption:
669 editors and other tools can use this information
670 to replay a series of edits against a file.
674 The most important parts of what follows are the lines marked with '+' and '-',
675 which show insertions and deletions respectively.
677 we can see that the line for Amalthea was inserted,
678 and that the line for Callisto was changed
679 (which is indicated by an add and a delete right next to one another).
680 Many editors and other tools can display diffs like this in a two-column display,
681 highlighting changes.
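<p>
The hunk header is easier to understand on a small example.
This sketch uses the ordinary Unix <code>diff</code> command with its <code>-u</code> (unified) option,
which produces the same style of output as <code>svn diff</code>;
the file names are hypothetical.
</p>

```shell
# Sketch: a unified diff showing the "@@" hunk header.
set -e
WORK=$(mktemp -d)
cd "$WORK"
printf 'Io\nEuropa\nCalisto\n' > old.txt
printf 'Amalthea\nIo\nEuropa\nCallisto\n' > new.txt

# diff exits with status 1 when the files differ, which is not an error here.
diff -u old.txt new.txt > changes.diff || [ $? -eq 1 ]
cat changes.diff
```

<p>
Here the header should read <code>@@ -1,3 +1,4 @@</code>:
the hunk starts at line 1 and covers 3 lines of <code>old.txt</code>,
and starts at line 1 and covers 4 lines of <code>new.txt</code>.
</p>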
687 <h3>Diffing Other Files</h3>
690 <code>svn diff</code> mimics the behavior of
691 the Unix <code>diff</code> command,
692 which can be used to compare any two files.
693 Given these two files:
698 <th><code>left.txt</code></th>
699 <th><code>right.txt</code></th>
721 <code>diff</code>'s output is:
724 $ <span class="in">diff left.txt right.txt</span>
725 <span class="out">2a3
732 > strontium</span>
737 This is a very common workflow,
738 and is the basic heartbeat of most developers' days.
745 Update our working copy
746 so that we have any changes other people have committed.
754 Commit our changes to the repository
755 so that other people can get them.
761 It's worth noticing here how important Wolfman's comments about his changes were.
762 It's hard to see the difference between "Calisto" with one 'L' and "Callisto" with two,
763 even if the line containing the difference has been highlighted.
764 Without Wolfman's comments,
765 Dracula might have wasted time wondering what the difference was.
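<p>
The update-change-commit cycle can be written down as a short script.
<code>daily_cycle</code> below is a hypothetical helper, not a Subversion command;
it assumes a scratch repository, and the line it appends to <code>notes.txt</code> stands in for real work.
</p>

```shell
# Sketch of the update-change-commit cycle.
set -e
WORK=$(mktemp -d)
REPO="$WORK/repo"
svnadmin create "$REPO"
svn checkout -q "file://$REPO" "$WORK/explore"
cd "$WORK/explore"

# Hypothetical helper: one pass through update, change, commit.
daily_cycle() {
    svn update -q               # 1. get everyone else's committed changes
    echo "$1" >> notes.txt      # 2. make our own change (a stand-in edit)
    svn status                  #    review what is about to be sent
    svn commit -q -m "$1"       # 3. share the change with everyone else
}

touch notes.txt
svn add -q notes.txt
daily_cycle "Monday: consider Europa"
daily_cycle "Tuesday: the budget says Antarctica"
```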
770 Wolfman should probably have committed his two changes separately,
771 since there's no logical connection between
772 fixing a typo in Callisto's name
773 and adding information about Amalthea to the same file.
774 Just as a function or program should do one job and one job only,
775 a single commit to version control should have a single logical purpose so that it's easier to find,
777 and if necessary undo later on.
780 <div class="keypoints">
783 <li>Version control is a better way to manage shared files than email or shared folders.</li>
784 <li>The master copy is stored in a repository.</li>
<li>Nobody ever edits the master copy directly: instead, each person edits a local working copy.</li>
786 <li>People share changes by committing them to the master or updating their local copy from the master.</li>
787 <li>The version control system prevents people from overwriting each other's work by forcing them to merge concurrent changes before committing.</li>
788 <li>It also keeps a complete history of changes made to the master so that old versions can be recovered reliably.</li>
789 <li>Version control systems work best with text files, but can also handle binary files such as images and Word documents.</li>
790 <li>Every repository is identified by a URL.</li>
<li>Working copies of different repositories must not overlap.</li>
<li>Each change to the master copy is identified by a unique revision number.</li>
793 <li>Revisions identify snapshots of the entire repository, not changes to individual files.</li>
794 <li>Each change should be commented to make the history more readable.</li>
795 <li>Commits are transactions: either all changes are successfully committed, or none are.</li>
796 <li>The basic workflow for version control is update-change-commit.</li>
797 <li><code>svn add <em>things</em></code> tells Subversion to start managing particular files or directories.</li>
798 <li><code>svn checkout <em>url</em></code> checks out a working copy of a repository.</li>
799 <li><code>svn commit -m "<em>message</em>" <em>things</em></code> sends changes to the repository.</li>
800 <li><code>svn diff</code> compares the current state of a working copy to the state after the most recent update.</li>
801 <li><code>svn diff -r HEAD</code> compares the current state of a working copy to the state of the master copy.</li>
<li><code>svn log</code> shows the history of a working copy.</li>
803 <li><code>svn status</code> shows the status of a working copy.</li>
804 <li><code>svn update</code> updates a working copy from the repository.</li>
808 <div class="challenges">
814 Using the repository URL, user ID, and password provided by the instructor,
815 perform the following actions:
818 Check out a working copy of the repository.
821 Create a text file called <em>your_id</em>.txt
822 (using your user ID instead of <em>your_id</em>)
823 and write a three-line biography of yourself in it.
826 Add this file to your working copy.
829 Commit your changes to the repository.
832 Update your working copy to get other people's biographies.
835 Examine the change log to see
836 the order in which people added their biographies
843 What does the command <code>svn diff -r 14</code> do?
844 What does it do if there have only been 10 changes to the repository?
849 Unix <code>diff</code> and <code>svn diff</code> compare files line by line.
850 Why doesn't this work for MP3 audio files?
858 <section id="s:merge">
859 <h2>Merging Conflicts</h2>
861 <div class="understand">
862 <h3>Learning Objectives</h3>
864 <li>Explain what causes conflicts to occur and how to tell when one has occurred.</li>
865 <li>Resolve a conflict.</li>
866 <li>Identify the auxiliary files created when a conflict occurs.</li>
871 Dracula and Wolfman have both synchronized their working copies of <code>explore</code>
872 with version 8 of the repository.
873 Dracula now edits his copy to change Amalthea's radius
874 from a single number to a triple to reflect its irregular shape:
877 <pre src="svn/moons_dracula_triple.txt">
878 Name Orbital Radius Orbital Period Mass Radius
879 <span class="highlight">Amalthea 181.4 0.498179 0.075 131 x 73 x 67</span>
880 Io 421.6 1.769138 893.2 1821.6
881 Europa 670.9 3.551181 480.0 1560.8
882 Ganymede 1070.4 7.154553 1481.9 2631.2
883 Callisto 1882.7 16.689018 1075.9 2410.3
887 He then commits his work,
888 creating revision 9 of the repository
889 (<a href="#f:after_dracula_commits">Figure 9</a>).
892 <figure id="f:after_dracula_commits">
893 <img src="svn/after_dracula_commits.png" alt="After Dracula Commits" />
894 <figcaption>Figure 9: After Dracula Commits</figcaption>
898 But while he is doing this,
899 Wolfman is editing <em>his</em> copy
900 to add information about two other minor moons,
904 <pre src="svn/moons_wolfman_extras.txt">
905 Name Orbital Radius Orbital Period Mass Radius
Amalthea 181.4 0.498179 0.075 125.0
907 Io 421.6 1.769138 893.2 1821.6
908 Europa 670.9 3.551181 480.0 1560.8
909 Ganymede 1070.4 7.154553 1481.9 2631.2
910 Callisto 1882.7 16.689018 1075.9 2410.3
911 <span class="highlight">Himalia 11460 250.5662 0.095 85.0
912 Elara 11740 259.6528 0.008 40.0</span>
916 When Wolfman tries to commit his changes to the repository,
917 Subversion won't let him:
921 $ <span class="in">svn commit -m "Added data for Himalia, Elara"</span>
922 <span class="out">Sending jupiter/moons.txt
923 svn: Commit failed (details follow):
924 svn: File or directory 'moons.txt' is out of date; try updating
925 svn: resource out of date; try updating</span>
930 Wolfman's changes were based on revision 8,
931 but the repository is now at revision 9,
932 and the file that Wolfman is trying to overwrite
933 is different in the later revision.
935 one of version control's main jobs is to make sure that
936 people don't trample on each other's work.)
937 Wolfman has to update his working copy to get Dracula's changes before he can commit.
939 Dracula edited a line that Wolfman didn't change,
940 so Subversion can merge the differences automatically.
944 This does <em>not</em> mean that Wolfman's changes have been committed to the repository:
945 Subversion only does that when it's ordered to.
946 Wolfman's changes are still in his working copy,
947 and <em>only</em> in his working copy.
948 But since Wolfman's version of the file now includes
the line that Dracula changed,
950 Wolfman can go ahead and commit them as usual to create revision 10.
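<p>
Both halves of this story, the refused commit and the automatic merge,
can be reproduced in a scratch repository.
The sketch assumes two working copies with hypothetical names;
Dracula changes the first line while Wolfman appends a new last line,
so their edits don't overlap.
</p>

```shell
# Sketch: a commit refused as out of date, then an automatic merge.
set -e
WORK=$(mktemp -d)
REPO="$WORK/repo"
svnadmin create "$REPO"

# Both monsters start from the same five-line file (revision 1).
svn checkout -q "file://$REPO" "$WORK/dracula"
cd "$WORK/dracula"
printf 'Amalthea 125.0\nIo\nEuropa\nGanymede\nCallisto\n' > moons.txt
svn add -q moons.txt
svn commit -q -m "Five moons"
svn checkout -q "file://$REPO" "$WORK/wolfman"

# Dracula changes the first line and commits (revision 2).
printf 'Amalthea 131 x 73 x 67\nIo\nEuropa\nGanymede\nCallisto\n' > moons.txt
svn commit -q -m "Amalthea is irregular"

# Wolfman, still at revision 1, appends a line and tries to commit.
cd "$WORK/wolfman"
echo "Himalia" >> moons.txt
if ! svn commit -q -m "Add Himalia"; then
    echo "commit refused: moons.txt is out of date"
fi

svn update                        # non-overlapping edits merge automatically
svn commit -q -m "Add Himalia"    # accepted this time (revision 3)
cat moons.txt
```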
954 Wolfman's working copy is now in sync with the master,
955 but Dracula's is one behind at revision 9.
957 they independently decide to add measurement units
958 to the columns in <code>moons.txt</code>.
959 Wolfman is quicker off the mark this time;
960 he adds a line to the file:
963 <pre src="svn/moons_wolfman_units.txt">
964 Name Orbital Radius Orbital Period Mass Radius
965 <span class="highlight"> (10**3 km) (days) (10**20 kg) (km)</span>
966 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
967 Io 421.6 1.769138 893.2 1821.6
968 Europa 670.9 3.551181 480.0 1560.8
969 Ganymede 1070.4 7.154553 1481.9 2631.2
970 Callisto 1882.7 16.689018 1075.9 2410.3
971 Himalia 11460 250.5662 0.095 85.0
972 Elara 11740 259.6528 0.008 40.0
976 and commits it to create revision 11.
977 While he is doing this,
Dracula inserts a different line just below the header:
982 <pre src="svn/moons_dracula_units.txt">
983 Name Orbital Radius Orbital Period Mass Radius
984 <span class="highlight"> * 10^3 km * days * 10^20 kg * km</span>
985 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
986 Io 421.6 1.769138 893.2 1821.6
987 Europa 670.9 3.551181 480.0 1560.8
988 Ganymede 1070.4 7.154553 1481.9 2631.2
989 Callisto 1882.7 16.689018 1075.9 2410.3
990 Himalia 11460 250.5662 0.095 85.0
991 Elara 11740 259.6528 0.008 40.0
996 when Dracula tries to commit,
997 Subversion tells him he can't.
when Dracula updates his working copy,
1000 he doesn't just get the line Wolfman added to create revision 11.
1001 There is an actual conflict in the file,
1002 so Subversion asks Dracula what he wants to do:
1005 <pre src="svn/moons_dracula_conflict.txt">
1006 $ <span class="in">svn update</span>
1007 <span class="out">Conflict discovered in 'jupiter/moons.txt'.
1008 Select: (p) postpone, (df) diff-full, (e) edit,
1009 (mc) mine-conflict, (tc) theirs-conflict,
1010 (s) show all options:</span>
Dracula chooses <code>p</code> for "postpone",
1015 which tells Subversion that he'll deal with the problem later.
1016 Once the update is finished,
1017 he opens <code>moons.txt</code> in his editor and sees:
1021 Name Orbital Radius Orbital Period Mass
<<<<<<< .mine
 * 10^3 km * days * 10^20 kg
=======
 (10**3 km) (days) (10**20 kg)
>>>>>>> .r11
Amalthea 181.4 0.498179 0.075
1028 Io 421.6 1.769138 893.2
1029 Europa 670.9 3.551181 480.0
1030 Ganymede 1070.4 7.154553 1481.9
1031 Callisto 1882.7 16.689018 1075.9
1034 <p class="continue">
1036 Subversion has inserted
1037 <a href="glossary.html#conflict-marker">conflict markers</a>
1038 in <code>moons.txt</code>
1039 wherever there is a conflict.
1040 The line <code><<<<<<< .mine</code> shows the start of the conflict,
1041 and is followed by the lines from the local copy of the file.
1042 The separator <code>=======</code> is then
1043 followed by the lines from the repository's file that are in conflict with that section,
1044 while <code>>>>>>>> .r11</code> marks the end of the conflict.
1048 Before he can commit,
1049 Dracula has to edit his copy of the file to get rid of those markers.
1053 <pre src="svn/moons_dracula_resolved.txt">
1054 Name Orbital Radius Orbital Period Mass Radius
1055 <span class="highlight"> (10^3 km) (days) (10^20 kg) (km)</span>
1056 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
1057 Io 421.6 1.769138 893.2 1821.6
1058 Europa 670.9 3.551181 480.0 1560.8
1059 Ganymede 1070.4 7.154553 1481.9 2631.2
1060 Callisto 1882.7 16.689018 1075.9 2410.3
1061 Himalia 11460 250.5662 0.095 85.0
1062 Elara 11740 259.6528 0.008 40.0
1065 <p class="continue">
1066 then uses the <code>svn resolved</code> command to tell Subversion that
1067 he has fixed the problem.
1068 Subversion will now let him commit to create revision 12.
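<p><code>svn resolved</code> only clears Subversion's record of the conflict. It does not look inside the file, so it will accept a file that still contains markers. The sketch below adds a guard against that. It assumes a POSIX shell, and <code>safe_resolved</code> is a hypothetical name.</p>

```shell
# Hypothetical wrapper around 'svn resolved': refuse to mark the file
# named by $1 as resolved while it still contains conflict markers.
safe_resolved() {
    if grep -E -q '^(<<<<<<<|=======|>>>>>>>)' "$1"; then
        echo "safe_resolved: conflict markers remain in $1" >&2
        return 1
    fi
    svn resolved "$1"
}
```

<p>Dracula would run <code>safe_resolved moons.txt</code> and then <code>svn commit</code> as usual. Newer versions of Subversion describe <code>svn resolved</code> as deprecated in favour of <code>svn resolve --accept working</code>, which does the same job.</p>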
1072 <h3>Auxiliary Files</h3>
1075 When Dracula did his update and Subversion detected the conflict in <code>moons.txt</code>,
1076 it created three temporary files to help Dracula resolve it.
1077 The first is called <code>moons.txt.r9</code>;
1078 it is the file as it was in Dracula's local copy
1079 before he started making changes,
1080 i.e., the common ancestor for his work
1081 and whatever he is in conflict with.
1085 The second file is <code>moons.txt.r11</code>.
1086 This is the most up-to-date revision from the repository,
1087 i.e., the file as it stands with Wolfman's changes.
1088 The third temporary file, <code>moons.txt.mine</code>,
1089 is the file as it was in Dracula's working copy before he did the Subversion update.
1093 Subversion creates these auxiliary files primarily
1094 to help people merge conflicts in binary files.
1095 It wouldn't make sense to insert <code><<<<<<<</code>
1096 and <code>>>>>>>></code> characters into an image file
1097 (it would almost certainly result in a corrupted image).
1098 The <code>svn resolved</code> command deletes these three extra files
1099 as well as telling Subversion that the conflict has been taken care of.
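<p>The three auxiliary files are exactly the inputs of a three-way merge, so any three-way merge tool can work from them. For illustration only (Subversion does not need it), GNU <code>diff3</code> can rebuild a marked-up file from them. This assumes <code>diff3</code> is installed, as it usually is with diffutils. The function name <code>remerge</code> is hypothetical.</p>

```shell
# Re-create a conflicted file from Subversion's auxiliary files.
# Assumes GNU diff3. Its argument order is MINE OLDER YOURS; -m prints
# the merged text with conflicts bracketed, and diff3 exits with
# status 1 when there are conflicts.
remerge() {
    mine=$1     # e.g. moons.txt.mine (Dracula's version)
    older=$2    # e.g. moons.txt.r9   (the common ancestor)
    yours=$3    # e.g. moons.txt.r11  (the repository's version)
    diff3 -m "$mine" "$older" "$yours"
}
```

<p>For example, <code>remerge moons.txt.mine moons.txt.r9 moons.txt.r11</code> prints a merged file in which each conflict is bracketed by <code><<<<<<<</code> and <code>>>>>>>></code>. An extra <code>|||||||</code> section shows what the common ancestor contained.</p>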
1105 Some power users prefer to work with interpolated conflict markers directly,
1106 but for the rest of us,
1107 there are several tools for displaying differences and helping to merge them,
1108 including <a href="http://diffuse.sourceforge.net/">Diffuse</a> and <a href="http://winmerge.org/">WinMerge</a>.
1109 If Dracula launches Diffuse,
1110 it displays his file,
1111 the common base that he and Wolfman were working from,
1112 and Wolfman's file in a three-pane view
1113 (<a href="#f:diff_viewer">Figure 10</a>):
1116 <figure id="f:diff_viewer">
1117 <img src="svn/diff_viewer.png" alt="A Difference Viewer" />
1118 <figcaption>Figure 10: A Difference Viewer</figcaption>
1121 <p class="continue">
1122 Dracula can use the buttons to merge changes from either of the edited versions
1123 into the common ancestor,
1124 or edit the central pane directly.
1127 he uses <code>svn resolved</code> and <code>svn commit</code>
1128 to create revision 12 of the repository.
1132 In this case, the conflict was small and easy to fix.
1133 However, if two or more people on a team are repeatedly creating conflicts for one another,
1134 it's usually a signal of deeper communication problems:
1135 either they aren't talking as often as they should, or their responsibilities overlap.
1137 the version control system can help the team find and fix these issues
1138 so that it will be more productive in future.
1142 <h3>Working With Multiple Files</h3>
1145 As mentioned <a href="#a:transaction">earlier</a>,
1146 every logical change to a project should result in a single commit,
1147 and every commit should represent one logical change.
1148 This is especially true when resolving conflicts:
1149 the work done to reconcile one person's changes with another's is often complicated,
1150 so it should be a single entry in the project's history,
1151 with other, later, changes coming after it.
1156 <div class="keypoints">
1159 <li>Conflicts must be resolved before a commit can be completed.</li>
1160 <li>Subversion puts markers in text files to show regions of conflict.</li>
1161 <li>For each conflicted file, Subversion creates auxiliary files containing the common parent, the master version, and the local version.</li>
1162 <li><code>svn resolved <em>files</em></code> tells Subversion that conflicts have been resolved.</li>
1166 <div class="challenges">
1170 If you are working in a group,
1171 partner with someone who has also written a biography for themselves
1172 for the previous section's challenges.
1177 Both partners use <code>svn update</code>
1178 to make sure their working copies are up to date
1179 and that there are no local changes.
1182 The first partner edits her biography and commits the changes.
1185 The second partner edits her copy of the file
1186 (<em>without</em> having updated to get the first partner's changes),
1187 then tries to <code>svn commit</code>.
1190 Once the second partner has resolved the conflict,
1191 she commits her changes.
1194 Repeat these four steps with roles reversed.
1199 If you are working on your own,
1200 you can simulate the steps above
1201 by checking out a second copy of the project into a new directory.
1203 this cannot overlap any existing checked-out copies.)
1204 Edit your biography in one copy and commit those changes,
1205 then switch to the other copy and edit the same file
1207 <a href="#f:challenge_conflict">Figure 11</a> shows
1208 the differences between these two challenges.
1211 <figure id="f:challenge_conflict">
1212 <img src="svn/challenge_conflict.png" alt="Practicing Conflict Resolution" />
1213 <figcaption>Figure 11: Practicing Conflict Resolution</figcaption>
1219 <section id="s:rollback">
1220 <h2>Recovering Old Versions</h2>
1222 <div class="understand">
1223 <h3>Learning Objectives</h3>
1225 <li>Discard changes made to a working copy.</li>
1226 <li>Recover an old version of a file.</li>
1227 <li>Explain what branches are and when they are used.</li>
1232 Now that we have seen how to merge files and resolve conflicts,
1233 we can look at how to use version control as an "infinite undo".
1234 Suppose that when Wolfman starts work late one night,
1235 his copy of <code>explore</code> is in sync with the head at revision 12.
1236 He decides to edit the file <code>moons.txt</code>;
1237 unfortunately, he forgot that there was a full moon,
1238 so his changes don't make a lot of sense:
1241 <pre src="svn/poetry.txt">
1242 Just one moon can make me growl
1243 Four would make me want to howl
1248 When he's back in human form the next day,
1249 he wants to undo his changes.
1250 Without version control, his choices would be grim:
1251 he could try to edit them back into their original state by hand
1252 (which for some reason hardly ever seems to work),
1253 or ask his colleagues to send him their copies of the files
1254 (which is almost as embarrassing as chasing the neighbor's cat when in wolf form).
1258 Since he's using Subversion, though,
1259 and hasn't committed his work to the repository,
1260 all he has to do is <a href="glossary.html#revert">revert</a> his local changes.
1261 <code>svn revert</code> simply throws away local changes to files
1262 and puts things back the way they were before those changes were made.
1263 This is a purely local operation:
1264 since every working copy holds a pristine copy of each file as of its last update,
1265 Wolfman doesn't need to be connected to the network to do this.
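<p>The check-then-discard sequence can be sketched as a small shell function. This sketch assumes it is run inside a working copy. <code>discard_changes</code> is a hypothetical name; <code>svn status</code>, <code>svn diff</code>, and <code>svn revert</code> are standard Subversion commands.</p>

```shell
# Hypothetical helper: show what is about to be thrown away, then
# restore the given files from the pristine copies kept in the working
# copy. None of these three commands needs to contact the repository.
discard_changes() {
    svn status "$@"   # an 'M' in the first column marks a modified file
    svn diff "$@"     # with no -r, compares against the pristine copy
    svn revert "$@"   # replaces each file with its pristine copy
}
```

<p>Wolfman would run <code>discard_changes moons.txt</code>. To throw away every local change below the current directory, <code>svn revert</code> also accepts <code>-R</code>, as in <code>svn revert -R .</code>. Check with <code>svn status</code> first, because uncommitted edits discarded this way cannot be recovered.</p>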
1270 Wolfman uses <code>svn diff</code> <em>without</em> the <code>-r HEAD</code> flag
1271 to compare his file with the pristine copy of revision 12
1272 that Subversion keeps in his working copy, rather than with the latest version in the repository.
1273 Since he doesn't want to keep his changes,
1274 his next command is <code>svn revert moons.txt</code>.
1278 $ <span class="in">cd jupiter</span>
1279 $ <span class="in">svn revert moons.txt</span>
1280 <span class="out">Reverted moons.txt</span>
1284 What if someone <em>has</em> committed their changes,
1285 but still wants to undo them?
1287 suppose Dracula decides that the numbers in <code>moons.txt</code> would look better with commas.
1288 He edits the file to put them in:
1291 <pre src="svn/moons_commas.txt">
1292 Name Orbital Radius Orbital Period Mass Radius
1293 (10^3 km) (days) (10^20 kg) (km)
1294 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
1295 Io 421.6 1.769138 893.2 1<span class="highlight">,</span>821.6
1296 Europa 670.9 3.551181 480.0 1<span class="highlight">,</span>560.8
1297 Ganymede 1<span class="highlight">,</span>070.4 7.154553 1<span class="highlight">,</span>481.9 2<span class="highlight">,</span>631.2
1298 Callisto 1<span class="highlight">,</span>882.7 16.689018 1<span class="highlight">,</span>075.9 2<span class="highlight">,</span>410.3
1299 Himalia 11<span class="highlight">,</span>460 250.5662 0.095 85.0
1300 Elara 11<span class="highlight">,</span>740 259.6528 0.008 40.0
1303 <p class="continue">
1304 then commits his changes to create revision 13.
1305 A little while later,
1306 the Mummy sees the change and orders Dracula to put things back the way they were.
1307 What should Dracula do?
1311 We can draw the sequence of events leading up to revision 13
1312 as shown in <a href="#f:before_undoing">Figure 12</a>:
1315 <figure id="f:before_undoing">
1316 <img src="svn/before_undoing.png" alt="Before Undoing" />
1317 <figcaption>Figure 12: Before Undoing</figcaption>
1320 <p class="continue">
1321 Dracula wants to erase revision 13 from the repository,
1322 but he can't actually do that:
1323 once a change is in the repository,
1325 What he can do instead is merge the old revision with the current revision
1326 to create a new revision
1327 (<a href="#f:merging_history">Figure 13</a>).
1330 <figure id="f:merging_history">
1331 <img src="svn/merging_history.png" alt="Merging History" />
1332 <figcaption>Figure 13: Merging History</figcaption>
1335 <p class="continue">
1336 This is exactly like merging changes made by two different people;
1337 the only difference is that the "other person" is his past self.
1342 Dracula must merge revision 12 (the one before his change)
1343 with revision 13 (the current head revision)
1344 using <code>svn merge</code>:
1348 $ <span class="in">svn merge -r HEAD:12 moons.txt</span>
1349 <span class="out">--- Reverse-merging r13 into 'moons.txt':
1353 <p class="continue">
1354 The <code>-r</code> flag specifies the range of revisions to merge:
1355 to undo the changes made between revision 12 and revision 13,
1356 he uses either <code>13:12</code> or <code>HEAD:12</code>.
1357 Because this range runs from the most recent revision back to revision 12,
1358 it is called a <a href="glossary.html#reverse-merge">reverse</a> merge.
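<p>Subversion also has a shorthand for undoing a single revision: <code>-c -13</code> means the same as <code>-r 13:12</code>. The sketch below wraps this up. It assumes it is run from the top of a working copy, and <code>undo_revision</code> is a hypothetical name. It updates first because recent Subversion releases refuse to merge into a working copy whose files are at different revisions, which is the normal state just after a commit.</p>

```shell
# Hypothetical helper: undo the change committed in revision $1 to the
# path $2 by reverse-merging it. Run it from the top of a working copy,
# then use 'svn commit' to record the undo as a new revision.
undo_revision() {
    rev=$1
    path=$2
    # Bring the whole working copy to a single revision first, since
    # merging into a mixed-revision working copy is refused.
    svn update -q
    # '-c -N' means the same as '-r N:N-1': apply change N backwards.
    svn merge -q -c "-$rev" "$path" "$path"
}
```

<p>From the top of his working copy, Dracula's undo would be <code>undo_revision 13 jupiter/moons.txt</code>, followed by <code>svn commit</code> to create revision 14.</p>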
1363 After he runs this command,
1364 he must run <code>svn commit</code> to save the changes to the repository.
1365 This creates a new revision, number 14,
1366 rather than erasing revision 13.
1368 the changes he made to create revision 13 are still there
1369 if he can ever convince the Mummy that numbers should have commas.
1373 <h3>Another Way to Do It</h3>
1376 Another way to recover a particular version of a particular file
1377 is to use the <code>svn copy</code> command.
1378 If the URL of our repository is
1379 <code>https://universal.software-carpentry.org/explore</code>,
1384 $ <span class="in">svn copy https://universal.software-carpentry.org/explore/mission.txt@120 ./mission.txt</span>
1387 <p class="continue">
1388 copies the file <code>mission.txt</code> as it was in revision 120
1389 into our working directory
1390 (overwriting whatever <code>mission.txt</code> file we currently have,
1393 using <code>svn copy</code> brings along the file's history as well,
1394 so that future <code>svn log</code> operations will show
1395 how <code>mission.txt</code> was resurrected.
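<p>This is the usual way to bring back a file that was deleted in a later revision. The sketch below assumes the file's repository URL is known, and <code>resurrect</code> is a hypothetical name. The <code>URL@REV</code> form is what Subversion calls a peg revision: whatever lived at that URL in that revision.</p>

```shell
# Hypothetical helper: copy the file at repository URL $1, as it was in
# revision $2, into the working copy as $3. Because this is 'svn copy'
# rather than a plain download, the restored file keeps its history.
resurrect() {
    url=$1
    rev=$2
    dest=$3
    # URL@REV is a peg revision: whatever lived at URL in revision REV.
    svn copy "$url@$rev" "$dest"
}
```

<p>For example, <code>resurrect https://universal.software-carpentry.org/explore/mission.txt 120 mission.txt</code>, followed by <code>svn commit</code>, records the restoration as a new revision.</p>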
1400 Merging can be used to recover older revisions of files,
1401 not just the most recent,
1402 and to recover many files or directories at a time.
1403 The most frequent use, though,
1404 is to manage parallel streams of development in large projects.
1405 This is outside the scope of this chapter,
1406 but the basic idea is simple.
1410 Suppose that Universal Missions has just released a new program
1411 for designing interplanetary voyages.
1412 Dracula and Wolfman are supposed to add some features
1413 that were left out of the first release because time ran short.
1415 Frankenstein and the Mummy are doing technical support:
1416 their job is to fix any bugs that users find.
1420 All sorts of things could go wrong
1421 if both teams tried to work on the same code at the same time.
1423 Dracula and Wolfman might want to make large changes
1424 to the structure of the code
1425 in order to make it easier to add new features,
1426 while Frankenstein and the Mummy want to make as few changes as possible
1427 so as not to introduce new bugs while fixing old ones.
1431 The usual way to handle this situation is
1432 to create a <a href="glossary.html#branch">branch</a>
1433 in the repository for each major sub-project
1434 (<a href="#f:branch_merge">Figure 14</a>).
1435 While Wolfman and Dracula work on
1436 the <a href="glossary.html#main-line">main line</a>,
1437 Frankenstein and the Mummy create a branch,
1438 which is just another copy of the repository's files and directories
1439 that is also under version control.
1440 They can work in their branch without disturbing Wolfman and Dracula and vice versa:
1443 <figure id="f:branch_merge">
1444 <img src="svn/branch_merge.png" alt="Branching and Merging" />
1445 <figcaption>Figure 14: Branching and Merging</figcaption>
1449 Branches in version control repositories are often described as "parallel universes".
1450 Each branch starts off as a clone of the project at some moment in time
1451 (typically each time the software is released,
1452 or whenever work starts on a major new feature).
1453 Changes made to a branch only affect that branch,
1454 just as changes made to the files in one directory don't affect files in other directories.
1456 the branch and the main line are both stored in the same repository,
1457 so their revision numbers are always in step.
1461 If someone decides that a bug fix in one branch should also be made in another,
1462 all they have to do is merge the files in question.
1463 This is exactly like merging an old version of a file with the current one,
1464 but instead of going backward in time,
1465 the change is brought sideways from one branch to another.
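<p>In Subversion, a branch is created with <code>svn copy</code> between two URLs, which makes a cheap copy inside the repository. A single fix is carried across with <code>svn merge -c</code>. The sketch below assumes the common (but not compulsory) convention of keeping the main line in a directory called <code>trunk</code> and branches under <code>branches</code>. The function names <code>make_branch</code> and <code>port_change</code> are hypothetical.</p>

```shell
# Hypothetical helpers, assuming the conventional trunk/ and branches/
# layout at the top of the repository.

# Create a branch named $2 as a server-side copy of the main line of
# the repository whose URL is $1. This commits a new revision directly.
make_branch() {
    svn copy "$1/trunk" "$1/branches/$2" -m "Create branch $2"
}

# From the top of a working copy, bring over the change made in
# revision $1 of the branch whose URL is $2.
port_change() {
    svn update -q                 # merges need a single-revision working copy
    svn merge -c "$1" "$2" .
}
```

<p>If the repository used this layout, Frankenstein and the Mummy would run <code>make_branch</code> once. Later, if a fix they committed as revision N should also go into the main line, Wolfman would run <code>port_change N</code> with the branch's URL and then <code>svn commit</code>.</p>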
1469 Branching helps projects scale up by letting sub-teams work independently,
1470 but too many branches can cause as many problems as they solve.
1471 Karl Fogel's excellent book
1472 <a href="bib.html#fogel-producing-oss"><cite>Producing Open Source Software</cite></a>,
1473 and Laura Wingerd and Christopher Seiwald's paper
1474 "<a href="bib.html#wingerd-seiwald-scm">High-level Best Practices in Software Configuration Management</a>",
1475 talk about branches in much more detail.
1476 Projects usually don't need to do this until they have a dozen or more developers,
1477 or until several versions of their software are in simultaneous use,
1478 but using branches is a key part of switching from software carpentry to software engineering.
1481 <div class="keypoints">
1484 <li>Old versions of files can be recovered by merging their old state with their current state.</li>
1485 <li>Recovering an old version of a file does not erase the intervening changes.</li>
1486 <li>Use branches to support parallel independent development.</li>
1487 <li><code>svn revert</code> undoes local changes to files.</li>
1488 <li><code>svn merge</code> applies the changes between two revisions to a working copy.</li>
1492 <div class="challenges">
1497 Explain what the command:
1499 svn diff -r 240:261 fish.dat
1501 does, and when you might want to run it.
1505 Suppose that a file called <code>mission.txt</code>
1506 existed in revision 90 of a repository,
1507 but had been deleted in revision 91.
1508 What two commands could we use to recover it?
1516 <section id="s:setup">
1517 <h2>Setting up a Repository</h2>
1519 <div class="understand">
1520 <h3>Learning Objectives</h3>
1522 <li>Create a repository.</li>
1527 It is finally time to see how to create a repository.
1529 we will keep the master copy of our work in a repository
1530 on a server that we can access from other machines on the internet.
1531 That master copy consists of files and directories that no-one ever edits directly.
1532 Instead, a copy of Subversion running on that machine
1533 manages updates for us and watches for conflicts.
1534 Our working copy is a mirror image of the master sitting on our computer.
1535 When our Subversion client needs to communicate with the master,
1536 it exchanges data with the copy of Subversion running on the server.
1539 <figure id="f:repo_four_things">
1540 <img src="svn/repo_four_things.png" alt="What's Needed for a Repository" />
1541 <figcaption>Figure 15: What's Needed for a Repository</figcaption>
1545 To make this work, we need four things
1546 (<a href="#f:repo_four_things">Figure 15</a>):
1552 The repository itself.
1553 It's not enough to create an empty directory and start filling it with files:
1554 Subversion needs to create a lot of other structure
1555 in order to keep track of old revisions, who made what changes, and so on.
1559 The full URL of the repository.
1560 This includes the URL of the server
1561 and the path to the repository on that machine.
1562 (The second part is needed because a single server can,
1564 host many repositories.)
1568 Permission to read or write the master copy.
1569 Many open source projects give the whole world permission to read from their repository,
1570 but very few allow strangers to write to it:
1571 there are just too many possibilities for abuse.
1572 Somehow, we have to set up a password or something like it
1573 so that users can prove who they are.
1577 A working copy of the repository on our computer.
1578 Once the first three things are in place,
1579 this just means running the <code>checkout</code> command.
1585 To keep things simple,
1586 we will start by creating a repository on the machine that we're working on.
1587 This won't let us share our work with other people,
1588 but it <em>will</em> allow us to save the history of our work as we go along.
1592 The command to create a repository is <code>svnadmin create</code>,
1593 followed by the path to the repository.
1594 If we want to create a repository called <code>lair_repo</code>
1595 directly under our home directory,
1596 we just <code>cd</code> to get home
1597 and run <code>svnadmin create lair_repo</code>.
1598 This command creates a directory called <code>lair_repo</code> to hold our repository,
1599 and fills it with various files that Subversion uses
1600 to keep track of the project's history:
1604 $ <span class="in">cd</span>
1605 $ <span class="in">svnadmin create lair_repo</span>
1606 $ <span class="in">ls -F lair_repo</span>
1607 <span class="out">README.txt conf/ db/ format hooks/ locks/</span>
1610 <p class="continue">
1611 We should <em>never</em> edit any of this directly,
1612 since it will almost certainly make the repository unusable.
1614 we should use <code>svn checkout</code>
1615 to get a working copy of this repository.
1616 If our home directory is <code>/users/mummy</code>,
1617 then the full path to the repository we just created is <code>/users/mummy/lair_repo</code>,
1618 so we run <code>svn checkout file:///users/mummy/lair_repo lair_working</code>.
1623 the second argument,
1624 <code>lair_working</code>,
1625 specifies where the working copy is to be put.
1626 The first argument is the URL of our repository,
1627 and it has two parts.
1628 <code>/users/mummy/lair_repo</code> is the path to the repository directory.
1629 <code>file://</code> specifies the <a href="glossary.html#protocol">protocol</a>
1630 that Subversion will use to communicate with the repository—in this case,
1631 it says that the repository is part of the local machine's filesystem.
1632 (Notice that the protocol ends in two slashes,
1633 while the absolute path to the repository starts with a slash,
1634 making three in total.
1635 A very common mistake is to type only two, since that's what web URLs normally have.)
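<p>One way to avoid miscounting slashes is to let the shell build the URL. This minimal sketch assumes the repository directory already exists. <code>file_url</code> is a hypothetical name.</p>

```shell
# Hypothetical helper: print the file:// URL for an existing directory.
# 'pwd' prints an absolute path, which begins with '/', so gluing it to
# 'file://' gives the three slashes Subversion expects. This simple
# version does not percent-encode spaces or other special characters.
file_url() {
    printf 'file://%s\n' "$(cd "$1" > /dev/null && pwd)"
}
```

<p>For example, <code>svn checkout "$(file_url ~/lair_repo)" lair_working</code> performs the same checkout as above without anyone having to count slashes.</p>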
1639 When we're doing a checkout,
1640 it is <em>very</em> important that we provide the second argument,
1641 which specifies the name of the directory we want the working copy to be put in.
1643 Subversion will try to use the name of the repository,
1644 <code>lair_repo</code>,
1645 as the name of the working copy.
1646 Since we're in the directory that contains the repository,
1647 this means that Subversion will try to overwrite the repository with a working copy.
1649 there isn't much risk of our sanity being torn to shreds,
1650 but this could ruin our repository.
1654 To avoid this problem,
1655 most people create a sub-directory in their account called something like <code>repos</code>,
1656 and then create their repositories in that.
1658 we could create our repository in <code>/users/mummy/repos/lair</code>,
1659 then check out a working copy as <code>/users/mummy/lair</code>.
1660 This practice makes both names easier to read.
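<p>The layout just described can be scripted. This sketch assumes an absolute base directory such as <code>/users/mummy</code>. <code>new_local_repo</code> is a hypothetical name; <code>svnadmin create</code> and <code>svn checkout</code> are the commands used above.</p>

```shell
# Hypothetical helper: create the repository in BASE/repos/NAME and a
# working copy in BASE/NAME. BASE ($1) must be an absolute path so that
# the file:// URL below ends up with three slashes.
new_local_repo() {
    base=$1
    name=$2
    mkdir -p "$base/repos"
    svnadmin create "$base/repos/$name"
    # Always give checkout a second argument, so the working copy can
    # never land on top of the repository itself.
    svn checkout "file://$base/repos/$name" "$base/$name"
}
```

<p>For the Mummy, <code>new_local_repo /users/mummy lair</code> would create <code>/users/mummy/repos/lair</code> and check it out as <code>/users/mummy/lair</code>.</p>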
1666 The obvious next steps are
1667 to put our repository on a server,
1668 rather than on our personal machine,
1669 and to give other people access to the repository we have just created
1670 so that they can work with us.
1671 We should <em>always</em> keep repositories on a different machine than
1672 the one we're using for day-to-day work
1673 so that if the latter is lost or damaged,
1674 we still have our master copy.
1678 The second step—sharing the repository with others—requires
1679 skills that we are deliberately not going to cover.
1680 As we discuss in the lessons on <a href="web.html">web programming</a>,
1681 as soon as you make something available over the internet,
1682 you open up a channel for attack.
1686 If you want to do this, you can:
1692 ask your system administrator to set it up for you;
1696 use an open source hosting service like <a href="http://www.sf.net">SourceForge</a>,
1697 <a href="http://code.google.com">Google Code</a>,
1698 <a href="https://github.com/">GitHub</a>,
1699 or <a href="https://bitbucket.org/">BitBucket</a>; or
1703 spend a few dollars a month on a commercial hosting service like <a href="http://dreamhost.com">DreamHost</a>
1704 that provides web-based GUIs for creating and managing repositories.
1710 If you choose the second or third option,
1711 please check with whoever handles intellectual property at your institution
1712 to make sure that putting your work on a commercially-operated machine
1713 that is probably in some other legal jurisdiction
1714 isn't going to cause trouble.
1715 Many people assume that it's "just OK",
1716 while others act as if not having asked will be an acceptable defence later on.
1718 neither is true…
1721 <div class="keypoints">
1724 <li>Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.</li>
1725 <li><code>svnadmin create <em>name</em></code> creates a new repository.</li>
1729 <div class="challenges">
1736 <section id="s:provenance">
1739 <div class="understand">
1740 <h3>Learning Objectives</h3>
1742 <li>Explain what data provenance is.</li>
1743 <li>Embed version numbers and other information in files managed by version control.</li>
1744 <li>Record version information about a program in its output.</li>
1750 the <a href="glossary.html#provenance">provenance</a> of a work
1751 is the history of who owned it, when, and where.
1753 it's the record of how a particular result came to be:
1754 what raw data was processed by what version of what program to create which intermediate files,
1755 what was used to turn those files into which figures of which papers,
1760 One of the central ideas of this course is that
1761 we can automatically track the provenance of scientific data.
1763 suppose we have a text file <code>combustion.dat</code> in a Subversion repository.
1764 Run the following two commands:
1768 $ svn propset svn:keywords Revision combustion.dat
1769 $ svn commit -m "Turning on the 'Revision' keyword" combustion.dat
1773 Now open the file in an editor
1774 and add the following line somewhere near the top:
1782 The '#' sign isn't important:
1783 it's just what <code>.dat</code> files use to show comments.
1784 The <code>$Revision:$</code> string,
1786 means something special to Subversion.
1787 Save the file, and commit the change:
1791 $ svn commit -m "Inserting the 'Revision' keyword" combustion.dat
1795 When we open the file again,
1796 we'll see that Subversion has changed that line to something like:
1803 <p class="continue">
1804 i.e., Subversion has inserted the version number
1805 after the colon and before the closing <code>$</code>.
1809 Here's what just happened.
1810 First, Subversion allows you to set
1811 <a href="glossary.html#property-subversion">properties</a>
1812 for files and directories.
1813 These properties aren't in the files or directories themselves,
1814 but live in Subversion's database.
1815 One of those properties,
1816 <code>svn:keywords</code>,
1817 tells Subversion to look in files that are being changed
1818 for strings of the form <code>$keyword: …$</code>,
1819 where <code>keyword</code> is a name like <code>Revision</code> or <code>Author</code>.
1820 (About half a dozen such strings are supported.)
1824 If it sees such a string,
1825 Subversion rewrites it as the commit is taking place to replace <code>…</code>
1826 with the current version number,
1827 the name of the person making the change,
1828 or whatever else the keyword asks for.
1829 You only have to add the string to the file once;
1831 Subversion updates it for you every time the file changes.
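<p>The value of <code>svn:keywords</code> can list several keywords at once, separated by spaces. <code>Revision</code>, <code>Author</code>, <code>Date</code>, <code>HeadURL</code>, and <code>Id</code> are among those Subversion supports. This sketch assumes a working copy containing a data file that uses <code>#</code> for comments. It writes the anchors in the plain <code>$Keyword$</code> form, which Subversion expands to <code>$Keyword: value $</code>. <code>stamp_file</code> is a hypothetical name.</p>

```shell
# Hypothetical helper: put keyword anchors at the top of the data file
# named by $1 and ask Subversion to expand them on every commit.
stamp_file() {
    file=$1
    tmp="$file.tmp"
    # Single quotes stop the shell from treating $Revision etc. as
    # variables; printf reuses its format for each of the three lines.
    printf '# %s\n' '$Revision$' '$Author$' '$Date$' > "$tmp"
    cat "$file" >> "$tmp"
    mv "$tmp" "$file"
    svn propset -q svn:keywords "Revision Author Date" "$file"
}
```

<p>After <code>stamp_file combustion.dat</code> and an <code>svn commit</code>, the first line of the file reads <code># $Revision: N $</code>, where N is the revision just created.</p>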
1835 Putting the version number in the file this way can be pretty handy.
1836 If you copy the file to another machine,
1838 it carries its version number with it,
1839 so you can tell which version you have even if it's outside version control.
1840 We'll see some more useful things we can do with this information in
1841 <a href="python.html">the next chapter</a>.
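<p>Because the expanded keyword is plain text, it can be read back without Subversion at all, for example from a copy of the file on another machine. This minimal sketch uses <code>sed</code> and assumes the expanded form <code>$Revision: N $</code> that Subversion writes. <code>revision_of</code> is a hypothetical name.</p>

```shell
# Hypothetical helper: print the revision number recorded in an
# expanded "$Revision: N $" keyword in the file named by $1. It works
# on copies that live outside any working copy.
revision_of() {
    sed -n 's/.*\$Revision: \([0-9][0-9]*\) \$.*/\1/p' "$1"
}
```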
1845 <h3>When <em>Not</em> to Use Version Control</h3>
1848 Despite the rapidly decreasing cost of storage,
1849 it is still possible to run out of disk space.
1851 people can easily go through 2 TB/month if they're not careful.
1852 Since version control tools are built around line-by-line differences,
1853 with binary data files
1854 they often end up storing something close to a full copy of every revision.
1856 (it's what we'd be doing anyway),
1857 but it means version control isn't doing what it does best,
1858 and the repository can get very large very quickly.
1859 Another concern is that once old data is no longer being used,
1860 we may want to archive or delete it.
1861 This is not possible if our data is version controlled:
1862 information can only be added to a repository,
1863 so it can only ever increase in size.
1869 We can use this trick with shell scripts too,
1870 or with almost any other kind of program.
1871 Going back to Nelle Nemo's data processing from
1872 the lesson on the <a href="shell.html">shell</a>,
1874 suppose she writes a shell script that uses <code>gooclean</code>
1875 to tidy up data files.
1876 Her first version looks like this:
1882 gooclean -b 0 100 < $filename > cleaned-$filename
1886 <p class="continue">
1887 i.e., it runs <code>gooclean</code> with bounding values of 0 and 100
1888 for each specified file,
1889 putting the result in a temporary file with a well-defined name.
1890 Assuming that '#' is the comment character for those kinds of data files,
1891 she could instead write:
1897 <span class="highlight">echo '# gooclean $Revision: 901 $ -b 0 100' > cleaned-$filename</span>
1898 gooclean -b 0 100 < $filename <span class="highlight">>></span> cleaned-$filename
1903 The first change puts a line in the output file
1904 that describes how that file was created.
1905 The second change is to use <code>>></code> instead of <code>></code>
1906 to redirect <code>gooclean</code>'s output to the file.
1907 <code>>></code> means "append to":
1908 instead of overwriting whatever is in the file,
1909 it adds more content to it.
1910 This ensures that the first line of the file is the provenance record,
1911 with the actual output of <code>gooclean</code> after it.
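<p>The same idea can be packaged so that any command's output starts with a provenance line. This sketch assumes the script that contains it is committed with <code>svn:keywords</code> set to <code>Revision</code>, so that Subversion keeps the revision text up to date. The names <code>SCRIPT_REVISION</code> and <code>run_with_provenance</code> are hypothetical, and 901 is just the illustrative revision number used above.</p>

```shell
# Filled in by Subversion on commit when svn:keywords includes Revision;
# 901 is only an illustrative value. Single quotes stop the shell from
# treating $Revision as a variable.
SCRIPT_REVISION='$Revision: 901 $'

# Hypothetical helper: write a '#' comment recording the script revision
# and the command line to the file $1, then append the command's output.
run_with_provenance() {
    out=$1
    shift
    printf '# %s %s\n' "$SCRIPT_REVISION" "$*" > "$out"
    "$@" >> "$out"
}
```

<p>Nelle's loop body could then become <code>run_with_provenance cleaned-$filename gooclean -b 0 100 < $filename</code>. The input redirection applies to the whole function call, so <code>gooclean</code> inherits it.</p>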
1914 <div class="keypoints">
1917 <li><code>$Keyword:$</code> in a file can be filled in with a property value each time the file is committed.</li>
1918 <li>Put version numbers in programs' output to establish provenance for data.</li>
1919 <li><code>svn propset svn:keywords <em>keywords</em> <em>files</em></code> tells Subversion to start filling in those keywords' values.</li>
1923 <div class="challenges">
1930 <section id="s:summary">
1934 Correlation does not imply causality,
1935 but there is a very strong correlation between
1936 using version control
1937 and doing good computational science.
1938 There's an equally strong correlation
1939 between <em>not</em> using it and either wasting effort or getting things wrong.
1940 Today (the middle of 2013),
1941 I will not review a paper if the software used in it
1942 is not under version control.
1943 The work it reports might be interesting,
1944 but without the kind of record-keeping that version control provides,
1945 there's no way to know exactly what its authors did.
1946 Just as importantly,
1947 if someone doesn't know enough about computing to use version control,
1948 the odds are good that they don't know enough
1949 to do the programming right either.
1953 {% endblock content %}