1 {% extends "templates/_base.html" %}
3 {% block file_metadata %}
4 <meta name="title" content="Version Control With Subversion" />
5 <meta name="status" content="Ready for comment" />
6 {% endblock file_metadata %}
10 <li><a href="#s:basics">Basic Use</a></li>
11 <li><a href="#s:merge">Merging Conflicts</a></li>
12 <li><a href="#s:rollback">Recovering Old Versions</a></li>
13 <li><a href="#s:setup">Setting up a Repository</a></li>
14 <li><a href="#s:provenance">Provenance</a></li>
15 <li><a href="#s:summary">Summing Up</a></li>
19 Wolfman and Dracula have been hired by Universal Missions
20 (a space services spinoff from Euphoric State University)
21 to figure out where the company should send its next planetary lander.
22 They want to be able to work on the plans at the same time,
23 but they have run into problems doing this in the past.
25 each one will spend a lot of time waiting for the other to finish.
27 if they work on their own copies and email changes back and forth
28 they know that things will be lost, overwritten, or duplicated.
32 The right solution is to use a
33 <a href="glossary.html#version-control-system">version control system</a>
35 Version control is better than mailing files back and forth because:
41 It's hard (but not impossible) to accidentally overlook or overwrite someone's changes,
42 because the version control system highlights them automatically.
46 It keeps a record of who made what changes when,
47 so that if people have questions later on,
53 Nothing that is committed to version control is ever lost.
54 This means it can be used like the "undo" feature in an editor,
55 and since all old versions of files are saved
56 it's always possible to go back in time to see exactly who wrote what on a particular day,
57 or what version of a program was used to generate a particular set of results.
63 The rest of this chapter will explore how to use
64 a popular open source version control system called Subversion.
65 It does not have all the features of some newer systems,
66 such as <a href="git.html">Git</a>,
67 but it is still widely used,
68 and is simpler to pick up than those more advanced alternatives.
69 No matter which system you use,
70 the most important thing to learn is not the details of its more obscure commands,
71 but the workflow that it encourages.
75 <h2>For Instructors</h2>
78 Version control is the most important practical skill we introduce.
79 As the last paragraph of the introduction above says,
80 the workflow matters more than the ins and outs of any particular tool.
81 By the end of 90 minutes,
82 the instructor should be able to get learners to chant,
83 "Update, edit, merge, commit," in unison,
84 and have them understand what those terms mean
85 and why that's a good way to structure their working day.
89 Provided there aren't network problems,
90 this entire lesson can be covered in <span class="duration">90 minutes</span>.
91 The example at the end
92 showing how to use Subversion keywords to track provenance
93 is the "ah ha!" moment for many learners.
95 skip the material on recovering old versions of files
96 in order to get to this section instead.
97 (The fact that provenance is harder in Git,
98 both mechanically and conceptually,
99 is one reason to keep teaching Subversion.)
103 <h3>Prerequisites</h3>
105 Basic shell concepts and skills
106 (<code>ls</code>, <code>cd</code>, <code>mkdir</code>,
108 basic shell scripting
109 (for the discussion of <a href="#s:provenance">provenance</a>).
114 <h3>Teaching Notes</h3>
117 Make sure the network is working <em>before</em> starting this lesson.
120 Give learners a ten-minute overview of what version control does for them
121 before diving into the watch-and-do practicals.
122 Most of them will have tried to co-author papers by emailing files back and forth,
123 or will have biked into the office
124 only to realize that the USB key with last night's work
125 is still on the kitchen table.
126 Instructors can also make jokes about directories with names like
128 "final version revised",
129 "final version with reviewer three's corrections",
130 "really final version",
132 "come on this really has to be the last version"
133 to motivate version control as a better way to collaborate
134 and as a better way to back work up.
137 Version control is typically taught after the shell,
138 so collect learners' names during that session
139 and create a repository for them to share
140 with their names as both their IDs and their passwords.
141 The easiest way to create the repository is to use
142 a server managed by an ISP such as Dreamhost,
143 or on SourceForge, Google Code, or some other "forge" site,
144 all of which provide web interfaces for repository creation and management.
145 If your learners are advanced enough to be using SSH,
146 you can instead create it on any server they can access,
147 and connect with the <code>svn+ssh</code> protocol instead of HTTPS.
150 Be very clear what files learners are to edit
151 and what user IDs they are to use
152 when giving instructions.
153 It is common for them to edit the instructor's biography,
154 or to use the instructor's user ID and password when committing.
155 Be equally clear <em>when</em> they are to edit things:
156 it's also common for someone to edit the file the instructor is editing
157 and commit changes while the instructor is explaining what's going on,
158 so that a conflict occurs when the instructor comes to commit the file.
161 Learners could do most exercises with repositories on their own machines,
162 but it's hard for them to see how version control helps collaboration
163 unless they're sharing a repository with other learners.
165 showing learners who changed what using <code>svn blame</code>
166 is only compelling if a file has been edited by at least two people.
169 If some learners are using Windows,
170 there will inevitably be issues merging files with different line endings.
171 <code>svn diff -x -w</code> is supposed to suppress differences in whitespace,
172 but we have found that it doesn't always work as advertised.
179 <section id="s:basics">
182 <div class="understand">
183 <h3>Learning Objectives</h3>
185 <li>Draw a diagram showing the places version control stores information.</li>
186 <li>Check out a working copy of a repository.</li>
187 <li>View the history of changes to a project.</li>
188 <li>Explain why working copies of different projects should not overlap.</li>
189 <li>Add files to a project.</li>
190 <li>Commit changes made to a working copy to a repository.</li>
191 <li>Update a working copy to get changes from the repository.</li>
192 <li>Compare the current state of a working copy to the last update from the repository, and to the current state of the repository.</li>
193 <li>Explain what "version 123 of <code>xyz.txt</code>" actually means.</li>
196 <span class="duration">20 minutes</span>.
201 A version control system keeps the master copy of a file
202 in a <a href="glossary.html#repository">repository</a>
203 located on a <a href="glossary.html#server">server</a>—a computer
204 that is never used directly by people,
205 but only by their programs
206 (<a href="#f:repository">Figure 1</a>).
207 No one ever edits the master copy directly.
209 Wolfman and Dracula each have a <a href="glossary.html#working-copy">working copy</a>
210 on their own machines.
211 They can each edit their working copies whenever and however they want.
214 <figure id="f:repository">
215 <img src="svn/repository.png" alt="Repositories and Working Copies" />
216 <figcaption>Figure 1: Repositories and Working Copies</figcaption>
220 When Wolfman is ready to share his changes with Dracula,
221 he <a href="glossary.html#commit">commits</a> his work to the repository
222 (<a href="#f:workflow">Figure 2</a>).
223 Dracula can then <a href="glossary.html#update">update</a> his working copy
224 to get those changes when he's ready for them.
226 when Dracula finishes working on something,
227 he can commit so that Wolfman can update.
230 <figure id="f:workflow">
231 <img src="svn/workflow.png" alt="Sharing Files Through Version Control" />
232 <figcaption>Figure 2: Sharing Files Through Version Control</figcaption>
236 If this is all there was to version control,
237 it would be no better than FTP or Dropbox.
238 But what if Dracula and Wolfman change their working copies at the same time?
239 If Wolfman commits first,
240 his changes are simply copied to the repository
241 (<a href="#f:merge_first_commit">Figure 3</a>):
244 <figure id="f:merge_first_commit">
245 <img src="svn/merge_first_commit.png" alt="Wolfman Commits First" />
246 <figcaption>Figure 3: Wolfman Commits First</figcaption>
250 If Dracula now tries to commit something that would overwrite Wolfman's changes
251 the version control system detects the <a href="glossary.html#conflict">conflict</a>,
253 and tells Dracula that there's a problem
254 (<a href="#f:merge_second_commit">Figure 4</a>):
257 <figure id="f:merge_second_commit">
258 <img src="svn/merge_second_commit.png" alt="Dracula Has a Conflict" />
259 <figcaption>Figure 4: Dracula Has a Conflict</figcaption>
263 Dracula must <a href="glossary.html#resolve">resolve</a> that conflict
264 before the version control system will allow him to commit his work.
265 He can accept what Wolfman did,
266 replace it with what he has done,
267 or write something new that combines the two—that's up to him
268 (<a href="#f:merge_resolve">Figure 5</a>).
269 Once he has cleaned things up, he can go ahead and try committing again.
270 If all of the conflicts have been resolved,
271 the version control system will accept it this time.
274 <figure id="f:merge_resolve">
275 <img src="svn/merge_resolve.png" alt="Resolving the Conflict" />
276 <figcaption>Figure 5: Resolving the Conflict</figcaption>
280 <h3>Forgiveness vs. Permission</h3>
283 Old-fashioned version control systems prevented conflicts from happening
284 by <a href="glossary.html#lock">locking</a> the master copy
285 whenever someone was working on it.
286 This <a href="glossary.html#pessimistic-concurrency">pessimistic</a> strategy
287 guaranteed that a second person (or monster)
288 could never make changes to the same file at the same time,
289 but it also meant that people had to take turns editing files.
293 Most of today's version control systems use
294 an <a href="glossary.html#optimistic-concurrency">optimistic</a> strategy instead:
295 people are always allowed to edit their working copies,
296 and if a conflict occurs,
297 the version control system helps them sort it out after the fact.
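<p>
The optimistic strategy can be sketched in a few lines of Python.
This is a toy model, not how Subversion is implemented:
the <code>Repository</code> class, its methods, and <code>OutOfDateError</code>
are hypothetical names invented for this illustration.
The idea it shows is that nobody is locked out while editing,
and the check for a stale working copy happens only at commit time.
</p>

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on a stale revision of a file."""


class Repository:
    """Toy repository: one global revision number, one master copy per file."""

    def __init__(self):
        self.revision = 0
        self.files = {}          # path -> current text
        self.last_changed = {}   # path -> revision in which it last changed

    def checkout(self, path):
        # Return the text plus the revision the working copy is based on.
        return self.files.get(path, ""), self.revision

    def commit(self, path, new_text, base_revision):
        # Optimistic check: editing was never blocked, so we only find out
        # at commit time whether someone else got in first.
        if self.last_changed.get(path, 0) > base_revision:
            raise OutOfDateError(f"{path} is out of date; try updating")
        self.revision += 1
        self.files[path] = new_text
        self.last_changed[path] = self.revision
        return self.revision


repo = Repository()
repo.commit("moons.txt", "Io\n", base_revision=0)

# Both monsters start from the same revision.
wolfman_text, wolfman_base = repo.checkout("moons.txt")
dracula_text, dracula_base = repo.checkout("moons.txt")

# Wolfman commits first, so his change is simply accepted.
repo.commit("moons.txt", wolfman_text + "Europa\n", wolfman_base)

# Dracula's commit is refused until he updates and merges.
try:
    repo.commit("moons.txt", dracula_text + "Ganymede\n", dracula_base)
    outcome = "accepted"
except OutOfDateError:
    outcome = "rejected"
print(outcome)
```

<p>
A pessimistic system would instead refuse the second <code>checkout</code> for editing
until the first person had finished.
</p>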
302 To see how this actually works,
303 let's assume that the Mummy
304 (Dracula and Wolfman's boss)
305 has already put some notes in a version control repository
306 whose URL is <code>https://universal.software-carpentry.org/explore</code>.
307 Every repository has an address like this that uniquely identifies the location of the master copy.
311 <h3>There's More Than One Way To Do It</h3>
314 We will drive Subversion from the command line in our examples,
315 but if you prefer using a GUI,
316 there are many for you to choose from.
317 Please see the <a href="ref.html#s:svn:gui">reference</a> for links.
323 and Dracula has just joined the project.
324 In order to get a working copy on his computer,
325 Dracula has to <a href="glossary.html#check-out">check out</a> a copy of the repository.
326 He only has to do this once per project:
327 once he has a working copy,
328 he can update it over and over again to get other people's work.
332 While in his home directory,
333 Dracula types the command:
337 $ <span class="in">svn checkout https://universal.software-carpentry.org/explore</span>
341 This creates a new directory called <code>explore</code>
342 and fills it with a copy of the repository's contents
343 (<a href="#f:example_repo">Figure 6</a>).
347 <span class="out">A explore/jupiter
349 A explore/mars/mons-olympus.txt
350 A explore/mars/cydonia.txt
352 A explore/earth/himalayas.txt
353 A explore/earth/antarctica.txt
354 A explore/earth/carlsbad.txt
355 Checked out revision 6.</span>
358 <figure id="f:example_repo">
359 <img src="svn/example_repo.png" alt="Example Repository" />
360 <figcaption>Figure 6: Example Repository</figcaption>
364 Dracula can then go into this directory
365 and use regular shell commands to view the files:
369 $ <span class="in">cd explore</span>
370 $ <span class="in">ls</span>
371 <span class="out">earth jupiter mars</span>
372 $ <span class="in">ls *</span>
373 <span class="out">earth:
374 antarctica.txt carlsbad.txt himalayas.txt
379 cydonia.txt mons-olympus.txt</span>
383 <h3>Don't Let the Working Copies Overlap</h3>
386 It's very important that the working copies of different projects do not overlap;
388 we should never try to check out one project inside a working copy of another project.
389 The reason is that Subversion stores information about
390 the current state of a working copy
391 in special sub-directories called <code>.svn</code>:
395 $ <span class="in">pwd</span>
396 <span class="out">/home/dracula/explore</span>
397 $ <span class="in">ls -a</span>
398 <span class="out">. .. .svn earth jupiter mars</span>
399 $ <span class="in">ls -F .svn</span>
400 <span class="out">entries prop-base/ props/ text-base/ tmp/</span>
404 If two working copies overlap,
405 the files in the <code>.svn</code> directories for one repository
406 will be clobbered by the other repository's <code>.svn</code> files,
407 and Subversion will become hopelessly confused.
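<p>
One way to avoid overlapping working copies is to check, before checking out,
whether the target directory already sits inside one.
The Python sketch below walks up the directory tree looking for a <code>.svn</code> sub-directory.
The function name <code>enclosing_working_copy</code> and the scratch directory layout
are assumptions made for this example.
Newer Subversion releases (1.7 and later) keep a single <code>.svn</code> directory
at the top of the working copy rather than one per sub-directory,
so searching upward works for both layouts.
</p>

```python
import tempfile
from pathlib import Path


def enclosing_working_copy(directory):
    """Return the nearest directory (this one or an ancestor) that holds a
    .svn sub-directory, or None if there is none."""
    directory = Path(directory).resolve()
    for candidate in [directory, *directory.parents]:
        if (candidate / ".svn").is_dir():
            return candidate
    return None


# Build a throwaway layout that mimics Dracula's checkout.
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch).resolve()
    (root / "explore" / ".svn").mkdir(parents=True)
    (root / "explore" / "jupiter").mkdir()
    (root / "elsewhere").mkdir()

    # jupiter/ is inside the explore working copy, so this is a bad place
    # to check out another project.
    inside = enclosing_working_copy(root / "explore" / "jupiter")
    inside_is_explore = inside == root / "explore"

    # elsewhere/ is a sibling of explore/, not a child of it.
    outside = enclosing_working_copy(root / "elsewhere")
```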
412 Dracula can find out more about the history of the project
413 using Subversion's <code>log</code> command:
417 $ <span class="in">svn log</span>
418 <span class="out">------------------------------------------------------------------------
419 r6 | mummy | 2010-07-26 09:21:10 -0400 (Mon, 26 Jul 2010) | 1 line
421 Damn the budget---the Jovian moons would be a _perfect_ place to explore.
422 ------------------------------------------------------------------------
423 r5 | mummy | 2010-07-26 09:19:39 -0400 (Mon, 26 Jul 2010) | 1 line
425 The budget might not even stretch to the Arctic :-(
426 ------------------------------------------------------------------------
427 r4 | mummy | 2010-07-26 09:17:46 -0400 (Mon, 26 Jul 2010) | 1 line
429 Budget cuts may force us to do another dry run in the Arctic.
430 ------------------------------------------------------------------------
431 r3 | mummy | 2010-07-26 09:14:14 -0400 (Mon, 26 Jul 2010) | 1 line
433 Converting document to wiki-formatted text.
434 ------------------------------------------------------------------------
435 r2 | mummy | 2010-07-26 09:11:55 -0400 (Mon, 26 Jul 2010) | 1 line
437 Or put it down near the Face of Cydonia?
438 ------------------------------------------------------------------------
439 r1 | mummy | 2010-07-26 09:08:23 -0400 (Mon, 26 Jul 2010) | 1 line
441 Send the probe to Mons Olympus?
442 ------------------------------------------------------------------------</span>
446 Subversion displays a summary of all the changes made to the project so far.
447 This list includes the
448 <a href="glossary.html#revision-number">revision number</a>,
449 the name of the person who made the change,
450 the date the change was made,
451 and whatever comment the user provided when the change was submitted.
453 the <code>explore</code> project is currently at revision 6,
454 and all changes so far have been made by the Mummy.
458 Notice how detailed the comments on the updates are.
459 Good comments are as important in version control as they are in coding.
460 Without them, it can be very difficult to figure out who did what, when, and why.
461 We can use comments like "Changed things" and "Fixed it" if we want,
462 or even no comments at all,
463 but we'll only be making more work for our future selves.
467 <h3>Numbering Versions</h3>
470 Another thing to notice is that the revision number applies to the whole repository,
471 not to a particular file.
472 When we talk about "version 61" we mean
473 "the state of all files and directories at that point."
474 Older version control systems like CVS gave each file a new version number when it was updated,
475 which meant that version 38 of one file could correspond in time to version 17 of another
476 (<a href="#f:version_numbering">Figure 7</a>).
477 Experience shows that
478 global version numbers that apply to everything in the repository
479 are easier to manage than
480 per-file version numbers,
481 so that's what Subversion uses.
484 <figure id="f:version_numbering">
485 <img src="svn/version_numbering.png" alt="Version Numbering Schemes" />
486 <figcaption>Figure 7: Version Numbering Schemes</figcaption>
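<p>
The difference between the two numbering schemes can be sketched in Python.
Both classes below are hypothetical toys written for this comparison:
<code>GlobalNumbering</code> keeps one counter for the whole repository,
while <code>PerFileNumbering</code> keeps one counter per file.
Real systems store changes far more compactly than the full snapshots used here.
</p>

```python
class GlobalNumbering:
    """One counter for the whole repository (the scheme Subversion uses)."""

    def __init__(self):
        self.revision = 0
        self.snapshots = {0: {}}   # revision -> {path: text}

    def commit(self, changes):
        # Every commit copies the previous snapshot and applies the changes,
        # so a revision number names the state of *all* files.
        self.revision += 1
        snapshot = dict(self.snapshots[self.revision - 1])
        snapshot.update(changes)
        self.snapshots[self.revision] = snapshot
        return self.revision


class PerFileNumbering:
    """One counter per file (the scheme described above for CVS)."""

    def __init__(self):
        self.versions = {}   # path -> number of times it has been committed

    def commit(self, changes):
        for path in changes:
            self.versions[path] = self.versions.get(path, 0) + 1
        return dict(self.versions)


svn_like = GlobalNumbering()
cvs_like = PerFileNumbering()
history = [
    {"mars/cydonia.txt": "first draft"},
    {"mars/cydonia.txt": "second draft"},
    {"earth/carlsbad.txt": "first draft"},
]
for changes in history:
    svn_like.commit(changes)
    cvs_like.commit(changes)

print(svn_like.revision)               # one number for the whole repository
print(sorted(svn_like.snapshots[3]))   # revision 3 names the state of every file
print(cvs_like.versions)               # each file has its own count
```

<p>
With the global scheme, "revision 3" is enough to identify the state of every file;
with the per-file scheme, we would have to record a version number for each file separately.
</p>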
491 A couple of cubicles away,
492 Wolfman also runs <code>svn checkout</code>
493 to get a working copy of the repository.
494 He also gets version 6,
495 so the files on his machine are the same as the files on Dracula's.
496 While Wolfman is looking through the files,
497 Dracula decides to add some information to the repository about Jupiter's moons.
498 Using his favorite editor,
499 he creates a file in the <code>jupiter</code> directory called <code>moons.txt</code>,
500 and fills it with information about Io, Europa, Ganymede, and Callisto:
503 <pre src="svn/moons_initial.txt">
504 Name Orbital Radius Orbital Period Mass Radius
505 Io 421.6 1.769138 893.2 1821.6
506 Europa 670.9 3.551181 480.0 1560.8
507 Ganymede 1070.4 7.154553 1481.9 2631.2
508 Calisto 1882.7 16.689018 1075.9 2410.3
512 After double-checking his data,
513 he wants to commit the file to the repository so that everyone else on the project can see it.
514 The first step is to add the file to his working copy using <code>svn add</code>:
518 $ <span class="in">svn add jupiter/moons.txt</span>
519 <span class="out">A jupiter/moons.txt</span>
523 Adding a file is not the same as creating it—he has already done that.
525 the <code>svn add</code> command tells Subversion to add the file to
526 the list of things it's supposed to manage.
528 particularly in programming projects,
529 to have backup files or intermediate files in a directory
530 that aren't worth storing in the repository.
531 This is why version control requires us to explicitly tell it which files are to be managed.
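<p>
The distinction between files that merely exist and files that are managed
can be sketched as a small Python function.
The name <code>classify</code> and the example file names are made up for illustration.
The one-letter markers follow the ones <code>svn status</code> prints:
<code>A</code> for a file scheduled to be added
and <code>?</code> for a file that is not under version control.
</p>

```python
def classify(on_disk, managed, scheduled_for_add):
    """Report files that are waiting to be added or that are unmanaged.

    Managed files that need no attention are left out of the report.
    """
    report = {}
    for path in sorted(on_disk):
        if path in scheduled_for_add:
            report[path] = "A"      # added, waiting for the next commit
        elif path not in managed:
            report[path] = "?"      # on disk, but not under version control
    return report


# moons.txt~ stands in for an editor's backup file.
on_disk = {"jupiter/moons.txt", "jupiter/moons.txt~", "mars/cydonia.txt"}
managed = {"mars/cydonia.txt"}
report = classify(on_disk, managed, scheduled_for_add={"jupiter/moons.txt"})

for path, marker in report.items():
    print(marker, path)
```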
535 Once he has told Subversion to add the file,
536 Dracula can go ahead and commit his changes to the repository.
537 He uses the <code>-m</code> flag to provide a one-line message explaining what he's doing;
539 Subversion would open his default editor
540 so that he could type in something longer.
544 $ <span class="in">svn commit -m "Some basic facts about the Galilean moons of Jupiter." jupiter/moons.txt</span>
545 <span class="out">Adding jupiter/moons.txt
546 Transmitting file data .
547 Committed revision 7.</span>
551 When Dracula runs the <code>svn commit</code> command,
552 Subversion establishes a connection to the server,
553 copies over his changes,
554 and updates the revision number from 6 to 7
555 (<a href="#f:updated_repo">Figure 8</a>).
558 <figure id="f:updated_repo">
559 <img src="svn/updated_repo.png" alt="Updated Repository" />
560 <figcaption>Figure 8: Updated Repository</figcaption>
564 <h3>When <em>Not</em> to Use Version Control</h3>
567 Despite the rapidly decreasing cost of storage,
568 it is still possible to run out of disk space.
570 people can easily go through 2 TB/month if they're not careful.
571 Since version control tools usually store revisions in terms of lines,
572 when they are given binary data files
573 they end up essentially storing every revision separately.
575 (it's what we'd be doing anyway),
576 but it means version control isn't doing what it likes to do,
577 and the repository can get very large very quickly.
578 Another concern is that once very old data is no longer being used,
579 it can be useful to archive or delete it.
580 This is not possible if our data is version controlled:
581 information can only be added to a repository,
582 so it can only ever increase in size.
587 <p id="a:define-head">
589 Wolfman uses <code>svn update</code> to update his working copy.
590 It tells him that a new file has been added
591 and brings his working copy up to date with version 7 of the repository,
592 because this is now the most recent revision
593 (also called the <a href="glossary.html#head">head</a>).
594 <code>svn update</code> updates an existing working copy,
595 rather than checking out a new one.
596 While <code>svn checkout</code> is usually only run once per project per machine,
597 <code>svn update</code> may be run many times a day.
601 Looking in the new file <code>jupiter/moons.txt</code>,
602 Wolfman notices that Dracula has misspelled "Callisto"
603 (it is supposed to have two L's).
604 Wolfman edits that line of the file:
607 <pre src="svn/moons_spelling.txt">
608 Name Orbital Radius Orbital Period Mass Radius
609 Io 421.6 1.769138 893.2 1821.6
610 Europa 670.9 3.551181 480.0 1560.8
611 Ganymede 1070.4 7.154553 1481.9 2631.2
612 <span class="highlight">Callisto 1882.7 16.689018 1075.9 2410.3</span>
616 He also adds a line about Amalthea,
617 which he thinks might be an interesting place to send a probe
618 despite its small size:
621 <pre src="svn/moons_amalthea.txt">
622 Name Orbital Radius Orbital Period Mass Radius
623 <span class="highlight">Amalthea 181.4 0.498179 0.075 125.0</span>
624 Io 421.6 1.769138 893.2 1821.6
625 Europa 670.9 3.551181 480.0 1560.8
626 Ganymede 1070.4 7.154553 1481.9 2631.2
627 Callisto 1882.7 16.689018 1075.9 2410.3
632 he uses the <code>svn status</code> command to check that he hasn't accidentally changed anything else:
636 $ <span class="in">svn status</span>
637 <span class="out">M jupiter/moons.txt</span>
641 and then runs <code>svn commit</code>.
642 Since he hasn't used the <code>-m</code> flag to provide a message on the command line,
643 Subversion launches his default editor and shows him:
648 --This line, and those below, will be ignored--
654 He changes this to be
658 1. Fixed typo in moon's name: 'Calisto' -> 'Callisto'.
659 2. Added information about Amalthea.
660 --This line, and those below, will be ignored--
666 When he saves this temporary file and exits the editor,
667 Subversion commits his changes:
671 <span class="out">Sending jupiter/moons.txt
672 Transmitting file data .
673 Committed revision 8.</span>
677 Note that since Wolfman didn't specify a particular file to commit,
678 Subversion commits <em>all</em> of his changes.
679 This is why he ran the <code>svn status</code> command first.
683 <h3>Which Editor?</h3>
685 If you don't have a default editor set up,
686 Subversion will probably open an editor called Vi.
688 type escape-colon-w-q-! to exit
689 and hope it never happens again.
693 <div class="box" id="b:basics:transaction">
694 <h3>Working With Multiple Files</h3>
697 Our example only includes one file,
698 but version control can work on any number of files at once.
700 if Wolfman noticed that a dozen data files had the same incorrect header,
701 he could change it in all 12 files,
702 then commit all those changes at once.
703 This is actually the best way to work:
704 every logical change to the project should be a single commit,
705 and every commit should include everything involved in one logical change.
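<p>
Committing several files as one logical change is safest
when the commit is all-or-nothing.
The Python sketch below illustrates that idea with a hypothetical <code>commit_all</code> function:
changes are staged on a copy of the master,
and the master is only replaced if every change passes a check.
The <code>has_header</code> check and the file contents are invented for the example.
</p>

```python
def commit_all(master, changes, validate):
    """Apply every change or none of them.

    master   -- dict mapping path to text (the master copy)
    changes  -- dict mapping path to new text
    validate -- check that returns False for a change that must be refused
    """
    staged = dict(master)              # work on a copy, not the master
    for path, text in changes.items():
        if not validate(path, text):
            return False               # the master has not been touched
        staged[path] = text
    master.clear()
    master.update(staged)              # publish all the changes together
    return True


def has_header(path, text):
    return text.startswith("Name")


master = {"a.txt": "Nmae\n1\n", "b.txt": "Nmae\n2\n"}

# One file still has the bad header, so nothing is committed.
ok_partial = commit_all(
    master, {"a.txt": "Name\n1\n", "b.txt": "Nmae\n2\n"}, has_header
)
after_partial = dict(master)

# Both files are fixed, so both changes land in a single step.
ok_full = commit_all(
    master, {"a.txt": "Name\n1\n", "b.txt": "Name\n2\n"}, has_header
)
```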
712 Dracula wants to synchronize with Wolfman's work.
713 Before updating his working copy with <code>svn update</code>,
715 he checks to see if he has made any changes locally
716 by running <code>svn diff</code>.
718 it compares what's in his working copy to what he got the last time he updated.
719 There are no differences,
720 so there's no output:
724 $ <span class="in">svn diff</span>
729 To compare his working copy to the master,
730 Dracula uses <code>svn diff -r HEAD</code>.
731 The <code>-r</code> flag is used to specify a revision,
732 while <code>HEAD</code> means
733 "<a href="#a:define-head">the latest version of the master</a>".
737 $ <span class="in">svn diff -r HEAD</span>
738 <span class="out">--- moons.txt(revision 8)
739 +++ moons.txt(working copy)
741 Name Orbital Radius Orbital Period Mass Radius
742 +Amalthea 181.4 0.498179 0.075 125.0
743 Io 421.6 1.769138 893.2 1821.6
744 Europa 670.9 3.551181 480.0 1560.8
745 Ganymede 1070.4 7.154553 1481.9 2631.2
746 -Calisto 1882.7 16.689018 1075.9 2410.3
747 +Callisto 1882.7 16.689018 1075.9 2410.3
752 After looking over the changes,
753 Dracula goes ahead and does the update.
757 <h3>Reading a Diff</h3>
760 The output of <code>diff</code> is cryptic even by Unix standards.
765 --- moons.txt(revision 8)
766 +++ moons.txt(working copy)
770 signal that '-' will be used to show content from revision 8
771 and '+' to show content from the user's working copy.
772 The next line, with the '@' markers,
773 indicates where lines were inserted or removed.
774 This isn't really intended for human consumption:
775 editors and other tools can use this information
776 to replay a series of edits against a file.
780 The most important parts of what follows are the lines marked with '+' and '-',
781 which show insertions and deletions respectively.
783 we can see that the line for Amalthea was inserted,
784 and that the line for Callisto was changed
785 (which is indicated by an add and a delete right next to one another).
786 Many editors and other tools can display diffs like this in a two-column display,
787 highlighting changes.
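<p>
To see where the '+' and '-' lines come from,
here is a small sketch using Python's standard <code>difflib</code> module,
which produces output in the same unified format.
The two lists are cut-down stand-ins for the moons file (names only, no numbers),
and the labels passed as <code>fromfile</code> and <code>tofile</code> are arbitrary.
</p>

```python
import difflib

before = [
    "Name",
    "Io",
    "Europa",
    "Ganymede",
    "Calisto",
]
after = [
    "Name",
    "Amalthea",
    "Io",
    "Europa",
    "Ganymede",
    "Callisto",
]

diff_lines = list(
    difflib.unified_diff(
        before,
        after,
        fromfile="moons.txt (before)",
        tofile="moons.txt (after)",
        lineterm="",
    )
)
print("\n".join(diff_lines))

# Pick out the changed lines, skipping the '---' and '+++' file headers.
inserted = [line[1:] for line in diff_lines
            if line.startswith("+") and not line.startswith("+++")]
deleted = [line[1:] for line in diff_lines
           if line.startswith("-") and not line.startswith("---")]
```

<p>
The corrected spelling shows up once in <code>deleted</code> (the old line)
and once in <code>inserted</code> (the new line),
which is how a changed line is represented.
</p>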
793 <h3>Nothing's Perfekt</h3>
796 Version control systems do have one important shortcoming.
797 While it is easy for them to find, display, and merge differences in text files,
798 images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text—they
799 use specialized binary data formats.
800 Most version control systems don't know how to deal with these formats,
801 so all they can say is, "These files differ."
802 Reconciling those differences will probably require use of an auxiliary tool,
803 such as an audio editor
804 or Microsoft Word's "Compare and Merge" utility.
809 <h3>Diffing Other Files</h3>
812 <code>svn diff</code> mimics the behavior of
813 the Unix <code>diff</code> command,
814 which can be used to compare any two files.
815 Given these two files:
820 <th><code>left.txt</code></th>
821 <th><code>right.txt</code></th>
843 <code>diff</code>'s output is:
846 $ <span class="in">diff left.txt right.txt</span>
847 <span class="out">2a3
854 > strontium</span>
859 This is a very common workflow,
860 and is the basic heartbeat of most developers' days.
867 Update our working copy
868 so that we have any changes other people have committed.
876 Commit our changes to the repository
877 so that other people can get them.
883 It's worth noticing here how important Wolfman's comments about his changes were.
884 It's hard to see the difference between "Calisto" with one 'L' and "Callisto" with two,
885 even if the line containing the difference has been highlighted.
886 Without Wolfman's comments,
887 Dracula might have wasted time wondering what the difference was.
892 Wolfman should probably have committed his two changes separately,
893 since there's no logical connection between
894 fixing a typo in Callisto's name
895 and adding information about Amalthea to the same file.
896 Just as a function or program should do one job and one job only,
897 a single commit to version control should have a single logical purpose so that it's easier to find,
899 and if necessary undo later on.
903 <h3>Who Did What?</h3>
906 One other very useful command is <code>svn blame</code>,
907 which shows when each line in the file was last changed
912 $ <span class="in">svn blame moons.txt</span>
913 <span class="out"> 14 dracula Name Orbital Radius Orbital Period Mass Radius
914 14 dracula (10**3 km) (days) (10**20 kg) (km)
915 14 dracula Amalthea 181.4 0.498179 0.075 131 x 73 x 67
916 9 mummy Io 421.6 1.769138 893.2 1821.6
917 9 mummy Europa 670.9 3.551181 480.0 1560.8
918 9 mummy Ganymede 1070.4 7.154553 1481.9 2631.2
919 14 dracula Callisto 1882.7 16.689018 1075.9 2410.3
920 14 dracula Himalia 11460 250.5662 0.095 85.0
921 14 dracula Elara 11740 259.6528 0.008 40.0</span>
925 If you are ever wondering who to talk to about a change,
927 <code>svn blame</code> is a good place to start.
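<p>
The idea behind this kind of line-by-line annotation can be sketched in Python.
This is only an illustration of the principle, not Subversion's actual algorithm:
the <code>blame</code> function is hypothetical,
and the history is a cut-down version of the moons file
with Dracula's revision 7 and Wolfman's revision 8.
Each revision is compared with the one before it,
and any line that is new or changed is credited to that revision's author.
</p>

```python
import difflib


def blame(revisions):
    """Annotate the lines of the last revision.

    revisions is a list of (revision_number, author, lines) in
    chronological order; the result is a list of
    (revision_number, author, line) tuples.
    """
    annotated = []
    for number, author, lines in revisions:
        previous = [line for _, _, line in annotated]
        matcher = difflib.SequenceMatcher(a=previous, b=lines, autojunk=False)
        updated = []
        for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whoever touched them last.
                updated.extend(annotated[a_start:a_end])
            else:
                # Inserted or replaced lines are credited to this revision
                # (for a pure deletion this range is empty).
                updated.extend((number, author, line)
                               for line in lines[b_start:b_end])
        annotated = updated
    return annotated


history = [
    (7, "dracula", ["Name", "Io", "Europa", "Ganymede", "Calisto"]),
    (8, "wolfman", ["Name", "Amalthea", "Io", "Europa", "Ganymede", "Callisto"]),
]
result = blame(history)
for number, author, line in result:
    print(f"{number:>4} {author:<8} {line}")
```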
931 <div class="keypoints">
934 <li>Version control is a better way to manage shared files than email or shared folders.</li>
935 <li>The master copy is stored in a repository.</li>
936 <li>Nobody ever edits the master copy directly: instead, each person edits a local working copy.</li>
937 <li>People share changes by committing them to the master or updating their local copy from the master.</li>
938 <li>The version control system prevents people from overwriting each other's work by forcing them to merge concurrent changes before committing.</li>
939 <li>It also keeps a complete history of changes made to the master so that old versions can be recovered reliably.</li>
940 <li>Version control systems work best with text files, but can also handle binary files such as images and Word documents.</li>
941 <li>Every repository is identified by a URL.</li>
942 <li>Working copies of different repositories must not overlap.</li>
943 <li>Each change to the master copy is identified by a unique revision number.</li>
944 <li>Revisions identify snapshots of the entire repository, not changes to individual files.</li>
945 <li>Each change should be commented to make the history more readable.</li>
946 <li>Commits are transactions: either all changes are successfully committed, or none are.</li>
947 <li>The basic workflow for version control is update-change-commit.</li>
948 <li><code>svn add <em>things</em></code> tells Subversion to start managing particular files or directories.</li>
949 <li><code>svn checkout <em>url</em></code> checks out a working copy of a repository.</li>
950 <li><code>svn commit -m "<em>message</em>" <em>things</em></code> sends changes to the repository.</li>
951 <li><code>svn diff</code> compares the current state of a working copy to the state after the most recent update.</li>
952 <li><code>svn diff -r HEAD</code> compares the current state of a working copy to the state of the master copy.</li>
953 <li><code>svn log</code> shows the history of a working copy.</li>
954 <li><code>svn status</code> shows the status of a working copy.</li>
955 <li><code>svn update</code> updates a working copy from the repository.</li>
959 <div class="challenges">
965 Using the repository URL, user ID, and password provided by the instructor,
966 perform the following actions:
969 Check out a working copy of the repository.
972 Create a text file called <em>your_id</em>.txt
973 (using your user ID instead of <em>your_id</em>)
974 and write a three-line biography of yourself in it.
977 Add this file to your working copy.
980 Commit your changes to the repository.
983 Update your working copy to get other people's biographies.
986 Examine the change log to see
987 the order in which people added their biographies
994 What does the command <code>svn diff -r 14</code> do?
995 What does it do if there have only been 10 changes to the repository?
1000 Unix <code>diff</code> and <code>svn diff</code> compare files line by line.
1001 Why doesn't this work for MP3 audio files?
1009 <section id="s:merge">
1010 <h2>Merging Conflicts</h2>
1012 <div class="understand">
1013 <h3>Learning Objectives</h3>
1015 <li>Explain what causes conflicts to occur and how to tell when one has occurred.</li>
1016 <li>Resolve a conflict.</li>
1017 <li>Identify the auxiliary files created when a conflict occurs.</li>
1020 <span class="duration">20 minutes</span>.
1025 Dracula and Wolfman have both synchronized their working copies of <code>explore</code>
1026 with version 8 of the repository.
1027 Dracula now edits his copy to change Amalthea's radius
1028 from a single number to a triple to reflect its irregular shape:
1031 <pre src="svn/moons_dracula_triple.txt">
1032 Name Orbital Radius Orbital Period Mass Radius
1033 <span class="highlight">Amalthea 181.4 0.498179 0.075 131 x 73 x 67</span>
1034 Io 421.6 1.769138 893.2 1821.6
1035 Europa 670.9 3.551181 480.0 1560.8
1036 Ganymede 1070.4 7.154553 1481.9 2631.2
1037 Callisto 1882.7 16.689018 1075.9 2410.3
1040 <p class="continue">
1041 He then commits his work,
1042 creating revision 9 of the repository
1043 (<a href="#f:after_dracula_commits">Figure 9</a>).
1046 <figure id="f:after_dracula_commits">
1047 <img src="svn/after_dracula_commits.png" alt="After Dracula Commits" />
1048 <figcaption>Figure 9: After Dracula Commits</figcaption>
1052 But while he is doing this,
1053 Wolfman is editing <em>his</em> copy
1054 to add information about two other minor moons,
1058 <pre src="svn/moons_wolfman_extras.txt">
1059 Name Orbital Radius Orbital Period Mass Radius
1060 Amalthea 181.4 0.498179 0.075 131
1061 Io 421.6 1.769138 893.2 1821.6
1062 Europa 670.9 3.551181 480.0 1560.8
1063 Ganymede 1070.4 7.154553 1481.9 2631.2
1064 Callisto 1882.7 16.689018 1075.9 2410.3
1065 <span class="highlight">Himalia 11460 250.5662 0.095 85.0
1066 Elara 11740 259.6528 0.008 40.0</span>
1070 When Wolfman tries to commit his changes to the repository,
1071 Subversion won't let him:
1075 $ <span class="in">svn commit -m "Added data for Himalia, Elara"</span>
1076 <span class="out">Sending jupiter/moons.txt
1077 svn: Commit failed (details follow):
1078 svn: File or directory 'moons.txt' is out of date; try updating
1079 svn: resource out of date; try updating</span>
1082 <p class="continue">
1084 Wolfman's changes were based on revision 8,
1085 but the repository is now at revision 9,
1086 and the file that Wolfman is trying to overwrite
1087 is different in the later revision.
1089 (One of version control's main jobs is to make sure that
1090 people don't trample on each other's work.)
1091 Wolfman has to update his working copy to get Dracula's changes before he can commit.
1093 Dracula edited a line that Wolfman didn't change,
1094 so Subversion can merge the differences automatically.
1098 This does <em>not</em> mean that Wolfman's changes have been committed to the repository:
1099 Subversion only does that when it's ordered to.
1100 Wolfman's changes are still in his working copy,
1101 and <em>only</em> in his working copy.
1102 But since Wolfman's version of the file now includes
1103 the change that Dracula made,
1104 Wolfman can go ahead and commit his own changes as usual to create revision 10
1105 (<a href="#f:merge_without_conflict">Figure 10</a>).
1108 <figure id="f:merge_without_conflict">
1109 <img src="svn/merge_without_conflict.png" alt="Merging Without Conflict" />
1110 <figcaption>Figure 10: Merging Without Conflict</figcaption>
1114 Wolfman's working copy is now in sync with the master,
1115 but Dracula's is one behind at revision 9.
1117 They independently decide to add measurement units
1118 to the columns in <code>moons.txt</code>.
1119 Wolfman is quicker off the mark this time;
1120 he adds a line to the file:
1123 <pre src="svn/moons_wolfman_units.txt">
1124 Name Orbital Radius Orbital Period Mass Radius
1125 <span class="highlight"> (10**3 km) (days) (10**20 kg) (km)</span>
1126 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
1127 Io 421.6 1.769138 893.2 1821.6
1128 Europa 670.9 3.551181 480.0 1560.8
1129 Ganymede 1070.4 7.154553 1481.9 2631.2
1130 Callisto 1882.7 16.689018 1075.9 2410.3
1131 Himalia 11460 250.5662 0.095 85.0
1132 Elara 11740 259.6528 0.008 40.0
1135 <p class="continue">
1136 and commits it to create revision 11.
1137 While he is doing this,
1139 Dracula inserts a different line at the top of the file:
1142 <pre src="svn/moons_dracula_units.txt">
1143 Name Orbital Radius Orbital Period Mass Radius
1144 <span class="highlight"> * 10^3 km * days * 10^20 kg * km</span>
1145 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
1146 Io 421.6 1.769138 893.2 1821.6
1147 Europa 670.9 3.551181 480.0 1560.8
1148 Ganymede 1070.4 7.154553 1481.9 2631.2
1149 Callisto 1882.7 16.689018 1075.9 2410.3
1150 Himalia 11460 250.5662 0.095 85.0
1151 Elara 11740 259.6528 0.008 40.0
1156 When Dracula tries to commit,
1157 Subversion tells him he can't.
1159 When Dracula does update his working copy,
1160 he doesn't just get the line Wolfman added to create revision 11
1161 (<a href="#f:merge_with_conflict">Figure 11</a>).
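<p>
Why did the earlier update merge cleanly while this one does not?
The difference can be sketched in Python under a simplifying assumption:
two edits conflict when they touch the same region of the common ancestor.
This is an illustration only, not Subversion's actual merge algorithm;
the function names and the shortened moon data are hypothetical.
</p>

```python
from difflib import SequenceMatcher


def changed_ranges(base, edited):
    """Return the (start, end) line ranges of `base` that `edited` alters."""
    matcher = SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
            if tag != "equal"]


def has_conflict(base, mine, theirs):
    """Report whether two edits of `base` touch the same region.

    Simplification: ranges that merely touch count as conflicting.
    """
    return any(m_start <= t_end and t_start <= m_end
               for m_start, m_end in changed_ranges(base, mine)
               for t_start, t_end in changed_ranges(base, theirs))


# Earlier case: Dracula rewrites one line, Wolfman appends two others.
base = ["Name Radius", "Amalthea 131", "Io 1821.6", "Europa 1560.8"]
dracula = ["Name Radius", "Amalthea 131 x 73 x 67", "Io 1821.6", "Europa 1560.8"]
wolfman = base + ["Himalia 85.0", "Elara 40.0"]
print(has_conflict(base, dracula, wolfman))

# This case: both insert a different units line right after the header.
wolfman_units = [base[0], "(km)"] + base[1:]
dracula_units = [base[0], "* km"] + base[1:]
print(has_conflict(base, dracula_units, wolfman_units))
```

<p>
The first call prints <code>False</code> because the two edits are far apart;
the second prints <code>True</code> because both insertions land at the same place in the file.
</p>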
1164 <figure id="f:merge_with_conflict">
1165 <img src="svn/merge_with_conflict.png" alt="Merge With Conflict" />
1166 <figcaption>Figure 11: Merge With Conflict</figcaption>
1170 There is an actual conflict in the file,
1171 so Subversion asks Dracula what he wants to do:
1174 <pre src="svn/moons_dracula_conflict.txt">
1175 $ <span class="in">svn update</span>
1176 <span class="out">Conflict discovered in 'jupiter/moons.txt'.
1177 Select: (p) postpone, (df) diff-full, (e) edit,
1178 (mc) mine-conflict, (tc) theirs-conflict,
1179 (s) show all options:</span>
1183 Dracula chooses <code>p</code> for "postpone",
1184 which tells Subversion that he'll deal with the problem later.
1185 Once the update is finished,
1186 he opens <code>moons.txt</code> in his editor and sees:
1190 Name Orbital Radius Orbital Period Mass
1191 &lt;&lt;&lt;&lt;&lt;&lt;&lt; .mine
1192  * 10^3 km * days * 10^20 kg
1193 =======
1194  (10**3 km) (days) (10**20 kg)
1195 &gt;&gt;&gt;&gt;&gt;&gt;&gt; .r11
1196 Amalthea 181.4 0.498179 0.074
1197 Io 421.6 1.769138 893.2
1198 Europa 670.9 3.551181 480.0
1199 Ganymede 1070.4 7.154553 1481.9
1200 Callisto 1882.7 16.689018 1075.9
1203 <p class="continue">
1205 Subversion has inserted
1206 <a href="glossary.html#conflict-marker">conflict markers</a>
1207 in <code>moons.txt</code>
1208 wherever there is a conflict.
1209 The line <code>&lt;&lt;&lt;&lt;&lt;&lt;&lt; .mine</code> shows the start of the conflict,
1210 and is followed by the lines from the local copy of the file.
1211 The separator <code>=======</code> is then
1212 followed by the lines from the repository's file that are in conflict with that section,
1213 while <code>&gt;&gt;&gt;&gt;&gt;&gt;&gt; .r11</code> marks the end of the conflict.
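<p>
Because the markers follow a fixed layout, a conflicted file can be taken apart mechanically.
The sketch below is a minimal illustration in Python;
<code>split_conflicts</code> is a hypothetical helper, not part of Subversion,
and the file contents are shortened.
</p>

```python
def split_conflicts(lines):
    """Return the (mine, theirs) versions of a file with conflict markers."""
    mine, theirs = [], []
    state = "both"  # which version(s) the current line belongs to
    for line in lines:
        if line.startswith("<<<<<<< "):
            state = "mine"        # local lines follow
        elif line.startswith("=======") and state == "mine":
            state = "theirs"      # repository lines follow
        elif line.startswith(">>>>>>> ") and state == "theirs":
            state = "both"        # end of the conflicted region
        elif state == "mine":
            mine.append(line)
        elif state == "theirs":
            theirs.append(line)
        else:
            mine.append(line)
            theirs.append(line)
    return mine, theirs


conflicted = [
    "Name Radius",
    "<<<<<<< .mine",
    "* km",
    "=======",
    "(km)",
    ">>>>>>> .r11",
    "Amalthea 131 x 73 x 67",
]
mine, theirs = split_conflicts(conflicted)
print(mine)
print(theirs)
```

<p>
Lines outside the markers appear in both results;
only the region between the markers differs.
</p>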
1217 Before he can commit,
1218 Dracula has to edit his copy of the file to get rid of those markers.
1222 <pre src="svn/moons_dracula_resolved.txt">
1223 Name Orbital Radius Orbital Period Mass Radius
1224 <span class="highlight"> (10^3 km) (days) (10^20 kg) (km)</span>
1225 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
1226 Io 421.6 1.769138 893.2 1821.6
1227 Europa 670.9 3.551181 480.0 1560.8
1228 Ganymede 1070.4 7.154553 1481.9 2631.2
1229 Callisto 1882.7 16.689018 1075.9 2410.3
1230 Himalia 11460 250.5662 0.095 85.0
1231 Elara 11740 259.6528 0.008 40.0
1234 <p class="continue">
1235 then uses the <code>svn resolved</code> command to tell Subversion that
1236 he has fixed the problem.
1237 Subversion will now let him commit to create revision 12.
1241 <h3>Auxiliary Files</h3>
1244 When Dracula did his update and Subversion detected the conflict in <code>moons.txt</code>,
1245 it created three temporary files to help Dracula resolve it.
1246 The first is called <code>moons.txt.r9</code>;
1247 it is the file as it was in Dracula's local copy
1248 before he started making changes,
1249 i.e., the common ancestor for his work
1250 and whatever he is in conflict with.
1254 The second file is <code>moons.txt.r11</code>.
1255 This is the most up-to-date revision from the repository—the
1256 file as it is including Wolfman's changes.
1257 The third temporary file, <code>moons.txt.mine</code>,
1258 is the file as it was in Dracula's working copy before he did the Subversion update.
1262 Subversion creates these auxiliary files primarily
1263 to help people merge conflicts in binary files.
1264 It wouldn't make sense to insert <code>&lt;&lt;&lt;&lt;&lt;&lt;&lt;</code>
1265 and <code>&gt;&gt;&gt;&gt;&gt;&gt;&gt;</code> characters into an image file
1266 (it would almost certainly result in a corrupted image).
1267 The <code>svn resolved</code> command deletes these three extra files
1268 as well as telling Subversion that the conflict has been taken care of.
1274 Some power users prefer to work with interpolated conflict markers directly,
1275 but for the rest of us,
1276 there are several tools for displaying differences and helping to merge them,
1277 including <a href="http://diffuse.sourceforge.net/">Diffuse</a> and <a href="http://winmerge.org/">WinMerge</a>.
1278 If Dracula launches Diffuse,
1279 it displays his file,
1280 the common base that he and Wolfman were working from,
1281 and Wolfman's file in a three-pane view
1282 (<a href="#f:diff_viewer">Figure 12</a>):
1285 <figure id="f:diff_viewer">
1286 <img src="svn/diff_viewer.png" alt="A Difference Viewer" />
1287 <figcaption>Figure 12: A Difference Viewer</figcaption>
1290 <p class="continue">
1291 Dracula can use the buttons to merge changes from either of the edited versions
1292 into the common ancestor,
1293 or edit the central pane directly.
1296 When he is done, he uses <code>svn resolved</code> and <code>svn commit</code>
1297 to create revision 12 of the repository.
1301 In this case, the conflict was small and easy to fix.
1302 However, if two or more people on a team are repeatedly creating conflicts for one another,
1303 it's usually a signal of deeper communication problems:
1304 either they aren't talking as often as they should, or their responsibilities overlap.
1306 The version control system can help the team find and fix these issues
1307 so that it will be more productive in future.
1311 <h3>Working With Multiple Files</h3>
1314 As mentioned <a href="#a:transaction">earlier</a>,
1315 every logical change to a project should result in a single commit,
1316 and every commit should represent one logical change.
1317 This is especially true when resolving conflicts:
1318 the work done to reconcile one person's changes with another's is often complicated,
1319 so it should be a single entry in the project's history,
1320 with other, later, changes coming after it.
1325 <div class="keypoints">
1328 <li>Conflicts must be resolved before a commit can be completed.</li>
1329 <li>Subversion puts markers in text files to show regions of conflict.</li>
1330 <li>For each conflicted file, Subversion creates auxiliary files containing the common parent, the master version, and the local version.</li>
1331 <li><code>svn resolved <em>files</em></code> tells Subversion that conflicts have been resolved.</li>
1335 <div class="challenges">
1339 If you are working in a group,
1340 partner with someone who has also written a biography for themselves
1341 for the previous section's challenges.
1346 Both partners use <code>svn update</code>
1347 to make sure their working copies are up to date
1348 and that there are no local changes.
1351 The first partner edits her biography and commits the changes.
1354 The second partner edits her copy of the file
1355 (<em>without</em> having updated to get the first partner's changes),
1356 then tries to <code>svn commit</code>.
1359 Once the second partner has resolved the conflict,
1360 she commits her changes.
1363 Repeat these four steps with roles reversed.
1368 If you are working on your own,
1369 you can simulate the steps above
1370 by checking out a second copy of the project into a new directory.
1372 (This cannot overlap any existing checked-out copies.)
1373 Edit your biography in one copy and commit those changes,
1374 then switch to the other copy and edit the same file
1381 <section id="s:rollback">
1382 <h2>Recovering Old Versions</h2>
1384 <div class="understand">
1385 <h3>Learning Objectives</h3>
1387 <li>Discard changes made to a working copy.</li>
1388 <li>Recover an old version of a file.</li>
1389 <li>Explain what branches are and when they are used.</li>
1392 <span class="duration">20 minutes</span>.
1397 Now that we have seen how to merge files and resolve conflicts,
1398 we can look at how to use version control as an "infinite undo".
1399 Suppose that when Wolfman starts work late one night,
1400 his copy of <code>explore</code> is in sync with the head at revision 12.
1401 He decides to edit the file <code>moons.txt</code>;
1402 unfortunately, he forgot that there was a full moon,
1403 so his changes don't make a lot of sense:
1406 <pre src="svn/poetry.txt">
1407 Just one moon can make me growl
1408 Four would make me want to howl
1413 When he's back in human form the next day,
1414 he wants to undo his changes.
1415 Without version control, his choices would be grim:
1416 he could try to edit them back into their original state by hand
1417 (which for some reason hardly ever seems to work),
1418 or ask his colleagues to send him their copies of the files
1419 (which is almost as embarrassing as chasing the neighbor's cat when in wolf form).
1423 Since he's using Subversion, though,
1424 and hasn't committed his work to the repository,
1425 all he has to do is <a href="glossary.html#revert">revert</a> his local changes.
1426 <code>svn revert</code> simply throws away local changes to files
1427 and puts things back the way they were before those changes were made.
1428 This is a purely local operation:
1429 since Subversion keeps a pristine copy of every file from the last update inside the working copy,
1430 Wolfman doesn't need to be connected to the network to do this.
1435 Wolfman uses <code>svn diff</code> <em>without</em> the <code>-r HEAD</code> flag
1436 to take a look at the differences between his file
1437 and the copy he got from the repository in his most recent update.
1438 Since he doesn't want to keep his changes,
1439 his next command is <code>svn revert moons.txt</code>.
1443 $ <span class="in">cd jupiter</span>
1444 $ <span class="in">svn revert moons.txt</span>
1445 <span class="out">Reverted moons.txt</span>
1449 What if someone <em>has</em> committed their changes,
1450 but still wants to undo them?
1452 For example, suppose Dracula decides that the numbers in <code>moons.txt</code> would look better with commas.
1453 He edits the file to put them in:
1456 <pre src="svn/moons_commas.txt">
1457 Name Orbital Radius Orbital Period Mass Radius
1458 (10^3 km) (days) (10^20 kg) (km)
1459 Amalthea 181.4 0.498179 0.075 131 x 73 x 67
1460 Io 421.6 1.769138 893.2 1<span class="highlight">,</span>821.6
1461 Europa 670.9 3.551181 480.0 1<span class="highlight">,</span>560.8
1462 Ganymede 1<span class="highlight">,</span>070.4 7.154553 1<span class="highlight">,</span>481.9 2<span class="highlight">,</span>631.2
1463 Callisto 1<span class="highlight">,</span>882.7 16.689018 1<span class="highlight">,</span>075.9 2<span class="highlight">,</span>410.3
1464 Himalia 11<span class="highlight">,</span>460 250.5662 0.095 85.0
1465 Elara 11<span class="highlight">,</span>740 259.6528 0.008 40.0
1468 <p class="continue">
1469 then commits his changes to create revision 13.
1470 A little while later,
1471 the Mummy sees the change and orders Dracula to put things back the way they were.
1472 What should Dracula do?
1476 We can draw the sequence of events leading up to revision 13
1477 as shown in <a href="#f:before_undoing">Figure 13</a>:
1480 <figure id="f:before_undoing">
1481 <img src="svn/before_undoing.png" alt="Before Undoing" />
1482 <figcaption>Figure 13: Before Undoing</figcaption>
1485 <p class="continue">
1486 Dracula wants to erase revision 13 from the repository,
1487 but he can't actually do that:
1488 once a change is in the repository, it stays there.
1490 What he can do instead is merge the old revision with the current revision
1491 to create a new revision
1492 (<a href="#f:merging_history">Figure 14</a>).
1495 <figure id="f:merging_history">
1496 <img src="svn/merging_history.png" alt="Merging History" />
1497 <figcaption>Figure 14: Merging History</figcaption>
1500 <p class="continue">
1501 This is exactly like merging changes made by two different people;
1502 the only difference is that the "other person" is his past self.
1507 Dracula must merge revision 12 (the one before his change)
1508 with revision 13 (the current head revision)
1509 using <code>svn merge</code>:
1513 $ <span class="in">svn merge -r HEAD:12 moons.txt</span>
1514 <span class="out">-- Reverse-merging r13 into 'moons.txt'
1518 <p class="continue">
1519 The <code>-r</code> flag specifies the range of revisions to merge:
1520 to undo the changes from revision 12 to revision 13,
1521 he uses either <code>13:12</code> or <code>HEAD:12</code>
1522 (since he is going backward in time from the most recent revision to revision 12).
1523 This is called a <a href="glossary.html#reverse-merge">reverse</a> merge
1524 because he's going backward in time.
1528 After he runs this command,
1529 he must run <code>svn commit</code> to save the changes to the repository.
1530 This creates a new revision, number 14,
1531 rather than erasing revision 13.
1533 The changes he made to create revision 13 are still there
1534 if he can ever convince the Mummy that numbers should have commas.
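<p>
The key idea, that undoing a change adds to history rather than rewriting it,
can be sketched with a toy model in Python.
Here a repository is just a list of file contents;
<code>commit</code> and <code>reverse_merge</code> are hypothetical stand-ins for the real commands,
and the sketch only handles the simple case where the working copy has no other changes.
</p>

```python
def commit(history, content):
    """Append a new revision and return its number (revisions start at 1)."""
    history.append(list(content))
    return len(history)


def reverse_merge(history, working, newer, older):
    """Undo the changes made between `older` and `newer` in `working`.

    Only handles the simple case where `working` still matches `newer`.
    """
    if working != history[newer - 1]:
        raise ValueError("working copy has other changes; a real merge is needed")
    return list(history[older - 1])


# Placeholder contents for revisions 1-11, which are not shown here.
history = [[f"(revision {n})"] for n in range(1, 12)]

r12 = commit(history, ["Io 1821.6"])     # before the commas
r13 = commit(history, ["Io 1,821.6"])    # Dracula's change

working = reverse_merge(history, history[r13 - 1], newer=r13, older=r12)
r14 = commit(history, working)           # the undo is itself a new revision

print(r14, history[r14 - 1], history[r13 - 1])
```

<p>
After the last commit, revision 14 has the same content as revision 12,
and revision 13 is still in the list.
</p>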
1538 <h3>Another Way to Do It</h3>
1541 Another way to recover a particular version of a particular file
1542 is to use the <code>svn copy</code> command.
1543 If the URL of our repository is
1544 <code>https://universal.software-carpentry.org/explore</code>,
1549 $ <span class="in">svn copy https://universal.software-carpentry.org/explore/mission.txt@120 ./mission.txt</span>
1552 <p class="continue">
1553 copies the file <code>mission.txt</code> as it was in revision 120
1554 into our working directory
1555 (overwriting whatever <code>mission.txt</code> file we currently have).
1558 Using <code>svn copy</code> brings along the file's history as well,
1559 so that future <code>svn log</code> operations will show
1560 how <code>mission.txt</code> was resurrected.
1565 Merging can be used to recover older revisions of files,
1566 not just the most recent,
1567 and to recover many files or directories at a time.
1568 The most frequent use, though,
1569 is to manage parallel streams of development in large projects.
1570 This is outside the scope of this chapter,
1571 but the basic idea is simple.
1575 Suppose that Universal Missions has just released a new program
1576 for designing interplanetary voyages.
1577 Dracula and Wolfman are supposed to add some features
1578 that were left out of the first release because time ran short.
1580 Frankenstein and the Mummy are doing technical support:
1581 their job is to fix any bugs that users find.
1585 All sorts of things could go wrong
1586 if both teams tried to work on the same code at the same time.
1588 Dracula and Wolfman might want to make large changes
1589 to the structure of the code
1590 in order to make it easier to add new features,
1591 while Frankenstein and the Mummy want to make as few changes as possible
1592 so as not to introduce new bugs while fixing old ones.
1596 The usual way to handle this situation is
1597 to create a <a href="glossary.html#branch">branch</a>
1598 in the repository for each major sub-project
1599 (<a href="#f:branch_merge">Figure 15</a>).
1600 While Wolfman and Dracula work on
1601 the <a href="glossary.html#main-line">main line</a>,
1602 Frankenstein and the Mummy create a branch,
1603 which is just another copy of the repository's files and directories
1604 that is also under version control.
1605 They can work in their branch without disturbing Wolfman and Dracula and vice versa:
1608 <figure id="f:branch_merge">
1609 <img src="svn/branch_merge.png" alt="Branching and Merging" />
1610 <figcaption>Figure 15: Branching and Merging</figcaption>
1614 Branches in version control repositories are often described as "parallel universes".
1615 Each branch starts off as a clone of the project at some moment in time
1616 (typically each time the software is released,
1617 or whenever work starts on a major new feature).
1618 Changes made to a branch only affect that branch,
1619 just as changes made to the files in one directory don't affect files in other directories.
1621 However, the branch and the main line are both stored in the same repository,
1622 so their revision numbers are always in step.
1626 If someone decides that a bug fix in one branch should also be made in another,
1627 all they have to do is merge the files in question.
1628 This is exactly like merging an old version of a file with the current one,
1629 but instead of going backward in time,
1630 the change is brought sideways from one branch to another.
1634 Branching helps projects scale up by letting sub-teams work independently,
1635 but too many branches can cause as many problems as they solve.
1636 Karl Fogel's excellent book
1637 <a href="bib.html#fogel-producing-oss"><cite>Producing Open Source Software</cite></a>,
1638 and Laura Wingerd and Christopher Seiwald's paper
1639 "<a href="bib.html#wingerd-seiwald-scm">High-level Best Practices in Software Configuration Management</a>",
1640 talk about branches in much more detail.
1641 Projects usually don't need to do this until they have a dozen or more developers,
1642 or until several versions of their software are in simultaneous use,
1643 but using branches is a key part of switching from software carpentry to software engineering.
1646 <div class="keypoints">
1649 <li>Old versions of files can be recovered by merging their old state with their current state.</li>
1650 <li>Recovering an old version of a file does not erase the intervening changes.</li>
1651 <li>Use branches to support parallel independent development.</li>
1652 <li><code>svn revert</code> undoes local changes to files.</li>
1653 <li><code>svn merge</code> merges two revisions of a file.</li>
1657 <div class="challenges">
1662 Explain what the command:
1664 svn diff -r 240:261 fish.dat
1666 does, and when you might want to run it.
1670 Suppose that a file called <code>mission.txt</code>
1671 existed in revision 90 of a repository,
1672 but had been deleted in revision 91.
1673 What two commands could we use to recover it?
1681 <section id="s:setup">
1682 <h2>Setting Up a Repository</h2>
1684 <div class="understand">
1685 <h3>Learning Objectives</h3>
1687 <li>How to create a repository.</li>
1690 <span class="duration">25 minutes</span>
1691 (mostly discussion about where to host repositories).
1696 It is finally time to see how to create a repository.
1698 We will keep the master copy of our work in a repository
1699 on a server that we can access from other machines on the internet.
1700 That master copy consists of files and directories that no-one ever edits directly.
1701 Instead, a copy of Subversion running on that machine
1702 manages updates for us and watches for conflicts.
1703 Our working copy is a mirror image of the master sitting on our computer.
1704 When our Subversion client needs to communicate with the master,
1705 it exchanges data with the copy of Subversion running on the server.
1709 To make this work, we need four things:
1715 The repository itself.
1716 It's not enough to create an empty directory and start filling it with files:
1717 Subversion needs to create a lot of other structure
1718 in order to keep track of old revisions, who made what changes, and so on.
1722 The full URL of the repository.
1723 This includes the URL of the server
1724 and the path to the repository on that machine.
1725 (The second part is needed because a single server can
1727 host many repositories.)
1731 Permission to read or write the master copy.
1732 Many open source projects give the whole world permission to read from their repository,
1733 but very few allow strangers to write to it:
1734 there are just too many possibilities for abuse.
1735 Somehow, we have to set up a password or something like it
1736 so that users can prove who they are.
1740 A working copy of the repository on our computer.
1741 Once the first three things are in place,
1742 this just means running the <code>checkout</code> command.
1748 To keep things simple,
1749 we will start by creating a repository on the machine that we're working on.
1750 This won't let us share our work with other people,
1751 but it <em>will</em> allow us to save the history of our work as we go along.
1755 The command to create a repository is <code>svnadmin create</code>,
1756 followed by the path to the repository.
1757 If we want to create a repository called <code>missions_repo</code>
1758 directly under our home directory,
1759 we just <code>cd</code> to get home
1760 and run <code>svnadmin create missions_repo</code>.
1761 This command creates a directory called <code>missions_repo</code> to hold our repository,
1762 and fills it with various files that Subversion uses
1763 to keep track of the project's history:
1767 $ <span class="in">cd</span>
1768 $ <span class="in">svnadmin create missions_repo</span>
1769 $ <span class="in">ls -F missions_repo</span>
1770 <span class="out">README.txt conf/ db/ format hooks/ locks/</span>
1773 <p class="continue">
1774 We should <em>never</em> edit any of this directly,
1775 since it will almost certainly make the repository unusable.
1777 Instead, we should use <code>svn checkout</code>
1778 to get a working copy of this repository.
1779 If our home directory is <code>/users/mummy</code>,
1780 then the full path to the repository we just created is <code>/users/mummy/missions_repo</code>,
1781 so we run <code>svn checkout file:///users/mummy/missions_repo missions_working</code>.
1786 The second argument,
1787 <code>missions_working</code>,
1788 specifies where the working copy is to be put.
1789 The first argument is the URL of our repository,
1790 and it has two parts.
1791 <code>/users/mummy/missions_repo</code> is the path to the repository directory.
1792 <code>file://</code> specifies the <a href="glossary.html#protocol">protocol</a>
1793 that Subversion will use to communicate with the repository—in this case,
1794 it says that the repository is part of the local machine's filesystem.
1795 (Notice that the protocol ends in two slashes,
1796 while the absolute path to the repository starts with a slash,
1797 making three in total.
1798 A very common mistake is to type only two, since that's what web URLs normally have.)
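<p>
The three-slash rule is easy to check by building the URL from its two parts.
This is a minimal sketch in Python, assuming an absolute Unix-style path;
<code>local_repo_url</code> is a hypothetical helper written for this example.
</p>

```python
from urllib.parse import quote


def local_repo_url(repo_path):
    """Build a file:// URL from an absolute, Unix-style repository path."""
    if not repo_path.startswith("/"):
        raise ValueError("repository path must be absolute: " + repo_path)
    # "file://" ends in two slashes and the path supplies the third.
    return "file://" + quote(repo_path)


url = local_repo_url("/users/mummy/missions_repo")
print(url)
```

<p>
The result is <code>file:///users/mummy/missions_repo</code>:
two slashes from the protocol plus one from the start of the path.
</p>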
1802 When we're doing a checkout,
1803 it is <em>very</em> important that we provide the second argument,
1804 which specifies the name of the directory we want the working copy to be put in.
1806 If we don't, Subversion will try to use the name of the repository,
1807 <code>missions_repo</code>,
1808 as the name of the working copy.
1809 Since we're in the directory that contains the repository,
1810 this means that Subversion will try to overwrite the repository with a working copy.
1812 There isn't much risk of our sanity being torn to shreds,
1813 but this could ruin our repository.
1817 To avoid this problem,
1818 most people create a sub-directory in their account called something like <code>repos</code>,
1819 and then create their repositories in that.
1821 For example, we could create our repository in <code>/users/mummy/repos/missions</code>,
1822 then check out a working copy as <code>/users/mummy/missions</code>.
1823 This practice makes both names easier to read.
1827 The obvious next step is to put our repository on a server,
1828 rather than on our personal machine.
1830 In fact, we should <em>always</em> do this
1831 so that we don't lose the history of our project
1832 if our laptop is damaged or stolen.
1833 A departmental server is also much more likely to be backed up regularly
1834 than our personal machine…
1838 Creating a repository on a server is simple:
1839 just log in and go through the steps described above.
1840 Accessing that repository from another machine
1841 is also straightforward.
1842 If the machine's address is <code>serv.euphoric.edu</code>,
1843 and our user ID is <code>dracula</code>,
1844 the URL of the repository will be something like:
1848 svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions
1852 Reading from left to right:
1857 <code>svn+ssh</code> is the protocol that Subversion uses to connect to the server
1859 (a combination of Subversion's own protocol
1860 and <a href="shell.html#s:ssh">SSH</a>);
1863 <code>dracula@serv.euphoric.edu</code> identifies the server and who we are
1864 (just like an email address);
1868 <code>/home/dracula/repos/missions</code> is the absolute path of the repository
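<p>
The same left-to-right reading can be done by a program.
As an illustration (not something Subversion requires),
Python's standard <code>urllib.parse.urlsplit</code> function splits the example URL into the parts listed above:
</p>

```python
from urllib.parse import urlsplit

url = "svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions"
parts = urlsplit(url)

print("protocol:", parts.scheme)     # svn+ssh
print("user ID: ", parts.username)   # dracula
print("server:  ", parts.hostname)   # serv.euphoric.edu
print("path:    ", parts.path)       # /home/dracula/repos/missions
```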
1873 <p id="a:only_user">
1874 That's fine if you are the only person using the repository,
1875 but if you want to share it with others,
1876 you need to worry about security.
1877 As we discuss in the lesson on <a href="web.html">web programming</a>,
1878 as soon as you provide a service on the internet,
1879 there's the possibility that someone may try to attack your system through it.
1880 Rather than trying to learn enough system administration skills
1881 to set things up safely,
1882 it is usually easier to:
1888 ask your department's system administrator to set it up for you;
1892 use a hosting service like <a href="http://www.sf.net">SourceForge</a>,
1893 <a href="http://code.google.com">Google Code</a>,
1894 <a href="https://github.com/">GitHub</a>,
1895 or <a href="https://bitbucket.org/">BitBucket</a>; or
1899 spend a few dollars a month on a commercial hosting service
1900 that provides web-based GUIs for creating and managing repositories.
1906 If you choose the second or third option,
1907 please check with whoever handles intellectual property at your institution
1908 to make sure that putting your work on a commercially-operated machine
1909 that is probably in some other legal jurisdiction
1910 isn't going to cause trouble.
1911 Many people assume that it's "just OK",
1912 while others act as if not having asked will be an acceptable defence later on.
1914 Neither is true…
1917 <div class="keypoints">
1920 <li><code>svnadmin create <em>name</em></code> creates a new repository.</li>
1921 <li>Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.</li>
1925 <div class="challenges">
1931 Create a Subversion repository called <code>trials_repo</code>
1932 in your home directory.
1933 Check out a working copy in a directory called <code>trials_working</code>
1934 (also in your home directory).
1935 Add a couple of text files,
1937 and then use <code>svn info trials_working</code>
1938 to see what Subversion tells you about your working copy.
1942 We said <a href="#a:only_user">above</a> that
1943 you might be the only person using a particular repository.
1944 When and why is version control worth using
1945 if no-one else is working on a project with you?
1949 There are many ways to organize repositories.
1950 Some of the most common are to create one repository for:
1952 <li>each person</li>
1954 <li>all the work done on one grant</li>
1955 <li>all the work done on one project</li>
1956 <li>the entire lab (which is shared by everyone in the lab)</li>
1957 <li>the entire department (typically with a top-level directory for each person or project in the department)</li>
1959 What activities does each one make easy or hard?
1960 Which of these would you prefer, and why?
1968 <section id="s:provenance">
1971 <div class="understand">
1972 <h3>Learning Objectives</h3>
1974 <li>What data provenance is.</li>
1975 <li>How to embed version numbers and other information in files managed by version control.</li>
1976 <li>How to record version information about a program in its output.</li>
1979 <span class="duration">20 minutes</span>
1980 (without a practical exercise).
1986 In art, the <a href="glossary.html#provenance">provenance</a> of a work
1987 is the history of who owned it, when, and where.
1989 In science, it's the record of how a particular result came to be:
1990 what raw data was processed by what version of what program to create which intermediate files,
1991 what was used to turn those files into which figures of which papers,
1996 One of the big benefits of using version control is that
1997 it lets us track the provenance of scientific data automatically.
1999 For example, suppose we have a text file <code>combustion.dat</code> in a Subversion repository.
2000 Run the following two commands:
2004 $ svn propset svn:keywords Revision combustion.dat
2005 $ svn commit -m "Turning on the 'Revision' keyword" combustion.dat
2008 <p class="continue">
2009 This does nothing by itself,
2010 but now open the file in an editor
2011 and add the following line somewhere near the top:
<pre>
$Revision:$
</pre>
2019 The <code>$Revision:$</code> string means something special to Subversion.
2020 Save the file, and commit the change:
2024 $ svn commit -m "Inserting the 'Revision' keyword" combustion.dat
2028 When we open the file again,
2029 we'll see that Subversion has changed that line to something like:
2036 <p class="continue">
2037 i.e., it has inserted the version number
2038 after the colon and before the closing <code>$</code>.
2039 If we edit the file again—e.g., add a couple of lines with random numbers—and
2041 the line is updated again to:
2049 Here's what just happened.
First, Subversion allows us to add
<a href="glossary.html#property-subversion">properties</a>
to files and directories.
2053 These properties aren't stored in the files or directories themselves,
2054 but in Subversion's database.
2055 One of those properties,
2056 <code>svn:keywords</code>,
2057 tells Subversion to look in files that are being changed
2058 for strings of the form <code>$propertyname: …$</code>,
2059 where <code>propertyname</code> is a string like <code>Revision</code> or <code>Author</code>.
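(Subversion supports about half a dozen such keywords, including <code>Date</code>, <code>Revision</code>, <code>Author</code>, <code>HeadURL</code>, and <code>Id</code>.)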
2064 If it sees such a string,
2065 Subversion rewrites it as the commit is taking place to replace <code>…</code>
2066 with the current version number,
2067 the name of the person making the change,
2068 or whatever else the property's name tells it to do.
2069 We only have to add the string to the file once;
Subversion updates it for us every time the file changes.
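<p>
The effect of this substitution can be imitated with <code>sed</code>.
This is only a sketch of what the rewrite does to the text, not of how Subversion implements it:
the function name <code>expand_revision</code> and the revision numbers are made up for illustration,
and the output follows the format shown in this section.
</p>

```shell
#!/bin/sh
# Imitate the effect of Subversion's Revision keyword substitution.
# ASSUMPTION: expand_revision is a hypothetical helper, not part of Subversion.
# usage: expand_revision REVISION < input > output
expand_revision() {
    # Match either the bare "$Revision:$" or an already-expanded
    # "$Revision: 143$" and rewrite it with the number given as $1.
    sed 's/\$Revision:[^$]*\$/$Revision: '"$1"'$/'
}

# The first commit fills in the empty keyword...
printf '%s\n' '# $Revision:$' | expand_revision 143
# ...and a later commit replaces the old number with the new one.
printf '%s\n' '# $Revision: 143$' | expand_revision 144
```

<p class="continue">
Because the pattern matches both the empty and the filled-in form,
the same line can be rewritten any number of times,
which is why the string only has to be added to the file once.
</p>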
2075 Putting the version number in the file this way can be pretty handy.
2076 If you copy the file to another machine,
2078 it carries its version number with it,
2079 so you can tell which version you have even if it's outside version control.
2080 We'll see some more useful things we can do with this information <a href="python.html">later</a>.
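<p>
Because the expanded keyword travels with the file,
the number can be read back with ordinary text tools even where Subversion is not installed.
The following is a minimal sketch, assuming the keyword has already been expanded as shown above;
<code>revision_of</code> is a hypothetical helper name,
and the sample file contents are invented for the demonstration.
</p>

```shell
#!/bin/sh
# Print the revision number recorded in a file's expanded Revision keyword.
# ASSUMPTION: revision_of is a hypothetical helper name.
# usage: revision_of FILE
revision_of() {
    # sed -n with the p flag prints only lines where the substitution
    # succeeded; head keeps the first match if the keyword appears twice.
    sed -n 's/.*\$Revision: *\([0-9][0-9]*\) *\$.*/\1/p' "$1" | head -n 1
}

# Demonstration on a throwaway file standing in for a copied data file.
demo=$(mktemp)
printf '%s\n' '# $Revision: 143$' '1.0 2.5' '3.1 4.7' > "$demo"
revision_of "$demo"
rm -f "$demo"
```

<p class="continue">
If the keyword has not been expanded (or is missing), the function prints nothing,
which a calling script can treat as "version unknown".
</p>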
2084 We can use this trick with shell scripts too,
2085 or with almost any other kind of program.
2086 Let's go back to Nelle Nemo's data processing from
2087 the lesson on the <a href="shell.html">shell</a>.
2088 Suppose she writes a shell script called <code>gooclean</code>
2089 to tidy up data files.
2090 Her first version looks like this:
2094 # gooclean: clean up a single data file
2095 goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 > cleaned-$1
2098 <p class="continue">
2100 it runs <code>goonorm</code> and then <code>goofilter</code> with some fixed parameters
2101 and creates an output file called <code>cleaned-something.dat</code>
2102 (if the input file's name was <code>something.dat</code>).
2103 Assuming that '#' is the comment character for her output files,
2104 she could instead write:
2108 # gooclean: clean up a single data file
<span class="highlight">echo '# gooclean $Revision:$' > cleaned-$1</span>
2110 goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 <span class="highlight">>></span> cleaned-$1
2113 <p class="continue">
2114 then set the <code>svn:keywords</code> property
2115 and commit the file to insert the revision number,
2120 # gooclean: clean up a single data file
<span class="highlight">echo '# gooclean $Revision: 487$' > cleaned-$1</span>
2122 goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 <span class="highlight">>></span> cleaned-$1
2127 each time this script is run it will:
2134 # gooclean $Revision: 487$
append whatever the pipeline containing <code>goonorm</code> and <code>goofilter</code>
2141 would have put in the file originally.
2142 (The double redirection <code>>></code> means "append to" rather than "overwrite".)
2146 <p class="continue">
2148 the output of this shell script will always record
2149 exactly what version of the script produced it.
2150 This isn't enough to reproduce the output—we would need to record
2151 the version numbers of the input files and the <code>goonorm</code> and <code>goofilter</code> programs,
2152 and the values of the parameters those programs used
2153 in order to do that—but it's an important and useful first step.
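<p>
One way to take the next step is to write several header lines instead of one.
The sketch below is an illustration under stated assumptions, not part of Nelle's actual script:
<code>provenance_header</code> is a hypothetical helper,
'#' is assumed to be the comment character,
and 487 stands in for whatever number Subversion filled in.
It records the input file name and the options passed to each program,
but not the versions of <code>goonorm</code> and <code>goofilter</code> themselves,
since how to query those depends on the programs.
</p>

```shell
#!/bin/sh
# Write a provenance header: one comment line per fact worth keeping.
# ASSUMPTION: provenance_header is a hypothetical helper name.
# usage: provenance_header INPUT_FILE GOONORM_OPTIONS GOOFILTER_OPTIONS
provenance_header() {
    # Single quotes keep the shell from treating $Revision as a variable.
    printf '%s\n' '# gooclean $Revision: 487$'
    printf '%s\n' "# input: $1"
    printf '%s\n' "# goonorm options: $2"
    printf '%s\n' "# goofilter options: $3"
}

# Example: the header that would precede the cleaned data.
provenance_header something.dat '-b 0 100' '-x --enlarge 2.0'
```

<p class="continue">
In the script itself, the header would be redirected to the output file with <code>&gt;</code>
and the pipeline's results appended after it with <code>&gt;&gt;</code>, as before.
</p>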
2156 <div class="keypoints">
2159 <li><code>$Keyword: …$</code> in a file can be filled in with a property value each time the file is committed.</li>
2160 <li>Put version numbers in programs' output to establish provenance for data.</li>
2161 <li><code>svn propset svn:keywords <em>property</em> <em>files</em></code> tells Subversion to start filling in property values.</li>
2165 <div class="challenges">
2171 Add <code>$Id:$</code> to a file,
2172 use <code>svn propset</code> to set the corresponding property,
2173 and then commit a change to the file.
2174 What value does Subversion fill in for this keyword?
2175 When would you use this rather than <code>Revision</code> or <code>Author</code>?
2179 What does the <code>svn:ignore</code> property do when applied to a directory?
2180 When would you use it?
2189 <section id="s:summary">
2194 <a href="bib.html#mccullough-reproducibility">McCullough, McGeary, and Harrison</a>
2195 analyzed several years of
2196 the data and code archive of <cite>Journal of Money, Credit, and Banking</cite>,
2197 a prestigious journal with a mandatory archiving policy.
2198 Of 266 articles published during that time,
2199 193 were empirical and should have had data and code deposited in the archive.
only 69 actually had anything in the archive.
2202 Excluding eleven articles that only had data,
2203 and seven that required software or other resources they did not have,
2204 McCullough et al. were only able to replicate 14 of the remaining 186 articles.
2205 This doesn't mean that the other 92% were wrong,
2206 but it does mean there is no practical way to tell.
version control doesn't make computational research reproducible.
2212 It <em>does</em> help,
2214 and also eliminates the frustration and wasted time caused by
2215 trying to figure out which emailed copy of a file,
2216 or which of a dozen directories or USB drives,
2218 And while correlation doesn't imply causality,
2219 there is certainly a strong correlation between
2220 knowing enough about good computational practices to use version control
2221 and knowing how to do other things right as well.
2225 {% endblock content %}