X-Git-Url: http://git.tremily.us/?p=swc-version-control-svn.git;a=blobdiff_plain;f=svn.html;h=aa203067780a3d1a4b02ff2673c2d8f22aa3f0fb;hp=494145cdb8d66d3010f3654fda1c08e868d35197;hb=38cdffc2da3457380aa287b78a1c2a5067a1f2b0;hpb=0c566fca18049b40bae8ee8d5c26577c99b8dd7b diff --git a/svn.html b/svn.html index 494145c..aa20306 100644 --- a/svn.html +++ b/svn.html @@ -58,40 +58,118 @@ -
- Version control systems do have one important shortcoming. - While it is easy for them to find, display, and merge differences in text files, - images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text—they - use specialized binary data formats. - Most version control systems don't know how to deal with these formats, - so all they can say is, "These files differ." - Reconciling those differences will probably require use of an auxiliary tool, - such as an audio editor - or Microsoft Word's "Compare and Merge" utility. -
-The rest of this chapter will explore how to use a popular open source version control system called Subversion. + It does not have all the features of some newer systems, + such as Git, + but it is still widely used, + and is simpler to pick up than those more advanced alternatives. + No matter which system you use, + the most important thing to learn is not the details of their more obscure commands, + but the workflow that they encourage.
explain
++ Version control is the most important practical skill we introduce. + As the last paragraph of the introduction above says, + the workflow matters more than the ins and outs of any particular tool. + By the end of 90 minutes, + the instructor should be able to get learners to chant, + "Update, edit, merge, commit," in unison, + and have them understand what those terms mean + and why that's a good way to structure their working day. +
+ ++ Provided there aren't network problems, + this entire lesson can be covered in 90 minutes. + The example at the end + showing how to use Subversion keywords to track provenance + is the "ah ha!" moment for many learners. + If time is short, + skip the material on recovering old versions of files + in order to get to this section instead. + (The fact that provenance is harder in Git, + both mechanically and conceptually, + is one reason to keep teaching Subversion.) +
prereq
+
+ Basic shell concepts and skills
+ (ls
, cd
, mkdir
,
+ editing files);
+ basic shell scripting
+ (for the discussion of provenance).
+
svn+ssh
protocol instead of HTTPS.
+ svn blame
+ is only compelling if a file has been edited by at least two people.
+ svn diff -x -w
is supposed to suppress differences in whitespace,
+ but we have found that it doesn't always work as advertised.
+ https://universal.software-carpentry.org/monsters
.
+ whose URL is https://universal.software-carpentry.org/explore
.
Every repository has an address like this that uniquely identifies the location of the master copy.
@@ -252,24 +330,24 @@
-$ svn checkout https://universal.software-carpentry.org/monsters +$ svn checkout https://universal.software-carpentry.org/explore
- This creates a new directory called monsters
+ This creates a new directory called explore
and fills it with a copy of the repository's contents
(Figure 6).
-A monsters/jupiter
-A monsters/mars
-A monsters/mars/mons-olympus.txt
-A monsters/mars/cydonia.txt
-A monsters/earth
-A monsters/earth/himalayas.txt
-A monsters/earth/antarctica.txt
-A monsters/earth/carlsbad.txt
+A explore/jupiter
+A explore/mars
+A explore/mars/mons-olympus.txt
+A explore/mars/cydonia.txt
+A explore/earth
+A explore/earth/himalayas.txt
+A explore/earth/antarctica.txt
+A explore/earth/carlsbad.txt
Checked out revision 6.
@@ -284,7 +362,7 @@ Checked out revision 6.
-$ cd monsters +$ cd explore $ ls earth jupiter mars $ ls * @@ -311,7 +389,7 @@ cydonia.txt mons-olympus.txt$ pwd -/home/vlad/monsters +/home/dracula/explore $ ls -a . .. .svn earth jupiter mars $ ls -F .svn @@ -368,7 +446,7 @@ Send the probe to Mons Olympus? the date the change was made, and whatever comment the user provided when the change was submitted. As we can see, - themonsters
project is currently at revision 6, + theexplore
project is currently at revision 6, and all changes so far have been made by the Mummy. @@ -478,6 +556,30 @@ Committed revision 7.Figure 8: Updated Repository +++When Not to Use Version Control
+ ++ Despite the rapidly decreasing cost of storage, + it is still possible to run out of disk space. + In some labs, + people can easily go through 2 TB/month if they're not careful. + Since version control tools usually store revisions in terms of lines, + with binary data files, + they end up essentially storing every revision separately. + This isn't that bad + (it's what we'd be doing anyway), + but it means version control isn't doing what it likes to do, + and the repository can get very large very quickly. + Another concern is that if very old data will no longer be used, + it can be nice to archive or delete old data files. + This is not possible if our data is version controlled: + information can only be added to a repository, + so it can only ever increase in size. +
+ +Back in his cubicle, Wolfman uses
svn update
to update his working copy. @@ -683,6 +785,22 @@ $ svn diff -r HEAD
+ Version control systems do have one important shortcoming. + While it is easy for them to find, display, and merge differences in text files, + images, MP3s, PDFs, or Microsoft Word or Excel files aren't stored as text—they + use specialized binary data formats. + Most version control systems don't know how to deal with these formats, + so all they can say is, "These files differ." + Reconciling those differences will probably require use of an auxiliary tool, + such as an audio editor + or Microsoft Word's "Compare and Merge" utility. +
+
+ One other very useful command is svn blame
,
+ which shows when each line in the file was last changed
+ and by whom:
+
+$ svn blame moons.txt + 14 dracula Name Orbital Radius Orbital Period Mass Radius + 14 dracula (10**3 km) (days) (10**20 kg) (km) + 14 dracula Amalthea 181.4 0.498179 0.075 131 x 73 x 67 + 9 mummy Io 421.6 1.769138 893.2 1821.6 + 9 mummy Europa 670.9 3.551181 480.0 1560.8 + 9 mummy Ganymede 1070.4 7.154553 1481.9 2631.2 + 14 dracula Callisto 1882.7 16.689018 1075.9 2410.3 + 14 dracula Himalia 11460 250.5662 0.095 85.0 + 14 dracula Elara 11740 259.6528 0.008 40.0 ++ +
+ If you are ever wondering who to talk to about a change,
+ or why it was made,
+ svn blame
is a good place to start.
+
- Dracula and Wolfman have both synchronized their working copies of monsters
- with version 8 of the repository.
- Dracula now edits his copy to change Amalthea's radius
- from a single number to a triple to reflect its irregular shape:
-
+ Dracula and Wolfman have both synchronized their working copies of explore
+ with version 8 of the repository.
+ Dracula now edits his copy to change Amalthea's radius
+ from a single number to a triple to reflect its irregular shape:
+
Name Orbital Radius Orbital Period Mass Radius @@ -883,22 +1030,23 @@ Ganymede 1070.4 7.154553 1481.9 2631.2 Callisto 1882.7 16.689018 1075.9 2410.3-
- He then commits his work, - creating revision 9 of the repository - (Figure XXX). -
++ He then commits his work, + creating revision 9 of the repository + (Figure 9). +
- + -- But while he is doing this, - Wolfman is editing his copy - to add information about two other minor moons, - Himalia and Elara: -
++ But while he is doing this, + Wolfman is editing his copy + to add information about two other minor moons, + Himalia and Elara: +
Name Orbital Radius Orbital Period Mass Radius @@ -911,10 +1059,10 @@ Callisto 1882.7 16.689018 1075.9 2410.3 Elara 11740 259.6528 0.008 40.0-
- When Wolfman tries to commit his changes to the repository, - Subversion won't let him: -
++ When Wolfman tries to commit his changes to the repository, + Subversion won't let him: +
$ svn commit -m "Added data for Himalia, Elara"
@@ -924,40 +1072,46 @@ svn: File or directory 'moons.txt' is out of date; try updating
svn: resource out of date; try updating
- - The reason is that - Wolfman's changes were based on revision 8, - but the repository is now at revision 9, - and the file that Wolfman is trying to overwrite - is different in the later revision. - (Remember, - one of version control's main jobs is to make sure that - people don't trample on each other's work.) - Wolfman has to update his working copy to get Dracula's changes before he can commit. - Luckily, - Dracula edited a line that Wolfman didn't change, - so Subversion can merge the differences automatically. -
++ The reason is that + Wolfman's changes were based on revision 8, + but the repository is now at revision 9, + and the file that Wolfman is trying to overwrite + is different in the later revision. + (Remember, + one of version control's main jobs is to make sure that + people don't trample on each other's work.) + Wolfman has to update his working copy to get Dracula's changes before he can commit. + Luckily, + Dracula edited a line that Wolfman didn't change, + so Subversion can merge the differences automatically. +
-- This does not mean that Wolfman's changes have been committed to the repository: - Subversion only does that when it's ordered to. - Wolfman's changes are still in his working copy, - and only in his working copy. - But since Wolfman's version of the file now includes - the lines that Dracula added, - Wolfman can go ahead and commit them as usual to create revision 10. -
++ This does not mean that Wolfman's changes have been committed to the repository: + Subversion only does that when it's ordered to. + Wolfman's changes are still in his working copy, + and only in his working copy. + But since Wolfman's version of the file now includes + the lines that Dracula added, + Wolfman can go ahead and commit them as usual to create revision 10 + (Figure 10). +
-
- Wolfman's working copy is now in sync with the master,
- but Dracula's is one behind at revision 9.
- At this point,
- they independently decide to add measurement units
- to the columns in moons.txt
.
- Wolfman is quicker off the mark this time;
- he adds a line to the file:
-
+ Wolfman's working copy is now in sync with the master,
+ but Dracula's is one behind at revision 9.
+ At this point,
+ they independently decide to add measurement units
+ to the columns in moons.txt
.
+ Wolfman is quicker off the mark this time;
+ he adds a line to the file:
+
Name Orbital Radius Orbital Period Mass Radius @@ -971,12 +1125,12 @@ Himalia 11460 250.5662 0.095 85.0 Elara 11740 259.6528 0.008 40.0-
- and commits it to create revision 11. - While he is doing this, - though, - Dracula inserts a different line at the top of the file: -
++ and commits it to create revision 11. + While he is doing this, + though, + Dracula inserts a different line at the top of the file: +
Name Orbital Radius Orbital Period Mass Radius @@ -990,16 +1144,25 @@ Himalia 11460 250.5662 0.095 85.0 Elara 11740 259.6528 0.008 40.0-
- Once again, - when Dracula tries to commit, - Subversion tells him he can't. - But this time, - when Dracula does updates his working copy, - he doesn't just get the line Wolfman added to create revision 11. - There is an actual conflict in the file, - so Subversion asks Dracula what he wants to do: -
++ Once again, + when Dracula tries to commit, + Subversion tells him he can't. + But this time, + when Dracula does update his working copy, + he doesn't just get the line Wolfman added to create revision 11 + (Figure 11). +
+ + + ++ There is an actual conflict in the file, + so Subversion asks Dracula what he wants to do: +
$ svn update
@@ -1009,12 +1172,12 @@ Select: (p) postpone, (df) diff-full, (e) edit,
(s) show all options:
-
- Dracula choose p
for "postpone",
- which tells Subversion that he'll deal with the problem later.
- Once the update is finished,
- he opens moons.txt
in his editor and sees:
-
+ Dracula chooses p
for "postpone",
+ which tells Subversion that he'll deal with the problem later.
+ Once the update is finished,
+ he opens moons.txt
in his editor and sees:
+
Name Orbital Radius Orbital Period Mass @@ -1030,24 +1193,24 @@ Select: (p) postpone, (df) diff-full, (e) edit, Callisto 1882.7 16.689018 1075.9-
- As we can see,
- Subversion has inserted
- conflict markers
- in moons.txt
- wherever there is a conflict.
- The line <<<<<<< .mine
shows the start of the conflict,
- and is followed by the lines from the local copy of the file.
- The separator =======
is then
- followed by the lines from the repository's file that are in conflict with that section,
- while >>>>>>> .r11
marks the end of the conflict.
-
+ As we can see,
+ Subversion has inserted
+ conflict markers
+ in moons.txt
+ wherever there is a conflict.
+ The line <<<<<<< .mine
shows the start of the conflict,
+ and is followed by the lines from the local copy of the file.
+ The separator =======
is then
+ followed by the lines from the repository's file that are in conflict with that section,
+ while >>>>>>> .r11
marks the end of the conflict.
+
- Before he can commit, - Dracula has to edit his copy of the file to get rid of those markers. - He changes it to: -
++ Before he can commit, + Dracula has to edit his copy of the file to get rid of those markers. + He changes it to: +
Name Orbital Radius Orbital Period Mass Radius @@ -1061,132 +1224,174 @@ Himalia 11460 250.5662 0.095 85.0 Elara 11740 259.6528 0.008 40.0-
- then uses the svn resolved
command to tell Subversion that
- he has fixed the problem.
- Subversion will now let him commit to create revision 12.
-
+ then uses the svn resolved
command to tell Subversion that
+ he has fixed the problem.
+ Subversion will now let him commit to create revision 12.
+
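Because svn resolved takes Dracula's word that the file has been fixed, it can be worth checking for leftover markers first. The following is a minimal sketch, not part of Subversion: the function name has_conflict_markers is hypothetical, and the check is deliberately crude (any line that starts with seven of the marker characters counts, so a file that legitimately contains such a line would be flagged too).

```shell
#!/bin/sh
# Hypothetical helper (not a Subversion command): succeeds if FILE still
# contains a line starting with one of the three conflict markers.
has_conflict_markers() {
    grep -E -q '^(<<<<<<<|=======|>>>>>>>)' "$1"
}

# Demonstration on a scratch file, so no real working copy is touched.
demo=$(mktemp)
printf '%s\n' '<<<<<<< .mine' '(10**3 km)' '=======' '(days)' '>>>>>>> .r11' > "$demo"

if has_conflict_markers "$demo"; then
    echo "markers remain: keep editing"
else
    echo "clean: safe to run svn resolved"
fi
rm -f "$demo"
```

In a real working copy the same check could be run on moons.txt just before svn resolved and svn commit.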
- When Dracula did his update and Subversion detected the conflict in moons.txt
,
- it created three temporary files to help Dracula resolve it.
- The first is called moons.txt.r9
;
- it is the file as it was in Dracula's local copy
- before he started making changes,
- i.e., the common ancestor for his work
- and whatever he is in conflict with.
-
- The second file is moons.txt.r11
.
- This is the most up-to-date revision from the repository—the
- file as it is including Wolfman's changes.
- The third temporary file, moons.txt.mine
,
- is the file as it was in Dracula's working copy before he did the Subversion update.
-
- Subversion creates these auxiliary files primarily
- to help people merge conflicts in binary files.
- It wouldn't make sense to insert <<<<<<<
- and >>>>>>>
characters into an image file
- (it would almost certainly result in a corrupted image).
- The svn resolved
command deletes these three extra files
- as well as telling Subversion that the conflict has been taken care of.
-
- Some power users prefer to work with interpolated conflict markers directly, - but for the rest of us, - there are several tools for displaying differences and helping to merge them, - including Diffuse and WinMerge. - If Dracula launches Diffuse, - it displays his file, - the common base that he and Wolfman were working from, - and Wolfman's file in a three-pane view - (Figure XXX): -
+
+ When Dracula did his update and Subversion detected the conflict in moons.txt
,
+ it created three temporary files to help Dracula resolve it.
+ The first is called moons.txt.r9
;
+ it is the file as it was in Dracula's local copy
+ before he started making changes,
+ i.e., the common ancestor for his work
+ and whatever he is in conflict with.
+
+ The second file is moons.txt.r11
.
+ This is the most up-to-date revision from the repository—the
+ file as it is including Wolfman's changes.
+ The third temporary file, moons.txt.mine
,
+ is the file as it was in Dracula's working copy before he did the Subversion update.
+
- Dracula can use the buttons to merge changes from either of the edited versions
- into the common ancestor,
- or edit the central pane directly.
- Again,
- once he is done,
- he uses svn resolved
and svn commit
- to create revision 12 of the repository.
-
+ Subversion creates these auxiliary files primarily
+ to help people merge conflicts in binary files.
+ It wouldn't make sense to insert <<<<<<<
+ and >>>>>>>
characters into an image file
+ (it would almost certainly result in a corrupted image).
+ The svn resolved
command deletes these three extra files
+ as well as telling Subversion that the conflict has been taken care of.
+
- In this case, the conflict was small and easy to fix. - However, if two or more people on a team are repeatedly creating conflicts for one another, - it's usually a signal of deeper communication problems: - either they aren't talking as often as they should, or their responsibilities overlap. - If used properly, - the version control system can help the team find and fix these issues - so that it will be more productive in future. -
++ Some power users prefer to work with interpolated conflict markers directly, + but for the rest of us, + there are several tools for displaying differences and helping to merge them, + including Diffuse and WinMerge. + If Dracula launches Diffuse, + it displays his file, + the common base that he and Wolfman were working from, + and Wolfman's file in a three-pane view + (Figure 12): +
-- As mentioned earlier, - every logical change to a project should result in a single commit, - and every commit should represent one logical change. - This is especially true when resolving conflicts: - the work done to reconcile one person's changes with another are often complicated, - so it should be a single entry in the project's history, - with other, later, changes coming after it. -
+
+ Dracula can use the buttons to merge changes from either of the edited versions
+ into the common ancestor,
+ or edit the central pane directly.
+ Again,
+ once he is done,
+ he uses svn resolved
and svn commit
+ to create revision 12 of the repository.
+
+ In this case, the conflict was small and easy to fix. + However, if two or more people on a team are repeatedly creating conflicts for one another, + it's usually a signal of deeper communication problems: + either they aren't talking as often as they should, or their responsibilities overlap. + If used properly, + the version control system can help the team find and fix these issues + so that it will be more productive in future. +
-svn resolve files
tells Subversion that conflicts have been resolved.+ As mentioned earlier, + every logical change to a project should result in a single commit, + and every commit should represent one logical change. + This is especially true when resolving conflicts: + the work done to reconcile one person's changes with another's is often complicated, + so it should be a single entry in the project's history, + with other, later, changes coming after it. +
-svn resolve files
tells Subversion that conflicts have been resolved.
- Now that we have seen how to merge files and resolve conflicts,
- we can look at how to use version control as an "infinite undo".
- Suppose that when Wolfman starts work late one night,
- his copy of monsters
is in sync with the head at revision 12.
- He decides to edit the file moons.txt
;
- unfortunately, he forgot that there was a full moon,
- so his changes don't make a lot of sense:
-
+ If you are working in a group, + partner with someone who has also written a biography for themselves + for the previous section's challenges. +
+ +svn update
+ to make sure their working copies are up to date
+ and that there are no local changes.
+ svn commit
.
+ + If you are working on your own, + you can simulate the steps above + by checking out a second copy of the project into a new directory. + (Remember, + this cannot overlap any existing checked-out copies.) + Edit your biography in one copy and commit those changes, + then switch to the other copy and edit the same file + before updating. +
+
+ Now that we have seen how to merge files and resolve conflicts,
+ we can look at how to use version control as an "infinite undo".
+ Suppose that when Wolfman starts work late one night,
+ his copy of explore
is in sync with the head at revision 12.
+ He decides to edit the file moons.txt
;
+ unfortunately, he forgot that there was a full moon,
+ so his changes don't make a lot of sense:
+
Just one moon can make me growl @@ -1194,35 +1399,35 @@ Four would make me want to howl ...-
- When he's back in human form the next day, - he wants to undo his changes. - Without version control, his choices would be grim: - he could try to edit them back into their original state by hand - (which for some reason hardly ever seems to work), - or ask his colleagues to send him their copies of the files - (which is almost as embarrassing as chasing the neighbor's cat when in wolf form). -
++ When he's back in human form the next day, + he wants to undo his changes. + Without version control, his choices would be grim: + he could try to edit them back into their original state by hand + (which for some reason hardly ever seems to work), + or ask his colleagues to send him their copies of the files + (which is almost as embarrassing as chasing the neighbor's cat when in wolf form). +
-
- Since he's using Subversion, though,
- and hasn't committed his work to the repository,
- all he has to do is revert his local changes.
- svn revert
simply throws away local changes to files
- and puts things back the way they were before those changes were made.
- This is a purely local operation:
- since Subversion stores the history of the project inside every working copy,
- Wolfman doesn't need to be connected to the network to do this.
-
+ Since he's using Subversion, though,
+ and hasn't committed his work to the repository,
+ all he has to do is revert his local changes.
+ svn revert
simply throws away local changes to files
+ and puts things back the way they were before those changes were made.
+ This is a purely local operation:
+ since Subversion stores the history of the project inside every working copy,
+ Wolfman doesn't need to be connected to the network to do this.
+
- To start,
- Wolfman uses svn diff
without the -r HEAD
flag
- to take a look at the differences between his file
- and the master copy in the repository.
- Since he doesn't want to keep his changes,
- his next command is svn revert moons.txt
.
-
+ To start,
+ Wolfman uses svn diff
without the -r HEAD
flag
+ to take a look at the differences between his file
+ and the master copy in the repository.
+ Since he doesn't want to keep his changes,
+ his next command is svn revert moons.txt
.
+
$ cd jupiter @@ -1230,13 +1435,13 @@ $ svn revert moons.txt Reverted moons.txt-
- What if someone has committed their changes,
- but still wants to undo them?
- For example,
- suppose Dracula decides that the numbers in moons.txt
would look better with commas.
- He edits the file to put them in:
-
+ What if someone has committed their changes,
+ but still wants to undo them?
+ For example,
+ suppose Dracula decides that the numbers in moons.txt
would look better with commas.
+ He edits the file to put them in:
+
Name Orbital Radius Orbital Period Mass Radius @@ -1250,47 +1455,49 @@ Himalia 11,460 250.5662 Elara 11,740 259.6528 0.008 40.0-
- then commits his changes to create revision 13. - A little while later, - the Mummy sees the change and orders Dracula to put things back the way they were. - What should Dracula do? -
++ then commits his changes to create revision 13. + A little while later, + the Mummy sees the change and orders Dracula to put things back the way they were. + What should Dracula do? +
-- We can draw the sequence of events leading up to revision 13 - as shown in Fixture XXX: -
++ We can draw the sequence of events leading up to revision 13 + as shown in Figure 13: +
- + -- Dracula wants to erase revision 13 from the repository, - but he can't actually do that: - once a change is in the repository, - it's there forever. - What he can do instead is merge the old revision with the current revision - to create a new revision - (Fixture XXX). -
++ Dracula wants to erase revision 13 from the repository, + but he can't actually do that: + once a change is in the repository, + it's there forever. + What he can do instead is merge the old revision with the current revision + to create a new revision + (Figure 14). +
- + -- This is exactly like merging changes made by two different people; - the only difference is that the "other person" is his past self. -
++ This is exactly like merging changes made by two different people; + the only difference is that the "other person" is his past self. +
-
- To undo his commas,
- Dracula must merge revision 12 (the one before his change)
- with revision 13 (the current head revision)
- using svn merge
:
-
+ To undo his commas,
+ Dracula must merge revision 12 (the one before his change)
+ with revision 13 (the current head revision)
+ using svn merge
:
+
$ svn merge -r HEAD:12 moons.txt @@ -1298,526 +1505,702 @@ $ svn merge -r HEAD:12 moons.txt U moons.txt-
- The -r
flag specifies the range of revisions to merge:
- to undo the changes from revision 12 to revision 13,
- he uses either 13:12
or HEAD:12
- (since he is going backward in time from the most recent revision to revision 12).
- This is called a reverse merge
- because he's going backward in time.
-
+ The -r
flag specifies the range of revisions to merge:
+ to undo the changes from revision 12 to revision 13,
+ he uses either 13:12
or HEAD:12
+ (since he is going backward in time from the most recent revision to revision 12).
+ This is called a reverse merge
+ because he's going backward in time.
+
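The rule for the revision range can be sketched as a small shell function. This is only an illustration: undo_command is a hypothetical name, and it prints the command rather than running it, so it is safe to try outside a working copy. Undoing revision N means merging backward from N to N-1.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the reverse-merge command
# that would undo revision REV of FILE.
undo_command() {
    rev=$1
    file=$2
    # The range runs backward in time: from REV to the revision before it.
    echo "svn merge -r ${rev}:$((rev - 1)) ${file}"
}

undo_command 13 moons.txt
```

For Dracula's commas this prints the 13:12 form of the command shown above; the change still has to be committed afterward.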
- After he runs this command,
- he must run svn commit
to save the changes to the repository.
- This creates a new revision, number 14,
- rather than erasing revision 13.
- That way,
- the changes he made to create revision 13 are still there
- if he can ever convince the Mummy that numbers should have commas.
-
+ After he runs this command,
+ he must run svn commit
to save the changes to the repository.
+ This creates a new revision, number 14,
+ rather than erasing revision 13.
+ That way,
+ the changes he made to create revision 13 are still there
+ if he can ever convince the Mummy that numbers should have commas.
+
- Merging can be used to recover older revisions of files, - not just the most recent, - and to recover many files or directories at a time. - The most frequent use, though, - is to manage parallel streams of development in large projects. - This is outside the scope of this chapter, - but the basic idea is simple. -
+- Suppose that Universal Monsters has just released a new program for designing secret lairs. - Dracula and Wolfman are supposed to start adding a few features - that had to be left out of the first release because time ran short. - At the same time, - Frankenstein and the Mummy are doing technical support: - their job is to fix any bugs that users find. - All sorts of things could go wrong if both teams tried to work on the same code at the same time. - For example, - if Frankenstein fixed a bug and sent a new copy of the program to a user in Greenland, - it would be all too easy for him to accidentally include - the half-completed shark tank control feature that Wolfman was working on. -
+
+ Another way to recover a particular version of a particular file
+ is to use the svn copy
command.
+ If the URL of our repository is
+ https://universal.software-carpentry.org/explore
,
+ then the command:
+
- The usual way to handle this situation is - to create a branch - in the repository for each major sub-project - (Figure XXX). - While Wolfman and Dracula work on - the main line, - Frankenstein and the Mummy create a branch, - which is just another copy of the repository's files and directories - that is also under version control. - They can work in their branch without disturbing Wolfman and Dracula and vice versa: -
+
+$ svn copy https://universal.software-carpentry.org/explore/mission.txt@120 ./mission.txt
+
-
+
+ copies the file mission.txt
as it was in revision 120
+ into our working directory
+ (overwriting whatever mission.txt
file we currently have,
+ if any).
+ What's more,
+ using svn copy
brings along the file's history as well,
+ so that future svn log
operations will show
+ how mission.txt
was resurrected.
+
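The URL@revision form is easy to mistype, so here is a minimal sketch that assembles it. The names REPO_URL and recover_command are hypothetical, and the function only prints the command instead of contacting the repository.

```shell
#!/bin/sh
# Repository address used in this lesson.
REPO_URL="https://universal.software-carpentry.org/explore"

# Hypothetical helper: print (do not run) the svn copy command that would
# bring back TARGET as it was in revision REV.
recover_command() {
    target=$1
    rev=$2
    echo "svn copy ${REPO_URL}/${target}@${rev} ./${target}"
}

recover_command mission.txt 120
```

The printed line matches the command above; running it for real requires network access to the repository.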
- Branches in version control repositories are often described as "parallel universes". - Each branch starts off as a clone of the project at some moment in time - (typically each time the software is released, - or whenever work starts on a major new feature). - Changes made to a branch only affect that branch, - just as changes made to the files in one directory don't affect files in other directories. - However, - the branch and the main line are both stored in the same repository, - so their revision numbers are always in step. -
++ Merging can be used to recover older revisions of files, + not just the most recent, + and to recover many files or directories at a time. + The most frequent use, though, + is to manage parallel streams of development in large projects. + This is outside the scope of this chapter, + but the basic idea is simple. +
-- If someone decides that a bug fix in one branch should also be made in another, - all they have to do is merge the files in question. - This is exactly like merging an old version of a file with the current one, - but instead of going backward in time, - the change is brought sideways from one branch to another. -
++ Suppose that Universal Missions has just released a new program + for designing interplanetary voyages. + Dracula and Wolfman are supposed to add some features + that were left out of the first release because time ran short. + At the same time, + Frankenstein and the Mummy are doing technical support: + their job is to fix any bugs that users find. +
-- Branching helps projects scale up by letting sub-teams work independently, - but too many branches can cause as many problems as they solve. - Karl Fogel's excellent book - Producing Open Source Software, - and Laura Wingerd and Christopher Seiwald's paper - "High-level Best Practices in Software Configuration Management", - talk about branches in much more detail. - Projects usually don't need to do this until they have a dozen or more developers, - or until several versions of their software are in simultaneous use, - but using branches is a key part of switching from software carpentry to software engineering. -
++ All sorts of things could go wrong + if both teams tried to work on the same code at the same time. + In particular, + Dracula and Wolfman might want to make large changes + to the structure of the code + in order to make it easier to add new features, + while Frankenstein and the Mummy want to make as few changes as possible + so as not to introduce new bugs while fixing old ones. +
-svn merge
merges two revisions of a file.svn revert
undoes local changes to files.+ The usual way to handle this situation is + to create a branch + in the repository for each major sub-project + (Figure 15). + While Wolfman and Dracula work on + the main line, + Frankenstein and the Mummy create a branch, + which is just another copy of the repository's files and directories + that is also under version control. + They can work in their branch without disturbing Wolfman and Dracula and vice versa: +
-+ Branches in version control repositories are often described as "parallel universes". + Each branch starts off as a clone of the project at some moment in time + (typically each time the software is released, + or whenever work starts on a major new feature). + Changes made to a branch only affect that branch, + just as changes made to the files in one directory don't affect files in other directories. + However, + the branch and the main line are both stored in the same repository, + so their revision numbers are always in step. +
-+ If someone decides that a bug fix in one branch should also be made in another, + all they have to do is merge the files in question. + This is exactly like merging an old version of a file with the current one, + but instead of going backward in time, + the change is brought sideways from one branch to another. +
-+ Branching helps projects scale up by letting sub-teams work independently, + but too many branches can cause as many problems as they solve. + Karl Fogel's excellent book + Producing Open Source Software, + and Laura Wingerd and Christopher Seiwald's paper + "High-level Best Practices in Software Configuration Management", + talk about branches in much more detail. + Projects usually don't need to do this until they have a dozen or more developers, + or until several versions of their software are in simultaneous use, + but using branches is a key part of switching from software carpentry to software engineering. +
-- It is finally time to see how to create a repository. - As a quick recap, - we will keep the master copy of our work in a repository - on a server that we can access from other machines on the internet. - That master copy consists of files and directories that no-one ever edits directly. - Instead, a copy of Subversion running on that machine - manages updates for us and watches for conflicts. - Our working copy is a mirror image of the master sitting on our computer. - When our Subversion client needs to communicate with the master, - it exchanges data with the copy of Subversion running on the server. -
+svn revert
undoes local changes to files.svn merge
merges two revisions of a file.- To make this to work, we need four things - (Figure XXX): -
++svn diff -r 240:261 fish.dat ++ does, and when you might want to run it. +
checkout
command.
- mission.txt
+ existed in revision 90 of a repository,
+ but had been deleted in revision 91.
+ What two commands could we use to recover it?
+ - To keep things simple, - we will start by creating a repository on the machine that we're working on. - This won't let us share our work with other people, - but it will allow us to save the history of our work as we go along. -
+
- The command to create a repository is svnadmin create
,
- followed by the path to the repository.
- If we want to create a repository called lair_repo
- directly under our home directory,
- we just cd
to get home
- and run svnadmin create lair_repo
.
- This command creates a directory called lair_repo
to hold our repository,
- and fills it with various files that Subversion uses
- to keep track of the project's history:
-
+ It is finally time to see how to create a repository. + As a quick recap, + we will keep the master copy of our work in a repository + on a server that we can access from other machines on the internet. + That master copy consists of files and directories that no-one ever edits directly. + Instead, a copy of Subversion running on that machine + manages updates for us and watches for conflicts. + Our working copy is a mirror image of the master sitting on our computer. + When our Subversion client needs to communicate with the master, + it exchanges data with the copy of Subversion running on the server. +
+ ++ To make this to work, we need four things: +
+ +checkout
command.
+ + To keep things simple, + we will start by creating a repository on the machine that we're working on. + This won't let us share our work with other people, + but it will allow us to save the history of our work as we go along. +
+ +
+ The command to create a repository is svnadmin create
,
+ followed by the path to the repository.
+ If we want to create a repository called missions_repo
+ directly under our home directory,
+ we just cd
to get home
+ and run svnadmin create missions_repo
.
+ This command creates a directory called missions_repo
to hold our repository,
+ and fills it with various files that Subversion uses
+ to keep track of the project's history:
+
$ cd -$ svnadmin create lair_repo -$ ls -F lair_repo +$ svnadmin create missions_repo +$ ls -F missions_repo README.txt conf/ db/ format hooks/ locks/-
- We should never edit anything in this repository directly. - Doing so probably won't shred our sanity and leave us gibbering in mindless horror, - but it will almost certainly make the repository unusable. -
+
+ We should never edit any of this directly,
+ since it will almost certainly make the repository unusable.
+ Instead,
+ we should use svn checkout
+ to get a working copy of this repository.
+ If our home directory is /users/mummy
,
+ then the full path to the repository we just created is /users/mummy/missions_repo
,
+ so we run svn checkout file:///users/mummy/missions missions_working
.
+
- To get a working copy of this repository,
- we use Subversion's checkout
command.
- If our home directory is /users/mummy
,
- then the full path to the repository we just created is /users/mummy/lair_repo
,
- so we run svn checkout file:///users/mummy/lair lair_working
.
-
+ Working backward,
+ the second argument,
+ missions_working
,
+ specifies where the working copy is to be put.
+ The first argument is the URL of our repository,
+ and it has two parts.
+ /users/mummy/missions_repo
is the path to repository directory.
+ file://
specifies the protocol
+ that Subversion will use to communicate with the repository—in this case,
+ it says that the repository is part of the local machine's filesystem.
+ (Notice that the protocol ends in two slashes,
+ while the absolute path to the repository starts with a slash,
+ making three in total.
+ A very common mistake is to type only two, since that's what web URLs normally have.)
+
- Working backward,
- the second argument,
- lair_working
,
- specifies where the working copy is to be put.
- The first argument is the URL of our repository,
- and it has two parts.
- /users/mummy/lair_repo
is the path to repository directory.
- file://
specifies the protocol
- that Subversion will use to communicate with the repository—in this case,
- it says that the repository is part of the local machine's filesystem.
- Notice that the protocol ends in two slashes,
- while the absolute path to the repository starts with a slash,
- making three in total.
- A very common mistake is to type only two, since that's what web URLs normally have.
-
+ When we're doing a checkout,
+ it is very important that we provide the second argument,
+ which specifies the name of the directory we want the working copy to be put in.
+ Without it,
+ Subversion will try to use the name of the repository,
+ missions_repo
,
+ as the name of the working copy.
+ Since we're in the directory that contains the repository,
+ this means that Subversion will try to overwrite the repository with a working copy.
+ Again,
+ there isn't much risk of our sanity being torn to shreds,
+ but this could ruin our repository.
+
- When we're doing a checkout,
- it is very important that we provide the second argument,
- which specifies the name of the directory we want the working copy to be put in.
- Without it,
- Subversion will try to use the name of the repository,
- lair_repo
,
- as the name of the working copy.
- Since we're in the directory that contains the repository,
- this means that Subversion will try to overwrite the repository with a working copy.
- Again,
- there isn't much risk of our sanity being torn to shreds,
- but this could ruin our repository.
-
+ To avoid this problem,
+ most people create a sub-directory in their account called something like repos
,
+ and then create their repositories in that.
+ For example,
+ we could create our repository in /users/mummy/repos/missions
,
+ then check out a working copy as /users/mummy/missions
.
+ This practice makes both names easier to read.
+
- To avoid this problem,
- most people create a sub-directory in their account called something like repos
,
- and then create their repositories in that.
- For example,
- we could create our repository in /users/mummy/repos/lair
,
- then check out a working copy as /users/mummy/lair
.
- This practice makes both names easier to read.
-
+ The obvious next step is to put our repository on a server, + rather than on our personal machine. + In fact, + we should always do this + so that we don't lose the history of our project + if our laptop is damaged or stolen. + A departmental server is also much more likely to be backed up regularly + than our personal machine… +
-- The obvious next steps are - to put our repository on a server, - rather than on our personal machine, - and to give other people access to the repository we have just created - so that they can work with us. - We'll discuss the first in a later chapter, - but unfortunately, - the second really does require things that we are not going to cover in this course. - If you want to do this, you can: -
+
+ Creating a repository on a server is simple:
+ just log in and go through the steps described above.
+ Accessing that repository from another machine
+ is also straightforward.
+ If the machine's address is serv.euphoric.edu
,
+ and our user ID is dracula
,
+ the URL of the repository will be something like:
+
+svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions +-
+ Reading from left to right: +
-svn+ssh
is the protocol that Subversion uses to connect to the server
+ (in this case,
+ a combination of Subversion's own protocol
+ and SSH);
+ dracula@serv.euphoric.edu
identifies the server and who we are
+ (just like an email address);
+ and
+ /home/dracula/repos/missions
is the absolutely path of the repository
+ on the server.
+ + That's fine if you are the only person using the repository, + but if you want to share it with others, + you need to worry about security. + As we discuss in the lesson on web programming, + as soon as you provide a service on the internet, + there's the possibility that someone may try to attack your system through it. + Rather than trying to learn enough system administration skills + to set things up safely, + it is usually easier to: +
-- If you choose the second or third option, - please check with whoever handles intellectual property at your institution - to make sure that putting your work on a commercially-operated machine - that is probably in some other legal jurisdiction - isn't going to cause trouble. - Many people assume that it's "just OK", - while others act as if not having asked will be an acceptable defence later on. - Unfortunately, - neither is true… -
+svnadmin create name
creates a new repository.+ If you choose the second or third option, + please check with whoever handles intellectual property at your institution + to make sure that putting your work on a commercially-operated machine + that is probably in some other legal jurisdiction + isn't going to cause trouble. + Many people assume that it's "just OK", + while others act as if not having asked will be an acceptable defence later on. + Unfortunately, + neither is true… +
+ +svnadmin create name
creates a new repository.trials_repo
+ in your home directory.
+ Check out a working copy in a directory called trials_working
+ (also in your home directory).
+ Add a couple of text files,
+ commit the changes,
+ and then use svn info trials_working
+ to see what Subversion tells you about your working copy.
+ - In art, - the provenance of a work - is the history of who owned it, when, and where. - In science, - it's the record of how a particular result came to be: - what raw data was processed by what version of what program to create which intermediate files, - what was used to turn those files into which figures of which papers, - and so on. -
+
- One of the central ideas of this course is that
- wen can automatically track the provenance of scientific data.
- To start,
- suppose we have a text file combustion.dat
in a Subversion repository.
- Run the following two commands:
-
+ In art, + the provenance of a work + is the history of who owned it, when, and where. + In science, + it's the record of how a particular result came to be: + what raw data was processed by what version of what program to create which intermediate files, + what was used to turn those files into which figures of which papers, + and so on. +
+ +
+ One of the big benefits of using version control is that
+ it lets us track the provenance of scientific data automatically.
+ To start,
+ suppose we have a text file combustion.dat
in a Subversion repository.
+ Run the following two commands:
+
$ svn propset svn:keywords Revision combustion.dat $ svn commit -m "Turning on the 'Revision' keyword" combustion.dat-
- Now open the file in an editor - and add the following line somewhere near the top: -
++ This does nothing by itself, + but now open the file in an editor + and add the following line somewhere near the top: +
-# $Revision:$ +$Revision:$-
- The '#' sign isn't important:
- it's just what .dat
files use to show comments.
- The $Revision:$
string,
- on the other hand,
- means something special to Subversion.
- Save the file, and commit the change:
-
+ The $Revision:$
string means something special to Subversion.
+ Save the file, and commit the change:
+
$ svn commit -m "Inserting the 'Revision' keyword" combustion.dat-
- When we open the file again, - we'll see that Subversion has changed that line to something like: -
++ When we open the file again, + we'll see that Subversion has changed that line to something like: +
-# $Revision: 143$ +$Revision: 143$-
- i.e., Subversion has inserted the version number
- after the colon and before the closing $
.
-
+ i.e., it has inserted the version number
+ after the colon and before the closing $
.
+ If we edit the file again—e.g., add a couple of lines with random numbers—and
+ commit once more,
+ the line is updated again to:
+
- Here's what just happened.
- First, Subversion allows you to set
- properties
- for files and and directories.
- These properties aren't in the files or directories themselves,
- but live in Subversion's database.
- One of those properties,
- svn:keywords
,
- tells Subversion to look in files that are being changed
- for strings of the form $propertyname: …$
,
- where propertyname
is a string like Revision
or Author
.
- (About half a dozen such strings are supported.)
-
+$Revision: 144$ +-
- If it sees such a string,
- Subversion rewrites it as the commit is taking place to replace …
- with the current version number,
- the name of the person making the change,
- or whatever else the property's name tells it to do.
- You only have to add the string to the file once;
- after that,
- Subversion updates it for you every time the file changes.
-
+ Here's what just happened.
+ First, Subversion allows uss to add
+ properties
+ to files and and directories.
+ These properties aren't stored in the files or directories themselves,
+ but in Subversion's database.
+ One of those properties,
+ svn:keywords
,
+ tells Subversion to look in files that are being changed
+ for strings of the form $propertyname: …$
,
+ where propertyname
is a string like Revision
or Author
.
+ (About half a dozen such strings are supported.)
+
- Putting the version number in the file this way can be pretty handy. - If you copy the file to another machine, - for example, - it carries its version number with it, - so you can tell which version you have even if it's outside version control. - We'll see some more useful things we can do with this information in - the next chapter. -
+
+ If it sees such a string,
+ Subversion rewrites it as the commit is taking place to replace …
+ with the current version number,
+ the name of the person making the change,
+ or whatever else the property's name tells it to do.
+ We only have to add the string to the file once;
+ after that,
+ Subversion updates it for you every time the file changes.
+
- Despite the rapidly decreasing cost of storage, - it is still possible to run out of disk space. - In some labs, - people can easy go through 2 TB/month if they're not careful. - Since version control tools usually store revisions in terms of lines, - with binary data files, - they end up essentially storing every revision separately. - This isn't that bad - (it's what we'd be doing anyway), - but it means version control isn't doing what it likes to do, - and the repository can get very large very quickly. - Another concern is that if very old data will no longer be used, - it can be nice to archive or delete old data files. - This is not possible if our data is version controlled: - information can only be added to a repository, - so it can only ever increase in size. -
- -+ Putting the version number in the file this way can be pretty handy. + If you copy the file to another machine, + for example, + it carries its version number with it, + so you can tell which version you have even if it's outside version control. + We'll see some more useful things we can do with this information later. +
-
- We can use this trick with shell scripts too,
- or with almost any other kind of program.
- Going back to Nelle Nemo's data processing from the previous chapter,
- for example,
- suppose she writes a shell script that uses gooclean
- to tidy up data files.
- Her first version looks like this:
-
+ We can use this trick with shell scripts too,
+ or with almost any other kind of program.
+ Let's go back to Nelle Nemo's data processing from
+ the lesson on the shell.
+ Suppose she writes a shell script called gooclean
+ to tidy up data files.
+ Her first version looks like this:
+
-for filename in $* -do - gooclean -b 0 100 < $filename > cleaned-$filename -done +# gooclean: clean up a single data file +goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 > cleaned-$1-
- i.e., it runs gooclean
with bounding values of 0 and 100
- for each specified file,
- putting the result in a temporary file with a well-defined name.
- Assuming that '#' is the comment character for those kinds of data files,
- she could instead write:
-
+ i.e.,
+ it runs goonorm
and then goofilter
with some fixed parameters
+ and creates an output file called cleaned-something.dat
+ (if the input file's name was something.dat
).
+ Assuming that '#' is the comment character for her output files,
+ she could instead write:
+
-for filename in $* -do - echo "gooclean $Revision: 901$ -b 0 100" > $filename - gooclean -b 0 100 < $filename >> cleaned-$filename -done +# gooclean: clean up a single data file +echo "# gooclean $Revision:$" > cleaned-$1 +goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 >> cleaned-$1-
- The first change puts a line in the output file
- that describes how that file was created.
- The second change is to use >>
instead of >
- to redirect gooclean
's output to the file.
- >>
means "append to":
- instead of overwriting whatever is in the file,
- it adds more content to it.
- This ensures that the first line of the file is the provenance record,
- with the actual output of gooclean
after it.
-
+ then set the svn:keywords
property
+ and commit the file to insert the revision number,
+ making it:
+
$Keyword:$
in a file can be filled in with a property value each time the file is committed.svn propset svn:keywords property files
tells Subversion to start filling in property values.+# gooclean: clean up a single data file +echo "# gooclean $Revision: 487$" > cleaned-$1 +goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 >> cleaned-$1 ++ +
+ Now, + each time this script is run it will: +
+ ++# gooclean $Revision: 487$ ++ in the output file, + then +
goonorm
and goofilter
+ would have put in the file originally.
+ (The double redirection >>
means "append to" rather than "overwrite".)
+
+ In other words,
+ the output of this shell script will always record
+ exactly what version of the script produced it.
+ This isn't enough to reproduce the output—we would need to record
+ the version numbers of the input files and the goonorm
and goofilter
programs,
+ and the values of the parameters those programs used
+ in order to do that—but it's an important and useful first step.
+
$Keyword: …$
in a file can be filled in with a property value each time the file is committed.svn propset svn:keywords property files
tells Subversion to start filling in property values.$Id:$
to a file,
+ use svn propset
to set the corresponding property,
+ and then commit a change to the file.
+ What value does Subversion fill in for this keyword?
+ When would you use this rather than Revision
or Author
?
+ svn:ignore
property do when applied to a directory?
+ When would you use it?
+ - Correlation does not imply causality, - but there is a very strong correlation between - using version control - and doing good computational science. - There's an equally strong correlation - between not using it and either wasting effort or getting things wrong. - Today (the middle of 2013), - I will not review a paper if the software used in it - is not under version control. - The work it reports might be interesting, - but without the kind of record-keeping that version control provides, - there's no way to know exactly what its authors did. - Just as importantly, - if someone doesn't know enough about computing to use version control, - the odds are good that they don't know enough - to do the programming right either. + In 2006, + McCullough, McGeary, and Harrison + analyzed several years of + the data and code archive of Journal of Money, Credit, and Banking, + a prestigious journal with a mandatory archiving policy. + Of 266 articles published during that time, + 193 were empirical and should have had data and code deposited in the archive. + Of those, + only 69 actually had anything in the archive; + Excluding eleven articles that only had data, + and seven that required software or other resources they did not have, + McCullough et al. were only able to replicate 14 of the remaining 186 articles. + This doesn't mean that the other 92% were wrong, + but it does mean there is no practical way to tell. +
+ ++ By itself, + version control doesn't making computational research reproducible. + It does help, + though, + and also eliminates the frustration and wasted time caused by + trying to figure out which emailed copy of a file, + or which of a dozen directories or USB drives, + is the most recent. + And while correlation doesn't imply causality, + there is certainly a strong correlation between + knowing enough about good computational practices to use version control + and knowing how to do other things right as well.