From: W. Trevor King
Date: Mon, 10 Jun 2013 15:44:15 +0000 (-0400)
Subject: version-control: Add README.md and instructor.md from the guide
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=b702e7ace1cd14a6ed6441390729d59dc8f7992f;p=swc-workshop.git

version-control: Add README.md and instructor.md from the guide

Move instructor hints (For Instructors section) and subject outlines
("understand" and "keypoints" classes) from the instructor guide [1]
into the boot camp repository. This is currently targeted at
Subversion, but making these notes tool-agnostic will come in the next
commit. Here I just copy the guide content over while translating it
to Markdown.

For posterity, I've grafted on the guide history. Here's how I
extracted the svn.html history from the guide repository:

1. Start a new branch in the earlier guide repository:

     $ git checkout -b wip d013cab

2. Limit history to svn.html:

     $ git filter-branch -f --prune-empty \
     >   --index-filter 'git rm --cached --ignore-unmatch $(git ls-files | grep -v svn.html)' \
     >   HEAD

3. Drop no-op merges:

     $ git rebase -i 0025ac4

Then I cherry-picked my original boot-camps commit (c1330e0) onto the
result to create this commit:

  $ git cherry-pick c1330e0
  $ git rm svn.html
  $ git commit --amend

[1]: https://github.com/swcarpentry/guide
---

diff --git a/svn.html b/svn.html
deleted file mode 100644
index 38bd7c0..0000000
--- a/svn.html
+++ /dev/null
@@ -1,2225 +0,0 @@
-{% extends "templates/_base.html" %}
-
-{% block file_metadata %}
-
-
-{% endblock file_metadata %}
-
-{% block content %}
-
    -
  1. Basic Use
  2. Merging Conflicts
  3. Recovering Old Versions
  4. Setting up a Repository
  5. Provenance
  6. Summing Up
- -

- Wolfman and Dracula have been hired by Universal Missions - (a space services spinoff from Euphoric State University) - to figure out where the company should send its next planetary lander. - They want to be able to work on the plans at the same time, - but they have run into problems doing this in the past. - If they take turns, - each one will spend a lot of time waiting for the other to finish. - On the other hand, - if they work on their own copies and email changes back and forth - they know that things will be lost, overwritten, or duplicated. -

- -

- The right solution is to use a - version control system - to manage their work. - Version control is better than mailing files back and forth because: -

- -
    - -
  1. It's hard (but not impossible) to accidentally overlook or overwrite someone's changes, because the version control system highlights them automatically.
  2. It keeps a record of who made what changes when, so that if people have questions later on, they know who to ask (or blame).
  3. Nothing that is committed to version control is ever lost. This means it can be used like the "undo" feature in an editor, and since all old versions of files are saved it's always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results.
- -

- The rest of this chapter will explore how to use - a popular open source version control system called Subversion. - It does not have all the features of some newer systems, - such as Git, - but it is still widely used, - and is simpler to pick up than those more advanced alternatives. - No matter which system you use, - the most important thing to learn is not the details of their more obscure commands, - but the workflow that they encourage. -

- -
-

For Instructors

- -

- Version control is the most important practical skill we introduce. - As the last paragraph of the introduction above says, - the workflow matters more than the ins and outs of any particular tool. - By the end of 90 minutes, - the instructor should be able to get learners to chant, - "Update, edit, merge, commit," in unison, - and have them understand what those terms mean - and why that's a good way to structure their working day. -

- -

- Provided there aren't network problems, - this entire lesson can be covered in 90 minutes. - The example at the end - showing how to use Subversion keywords to track provenance - is the "ah ha!" moment for many learners. - If time is short, - skip the material on recovering old versions of files - in order to get to this section instead. - (The fact that provenance is harder in Git, - both mechanically and conceptually, - is one reason to keep teaching Subversion.) -

- -
-

Prerequisites

-

- Basic shell concepts and skills - (ls, cd, mkdir, - editing files); - basic shell scripting - (for the discussion of provenance). -

-
- -
-

Teaching Notes

-
    -
  • Make sure the network is working before starting this lesson.
  • Give learners a ten-minute overview of what version control does for them before diving into the watch-and-do practicals. Most of them will have tried to co-author papers by emailing files back and forth, or will have biked into the office only to realize that the USB key with last night's work is still on the kitchen table. Instructors can also make jokes about directories with names like "final version", "final version revised", "final version with reviewer three's corrections", "really final version", and, "come on this really has to be the last version" to motivate version control as a better way to collaborate and as a better way to back work up.
  • Version control is typically taught after the shell, so collect learners' names during that session and create a repository for them to share with their names as both their IDs and their passwords. The easiest way to create the repository is to use a server managed by an ISP such as Dreamhost, or on SourceForge, Google Code, or some other "forge" site, all of which provide web interfaces for repository creation and management. If your learners are advanced enough to be using SSH, you can instead create it on any server they can access, and connect with the svn+ssh protocol instead of HTTPS.
  • Be very clear what files learners are to edit and what user IDs they are to use when giving instructions. It is common for them to edit the instructor's biography, or to use the instructor's user ID and password when committing. Be equally clear when they are to edit things: it's also common for someone to edit the file the instructor is editing and commit changes while the instructor is explaining what's going on, so that a conflict occurs when the instructor comes to commit the file.
  • Learners could do most exercises with repositories on their own machines, but it's hard for them to see how version control helps collaboration unless they're sharing a repository with other learners. In particular, showing learners who changed what using svn blame is only compelling if a file has been edited by at least two people.
  • If some learners are using Windows, there will inevitably be issues merging files with different line endings. svn diff -x -w is supposed to suppress differences in whitespace, but we have found that it doesn't always work as advertised.
-
- -
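The line-ending problem in the last teaching note can be checked for before the lesson starts. The following is a minimal sketch, assuming plain POSIX tools and two made-up sample files (unix.txt and windows.txt are hypothetical names): it lists the files that contain carriage returns and writes LF-only copies next to them. Subversion also has an svn:eol-style property (set with svn propset svn:eol-style native FILE) that addresses the same issue inside the repository.

```shell
#!/bin/sh
# Sketch: find text files with Windows (CRLF) line endings and write
# LF-only copies. The file names are hypothetical examples.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Two sample files: one with Unix endings, one with Windows endings.
printf 'Io\nEuropa\n' > unix.txt
printf 'Io\r\nEuropa\r\n' > windows.txt

cr=$(printf '\r')   # a literal carriage return, obtained portably

# List the files that contain at least one carriage return.
crlf_files=$(grep -l "$cr" ./*.txt || true)
echo "Files with CRLF endings: $crlf_files"

# Strip carriage returns into a new file, leaving the original untouched.
for f in $crlf_files; do
    tr -d '\r' < "$f" > "$f.lf"
done
```

The converted copy is written alongside the original rather than over it, so learners can compare the two before deciding which one to commit.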
- -
-

Basic Use

- -
-

Learning Objectives

-
    -
  • Draw a diagram showing the places version control stores information.
  • Check out a working copy of a repository.
  • View the history of changes to a project.
  • Explain why working copies of different projects should not overlap.
  • Add files to a project.
  • Commit changes made to a working copy to a repository.
  • Update a working copy to get changes from the repository.
  • Compare the current state of a working copy to the last update from the repository, and to the current state of the repository.
  • Explain what "version 123 of xyz.txt" actually means.
-

- 20 minutes. -

-
- -

- A version control system keeps the master copy of a file - in a repository - located on a server—a computer - that is never used directly by people, - but only by their programs - (Figure 1). - No-one ever edits the master copy directly. - Instead, - Wolfman and Dracula each have a working copy - on their own machines. - They can each edit their working copies whenever and however they want. -

- -
- Repositories and Working Copies -
Figure 1: Repositories and Working Copies
-
- -

- When Wolfman is ready to share his changes with Dracula, he commits his work to the repository (Figure 2). Dracula can then update his working copy to get those changes when he's ready for them. And of course, when Dracula finishes working on something, he can commit so that Wolfman can update. -

- -
- Sharing Files Through Version Control -
Figure 2: Sharing Files Through Version Control
-
- -

- If this is all there was to version control, - it would be no better than FTP or Dropbox. - But what if Dracula and Wolfman change their working copies at the same time? - If Wolfman commits first, - his changes are simply copied to the repository - (Figure 3): -

- -
- Wolfman Commits First -
Figure 3: Wolfman Commits First
-
- -

- If Dracula now tries to commit something that would overwrite Wolfman's changes - the version control system detects the conflict, - halts the commit, - and tells Dracula that there's a problem - (Figure 4): -

- -
- Dracula Has a Conflict -
Figure 4: Dracula Has a Conflict
-
- -

- Dracula must resolve that conflict before the version control system will allow him to commit his work. He can accept what Wolfman did, replace it with what he has done, or write something new that combines the two; that's up to him (Figure 5). Once he has cleaned things up, he can go ahead and try committing again. If all of the conflicts have been resolved, the version control system will accept the commit this time. -

- -
- Resolving the Conflict -
Figure 5: Resolving the Conflict
-
- -
-

Forgiveness vs. Permission

- -

- Old-fashioned version control systems prevented conflicts from happening - by locking the master copy - whenever someone was working on it. - This pessimistic strategy - guaranteed that a second person (or monster) - could never make changes to the same file at the same time, - but it also meant that people had to take turns editing files. -

- -

- Most of today's version control systems use - an optimistic strategy instead: - people are always allowed to edit their working copies, - and if a conflict occurs, - the version control system helps them sort it out after the fact. -

-
- -

- To see how this actually works, - let's assume that the Mummy - (Dracula and Wolfman's boss) - has already put some notes in a version control repository - whose URL is https://universal.software-carpentry.org/explore. - Every repository has an address like this that uniquely identifies the location of the master copy. -

- -
-

There's More Than One Way To Do It

- -

- We will drive Subversion from the command line in our examples, - but if you prefer using a GUI, - there are many for you to choose from. - Please see the reference for links. -

-
- -

- It's Monday morning, - and Dracula has just joined the project. - In order to get a working copy on his computer, - Dracula has to check out a copy of the repository. - He only has to do this once per project: - once he has a working copy, - he can update it over and over again to get other people's work. -

- -

- While in his home directory, - Dracula types the command: -

- -
-$ svn checkout https://universal.software-carpentry.org/explore
-
- -

- This creates a new directory called explore - and fills it with a copy of the repository's contents - (Figure 6). -

- -
-A    explore/jupiter
-A    explore/mars
-A    explore/mars/mons-olympus.txt
-A    explore/mars/cydonia.txt
-A    explore/earth
-A    explore/earth/himalayas.txt
-A    explore/earth/antarctica.txt
-A    explore/earth/carlsbad.txt
-Checked out revision 6.
-
- -
- Example Repository -
Figure 6: Example Repository
-
- -

- Dracula can then go into this directory - and use regular shell commands to view the files: -

- -
-$ cd explore
-$ ls
-earth   jupiter mars
-$ ls *
-earth:
-antarctica.txt  carlsbad.txt  himalayas.txt
-
-jupiter:
-
-mars:
-cydonia.txt  mons-olympus.txt
-
- -
-

Don't Let the Working Copies Overlap

- -

- It's very important that the working copies of different projects do not overlap; in particular, we should never try to check out one project inside a working copy of another project. The reason is that Subversion stores information about the current state of a working copy in special sub-directories called .svn: -

- -
-$ pwd
-/home/dracula/explore
-$ ls -a
-.    ..    .svn    earth    jupiter    mars
-$ ls -F .svn
-entries    prop-base/    props/    text-base/    tmp/
-
- -

- If two working copies overlap, - the files in the .svn directories for one repository - will be clobbered by the other repository's .svn files, - and Subversion will become hopelessly confused. -

-
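One way to avoid overlapping working copies is to check for an enclosing .svn directory before running svn checkout. The sketch below assumes only POSIX shell; the function name inside_working_copy is hypothetical, not a Subversion command. It walks upward from a directory and reports the first ancestor that holds a .svn directory.

```shell
#!/bin/sh
# Sketch: walk upward from a directory looking for a .svn directory.
# The function name inside_working_copy is hypothetical.
inside_working_copy() {
    dir=$(cd "$1" && pwd) || return 2
    while :; do
        if [ -d "$dir/.svn" ]; then
            echo "$dir"            # first ancestor that holds .svn
            return 0
        fi
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# Usage: warn before checking out into a directory that is already managed.
if found=$(inside_working_copy .); then
    echo "Already inside a working copy ($found); check out elsewhere."
else
    echo "No enclosing working copy found; safe to check out here."
fi
```

Walking all the way up to the root matters because, depending on the Subversion version, the metadata may live in every directory of the working copy or only in its top-level directory.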
- -

- Dracula can find out more about the history of the project - using Subversion's log command: -

- -
-$ svn log
-------------------------------------------------------------------------
-r6 | mummy | 2010-07-26 09:21:10 -0400 (Mon, 26 Jul 2010) | 1 line
-
-Damn the budget---the Jovian moons would be a _perfect_ place to explore.
-------------------------------------------------------------------------
-r5 | mummy | 2010-07-26 09:19:39 -0400 (Mon, 26 Jul 2010) | 1 line
-
-The budget might not even stretch to the Arctic :-(
-------------------------------------------------------------------------
-r4 | mummy | 2010-07-26 09:17:46 -0400 (Mon, 26 Jul 2010) | 1 line
-
-Budget cuts may force us to do another dry run in the Arctic.
-------------------------------------------------------------------------
-r3 | mummy | 2010-07-26 09:14:14 -0400 (Mon, 26 Jul 2010) | 1 line
-
-Converting document to wiki-formatted text.
-------------------------------------------------------------------------
-r2 | mummy | 2010-07-26 09:11:55 -0400 (Mon, 26 Jul 2010) | 1 line
-
-Or put it down near the Face of Cydonia?
-------------------------------------------------------------------------
-r1 | mummy | 2010-07-26 09:08:23 -0400 (Mon, 26 Jul 2010) | 1 line
-
-Send the probe to Mons Olympus?
-------------------------------------------------------------------------
-
- -

- Subversion displays a summary of all the changes made to the project so far. - This list includes the - revision number, - the name of the person who made the change, - the date the change was made, - and whatever comment the user provided when the change was submitted. - As we can see, - the explore project is currently at revision 6, - and all changes so far have been made by the Mummy. -

- -
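Because every log entry starts with a header line in the same shape, the summary described above is easy to process with standard tools. This is a minimal sketch, assuming the header layout shown in the log output earlier; the function name log_headers is hypothetical. It is demonstrated on canned text so that it runs without a repository.

```shell
#!/bin/sh
# Sketch: pull revision, author, and date out of `svn log` header lines.
# The function name log_headers is hypothetical.
log_headers() {
    awk '$1 ~ /^r[0-9]+$/ && $2 == "|" { print $1, $3, $5 }'
}

# Typical use (requires Subversion):  svn log | log_headers
# Demonstration on canned log output:
printf '%s\n' \
    '------------------------------------------------------------------------' \
    'r6 | mummy | 2010-07-26 09:21:10 -0400 (Mon, 26 Jul 2010) | 1 line' \
    '' \
    'Damn the budget---the Jovian moons would be a _perfect_ place to explore.' \
    '------------------------------------------------------------------------' \
    'r5 | mummy | 2010-07-26 09:19:39 -0400 (Mon, 26 Jul 2010) | 1 line' \
    '' \
    'The budget might not even stretch to the Arctic :-(' | log_headers
```

A commit message that itself begins with text like "r7 | ..." would be mistaken for a header, so treat this as a convenience for reading history rather than a robust parser.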

- Notice how detailed the comments on the updates are. - Good comments are as important in version control as they are in coding. - Without them, it can be very difficult to figure out who did what, when, and why. - We can use comments like "Changed things" and "Fixed it" if we want, - or even no comments at all, - but we'll only be making more work for our future selves. -

- -
-

Numbering Versions

- -

- Another thing to notice is that the revision number applies to the whole repository, - not to a particular file. - When we talk about "version 61" we mean - "the state of all files and directories at that point." - Older version control systems like CVS gave each file a new version number when it was updated, - which meant that version 38 of one file could correspond in time to version 17 of another - (Figure 7). - Experience shows that - global version numbers that apply to everything in the repository - are easier to manage than - per-file version numbers, - so that's what Subversion uses. -

- -
- Version Numbering Schemes -
Figure 7: Version Numbering Schemes
-
-
- -

- A couple of cubicles away, - Wolfman also runs svn checkout - to get a working copy of the repository. - He also gets version 6, - so the files on his machine are the same as the files on Dracula's. - While he is looking through the files, - Dracula decides to add some information to the repository about Jupiter's moons. - Using his favorite editor, - he creates a file in the jupiter directory called moons.txt, - and fills it with information about Io, Europa, Ganymede, and Callisto: -

- -
-Name            Orbital Radius  Orbital Period  Mass            Radius
-Io              421.6           1.769138        893.2           1821.6
-Europa          670.9           3.551181        480.0           1560.8
-Ganymede        1070.4          7.154553        1481.9          2631.2
-Calisto         1882.7          16.689018       1075.9          2410.3
-
- -

- After double-checking his data, - he wants to commit the file to the repository so that everyone else on the project can see it. - The first step is to add the file to his working copy using svn add: -

- -
-$ svn add jupiter/moons.txt
-A         jupiter/moons.txt
-
- -

- Adding a file is not the same as creating it—he has already done that. - Instead, - the svn add command tells Subversion to add the file to - the list of things it's supposed to manage. - It's quite common, - particularly in programming projects, - to have backup files or intermediate files in a directory - that aren't worth storing in the repository. - This is why version control requires us to explicitly tell it which files are to be managed. -

- -
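Files that Subversion has not been told to manage show up in svn status output with a "?" in the first column, which makes it easy to review them before deciding what to add. The following sketch assumes that status layout; the function name list_unversioned is hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: list the paths that `svn status` marks with "?" (not under
# version control). The function name list_unversioned is hypothetical,
# and paths containing spaces are not handled.
list_unversioned() {
    awk '$1 == "?" { print $2 }'
}

# Typical use inside a working copy (requires Subversion):
#   svn status | list_unversioned
#
# Demonstration on canned status output so the sketch runs anywhere:
printf '%s\n' \
    '?       jupiter/moons.txt' \
    '?       jupiter/moons.txt~' \
    'M       mars/cydonia.txt' | list_unversioned
```

In this made-up example Dracula would add jupiter/moons.txt and leave the editor backup file jupiter/moons.txt~ unmanaged.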

- Once he has told Subversion to add the file, - Dracula can go ahead and commit his changes to the repository. - He uses the -m flag to provide a one-line message explaining what he's doing; - if he didn't, - Subversion would open his default editor - so that he could type in something longer. -

- -
-$ svn commit -m "Some basic facts about the Galilean moons of Jupiter." jupiter/moons.txt
-Adding         jupiter/moons.txt
-Transmitting file data .
-Committed revision 7.
-
- -

- When Dracula runs the svn commit command, - Subversion establishes a connection to the server, - copies over his changes, - and updates the revision number from 6 to 7 - (Figure 8). -

- -
- Updated Repository -
Figure 8: Updated Repository
-
- -
-

When Not to Use Version Control

- -

- Despite the rapidly decreasing cost of storage, it is still possible to run out of disk space. In some labs, people can easily go through 2 TB/month if they're not careful. Since version control tools usually store revisions in terms of changed lines, binary data files end up being stored essentially whole for every revision. This isn't that bad (it's what we'd be doing anyway), but it means version control isn't doing what it does best, and the repository can get very large very quickly. Another concern is that once very old data files are no longer used, it can be nice to archive or delete them. This is not possible if our data is version controlled: information can only be added to a repository, so it can only ever increase in size. -

- -
- -

- Back in his cubicle, - Wolfman uses svn update to update his working copy. - It tells him that a new file has been added - and brings his working copy up to date with version 7 of the repository, - because this is now the most recent revision - (also called the head). - svn update updates an existing working copy, - rather than checking out a new one. - While svn checkout is usually only run once per project per machine, - svn update may be run many times a day. -

- -

- Looking in the new file jupiter/moons.txt, Wolfman notices that Dracula has misspelled "Callisto" (it is supposed to have two L's). Wolfman edits that line of the file: -

- -
-Name            Orbital Radius  Orbital Period  Mass            Radius
-Io              421.6           1.769138        893.2           1821.6
-Europa          670.9           3.551181        480.0           1560.8
-Ganymede        1070.4          7.154553        1481.9          2631.2
-Callisto        1882.7          16.689018       1075.9          2410.3
-
- -

- He also adds a line about Amalthea, - which he thinks might be an interesting place to send a probe - despite its small size: -

- -
-Name            Orbital Radius  Orbital Period  Mass            Radius
-Amalthea        181.4           0.498179        0.075           125.0
-Io              421.6           1.769138        893.2           1821.6
-Europa          670.9           3.551181        480.0           1560.8
-Ganymede        1070.4          7.154553        1481.9          2631.2
-Callisto        1882.7          16.689018       1075.9          2410.3
-
- -

- Next, - he uses the svn status command to check that he hasn't accidentally changed anything else: -

- -
-$ svn status
-M       jupiter/moons.txt
-
- -

- and then runs svn commit. Since he hasn't used the -m flag to provide a message on the command line, Subversion launches his default editor and shows him: -

- -
-
---This line, and those below, will be ignored--
-
-M    jupiter/moons.txt
-
- -

- He changes this to be -

- -
-1. Fixed typo in moon's name: 'Calisto' -> 'Callisto'.
-2. Added information about Amalthea.
---This line, and those below, will be ignored--
-
-M    jupiter/moons.txt
-
- -

- When he saves this temporary file and exits the editor, - Subversion commits his changes: -

- -
-Sending        jupiter/moons.txt
-Transmitting file data .
-Committed revision 8.
-
- -

- Note that since Wolfman didn't specify a particular file to commit, - Subversion commits all of his changes. - This is why he ran the svn status command first. -

- -
-

Which Editor?

-

- If you don't have a default editor set up, - Subversion will probably open an editor called Vi. - If this happens, - type escape-colon-w-q-! to exit - and hope it never happens again. -

-
- -
-

Working With Multiple Files

- -

- Our example only includes one file, - but version control can work on any number of files at once. - For example, - if Wolfman noticed that a dozen data files had the same incorrect header, - he could change it in all 12 files, - then commit all those changes at once. - This is actually the best way to work: - every logical change to the project should be a single commit, - and every commit should include everything involved in one logical change. -

- -
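The dozen-files scenario above might look like the following sketch. The directory name, file names, and the misspelled header are all made up for illustration, and the Subversion steps are shown only as comments so that the script runs without a repository.

```shell
#!/bin/sh
# Sketch: correct the same wrong header in every data file so that the
# whole fix can go into one commit. Directory, file names, and header
# text are hypothetical.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir data

# Create a dozen sample files that share an incorrect header line.
i=1
while [ "$i" -le 12 ]; do
    printf 'Name\tOrbital Raduis\nIo\t421.6\n' > "data/sample$i.txt"
    i=$((i + 1))
done

# Rewrite only line 1 of each file, via a temporary file for portability.
for f in data/*.txt; do
    sed '1s/Raduis/Radius/' "$f" > "$f.tmp"
    mv "$f.tmp" "$f"
done

# In a real working copy the next steps would be (requires Subversion):
#   svn status     # expect twelve lines starting with "M"
#   svn commit -m "Fixed misspelled column header in all data files."
fixed=$(grep -l 'Orbital Radius' data/*.txt | wc -l | tr -d ' ')
echo "Files fixed: $fixed"
```

Running svn commit without naming files sends all twelve modifications as one revision, which matches the one-logical-change-per-commit advice.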
- -

- That night, - Dracula wants to synchronize with Wolfman's work. - Before updating his working copy with svn update, - though, - he checks to see if he has made any changes locally - by running svn diff. - Without arguments, - it compares what's in his working copy to what he got the last time he updated. - There are no differences, - so there's no output: -

- -
-$ svn diff
-$
-
- -

- To compare his working copy to the master, - Dracula uses svn diff -r HEAD. - The -r flag is used to specify a revision, - while HEAD means - "the latest version of the master". -

- -
-$ svn diff -r HEAD
---- moons.txt(revision 8)
-+++ moons.txt(working copy)
-@@ -1,5 +1,6 @@
- Name            Orbital Radius  Orbital Period  Mass            Radius
-+Amalthea        181.4           0.498179        0.075           125.0
- Io              421.6           1.769138        893.2           1821.6
- Europa          670.9           3.551181        480.0           1560.8
- Ganymede        1070.4          7.154553        1481.9          2631.2
--Calisto         1882.7          16.689018       1075.9          2410.3
-+Callisto        1882.7          16.689018       1075.9          2410.3
-
-
- -

- After looking over the changes, - Dracula goes ahead and does the update. -

- -
-

Reading a Diff

- -

- The output of diff is cryptic even by Unix standards. - The first two lines: -

- -
--- moons.txt(revision 8)
-+++ moons.txt(working copy)
-
- -

- signal that '-' will be used to show content from revision 8 and '+' to show content from the user's working copy. The next line, with the '@' markers, indicates where lines were inserted or removed. This isn't really intended for human consumption: editors and other tools can use this information to replay a series of edits against a file. -

- -

- The most important parts of what follows are the lines marked with '+' and '-', - which show insertions and deletions respectively. - Here, - we can see that the line for Amalthea was inserted, - and that the line for Callisto was changed - (which is indicated by an add and a delete right next to one another). - Many editors and other tools can display diffs like this in a two-column display, - highlighting changes. -

- -
- -
-

Nothing's Perfekt

- -

- Version control systems do have one important shortcoming. They can easily find, display, and merge differences in text files, but images, MP3s, PDFs, and Microsoft Word or Excel files aren't stored as text: they use specialized binary data formats. Most version control systems don't know how to deal with these formats, so all they can say is, "These files differ." Reconciling those differences will probably require use of an auxiliary tool, such as an audio editor or Microsoft Word's "Compare and Merge" utility. -

-
- -
-

Diffing Other Files

- -

- svn diff mimics the behavior of - the Unix diff command, - which can be used to compare any two files. - Given these two files: -

- - - - - - - - - - -
left.txtright.txt
-
hydrogen
-lithium
-sodium
-magnesium
-rubidium
-
-
hydrogen
-lithium
-beryllium
-sodium
-potassium
-strontium
-
- -

- diff's output is: -

-
-$ diff left.txt right.txt
-2a3
-> beryllium
-4,5c5,6
-< magnesium
-< rubidium
----
-> potassium
-> strontium
-
-
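The same two files can also be compared in the unified format that svn diff prints, which ties this sidebar back to the '+' and '-' markers described earlier. This is a small sketch using the standard diff -u option; the output file name elements.diff is arbitrary.

```shell
#!/bin/sh
# Sketch: compare the two element lists in unified format, the same
# format that `svn diff` prints.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' hydrogen lithium sodium magnesium rubidium > left.txt
printf '%s\n' hydrogen lithium beryllium sodium potassium strontium > right.txt

# diff exits with status 1 when the files differ, so guard it under set -e.
diff -u left.txt right.txt > elements.diff || true

# Skip the two header lines (they contain timestamps) and show the hunk.
tail -n +3 elements.diff
```

In the hunk, beryllium appears as a single '+' line, while the replacement of magnesium and rubidium by potassium and strontium appears as two '-' lines followed by two '+' lines.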
- -

- This is a very common workflow, - and is the basic heartbeat of most developers' days. - The steps are: -

- -
    - -
  1. Update our working copy so that we have any changes other people have committed.
  2. Do our own work.
  3. Commit our changes to the repository so that other people can get them.
- -
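The three steps above can be wrapped in a pair of shell functions. This is only a sketch: the function names begin_work and finish_work and the SVN variable are hypothetical conveniences, not part of Subversion. Setting SVN to "echo svn" turns the functions into a dry run that prints the commands instead of contacting a repository.

```shell
#!/bin/sh
# Sketch of the update / work / commit cycle as shell functions.
# begin_work, finish_work, and the SVN variable are hypothetical:
# setting SVN="echo svn" prints the commands instead of running them.
SVN=${SVN:-svn}

begin_work() {
    $SVN update                  # 1. get other people's changes
}

finish_work() {
    message=$1
    $SVN status || return 1      # review what is about to be sent
    $SVN commit -m "$message"    # 3. publish our changes
}

# Dry run in a subshell, safe to execute anywhere:
(
    SVN="echo svn"
    begin_work
    # 2. ...edit files here...
    finish_work "Added data for Himalia, Elara"
)
```

Keeping svn status between the edit and the commit mirrors what Wolfman did earlier: it is the last chance to notice an accidental change before it is shared.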

- It's worth noticing here how important Wolfman's comments about his changes were. - It's hard to see the difference between "Calisto" with one 'L' and "Callisto" with two, - even if the line containing the difference has been highlighted. - Without Wolfman's comments, - Dracula might have wasted time wondering what the difference was. -

- -

- In fact, - Wolfman should probably have committed his two changes separately, - since there's no logical connection between - fixing a typo in Callisto's name - and adding information about Amalthea to the same file. - Just as a function or program should do one job and one job only, - a single commit to version control should have a single logical purpose so that it's easier to find, - understand, - and if necessary undo later on. -

- -
-

Who Did What?

- -

- One other very useful command is svn blame, - which shows when each line in the file was last changed - and by whom: -

- -
-$ svn blame moons.txt
-    14    dracula Name            Orbital Radius  Orbital Period  Mass            Radius
-    14    dracula                 (10**3 km)      (days)          (10**20 kg)     (km)
-    14    dracula Amalthea        181.4           0.498179        0.075           131 x 73 x 67
-     9    mummy   Io              421.6           1.769138        893.2           1821.6
-     9    mummy   Europa          670.9           3.551181        480.0           1560.8
-     9    mummy   Ganymede        1070.4          7.154553        1481.9          2631.2
-    14    dracula Callisto        1882.7          16.689018       1075.9          2410.3
-    14    dracula Himalia         11460           250.5662        0.095           85.0
-    14    dracula Elara           11740           259.6528        0.008           40.0
-
- -

- If you are ever wondering who to talk to about a change, - or why it was made, - svn blame is a good place to start. -

-
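Blame output also lends itself to quick summaries. The sketch below assumes the column layout shown above (revision, then author, then the line's text); the function name lines_per_author is hypothetical. It counts how many lines each author last touched, using canned input so it runs without a repository.

```shell
#!/bin/sh
# Sketch: count how many lines each author last touched, from
# `svn blame` output. The function name lines_per_author is hypothetical.
lines_per_author() {
    awk '{ count[$2]++ } END { for (a in count) print count[a], a }' | sort -rn
}

# Typical use (requires Subversion):  svn blame moons.txt | lines_per_author
# Demonstration on canned blame output:
printf '%s\n' \
    '    14    dracula Name      Radius' \
    '     9    mummy   Io        1821.6' \
    '     9    mummy   Europa    1560.8' \
    '    14    dracula Callisto  2410.3' \
    '    14    dracula Himalia   85.0' | lines_per_author
```

For this sample the function reports three lines for dracula and two for mummy.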
- -
-

Summary

-
    -
  • Version control is a better way to manage shared files than email or shared folders.
  • The master copy is stored in a repository.
  • Nobody ever edits the master copy directly: instead, each person edits a local working copy.
  • People share changes by committing them to the master or updating their local copy from the master.
  • The version control system prevents people from overwriting each other's work by forcing them to merge concurrent changes before committing.
  • It also keeps a complete history of changes made to the master so that old versions can be recovered reliably.
  • Version control systems work best with text files, but can also handle binary files such as images and Word documents.
  • Every repository is identified by a URL.
  • Working copies of different repositories must not overlap.
  • Each change to the master copy is identified by a unique revision number.
  • Revisions identify snapshots of the entire repository, not changes to individual files.
  • Each change should be commented to make the history more readable.
  • Commits are transactions: either all changes are successfully committed, or none are.
  • The basic workflow for version control is update-change-commit.
  • svn add things tells Subversion to start managing particular files or directories.
  • svn checkout url checks out a working copy of a repository.
  • svn commit -m "message" things sends changes to the repository.
  • svn diff compares the current state of a working copy to the state after the most recent update.
  • svn diff -r HEAD compares the current state of a working copy to the state of the master copy.
  • svn log shows the history of a working copy.
  • svn status shows the status of a working copy.
  • svn update updates a working copy from the repository.
-
- -
-

Challenges

- -
    - -
  1. Using the repository URL, user ID, and password provided by the instructor, perform the following actions:
    1. Check out a working copy of the repository.
    2. Create a text file called your_id.txt (using your user ID instead of your_id) and write a three-line biography of yourself in it.
    3. Add this file to your working copy.
    4. Commit your changes to the repository.
    5. Update your working copy to get other people's biographies.
    6. Examine the change log to see the order in which people added their biographies to the repository.
  2. What does the command svn diff -r 14 do? What does it do if there have only been 10 changes to the repository?
  3. By default, Unix diff and svn diff compare files line by line. Why doesn't this work for MP3 audio files?
-
- -
- -
-

Merging Conflicts

- -
-

Learning Objectives

-
    -
  • Explain what causes conflicts to occur and how to tell when one has occurred.
  • Resolve a conflict.
  • Identify the auxiliary files created when a conflict occurs.
-

- 20 minutes. -

-
- -

- Dracula and Wolfman have both synchronized their working copies of explore - with version 8 of the repository. - Dracula now edits his copy to change Amalthea's radius - from a single number to a triple to reflect its irregular shape: -

- -
-Name            Orbital Radius  Orbital Period  Mass            Radius
-Amalthea        181.4           0.498179        0.075           131 x 73 x 67
-Io              421.6           1.769138        893.2           1821.6
-Europa          670.9           3.551181        480.0           1560.8
-Ganymede        1070.4          7.154553        1481.9          2631.2
-Callisto        1882.7          16.689018       1075.9          2410.3
-
- -

- He then commits his work, - creating revision 9 of the repository - (Figure 9). -

- -
- After Dracula Commits -
Figure 9: After Dracula Commits
-
- -

- But while he is doing this, - Wolfman is editing his copy - to add information about two other minor moons, - Himalia and Elara: -

- -
-Name            Orbital Radius  Orbital Period  Mass            Radius
-Amalthea        181.4           0.498179        0.075           131
-Io              421.6           1.769138        893.2           1821.6
-Europa          670.9           3.551181        480.0           1560.8
-Ganymede        1070.4          7.154553        1481.9          2631.2
-Callisto        1882.7          16.689018       1075.9          2410.3
-Himalia         11460           250.5662        0.095           85.0
-Elara           11740           259.6528        0.008           40.0
-
- -

- When Wolfman tries to commit his changes to the repository, - Subversion won't let him: -

- -
-$ svn commit -m "Added data for Himalia, Elara"
-Sending        jupiter/moons.txt
-svn: Commit failed (details follow):
-svn: File or directory 'moons.txt' is out of date; try updating
-svn: resource out of date; try updating
-
- -

- The reason is that - Wolfman's changes were based on revision 8, - but the repository is now at revision 9, - and the file that Wolfman is trying to overwrite - is different in the later revision. - (Remember, - one of version control's main jobs is to make sure that - people don't trample on each other's work.) - Wolfman has to update his working copy to get Dracula's changes before he can commit. - Luckily, - Dracula edited a line that Wolfman didn't change, - so Subversion can merge the differences automatically. -

- -

- This does not mean that Wolfman's changes have been committed to the repository: - Subversion only does that when it's ordered to. - Wolfman's changes are still in his working copy, - and only in his working copy. - But since Wolfman's version of the file now includes - the change that Dracula made, - Wolfman can go ahead and commit his own changes as usual to create revision 10 - (Figure 10). -

- -
- Merging Without Conflict -
Figure 10: Merging Without Conflict
-
- -
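The conflict-free merge described above can be imitated without a repository. The following is only a sketch, assuming GNU `diff3` (part of diffutils) is installed; the file names are hypothetical and the table is trimmed to two columns. It illustrates the idea of a three-way merge, not how Subversion implements it.

```shell
# Sketch only: imitate a conflict-free three-way merge with diff3.
# base.txt, dracula.txt, and wolfman.txt are hypothetical file names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Common ancestor (revision 8 in the story), trimmed to two columns.
printf '%s\n' 'Name Radius' 'Amalthea 131' 'Io 1821.6' 'Europa 1560.8' \
    'Ganymede 2631.2' 'Callisto 2410.3' > base.txt

# Dracula changes one existing line...
sed 's/^Amalthea 131$/Amalthea 131 x 73 x 67/' base.txt > dracula.txt

# ...while Wolfman appends two new lines at the end.
cp base.txt wolfman.txt
printf '%s\n' 'Himalia 85.0' 'Elara 40.0' >> wolfman.txt

# Merge Dracula's change into Wolfman's copy, using base.txt as the ancestor.
merged=$(diff3 -m wolfman.txt base.txt dracula.txt)
printf '%s\n' "$merged"
```

Because the two edits touch different parts of the file, the merged text contains both of them and no conflict markers.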

- Wolfman's working copy is now in sync with the master, - but Dracula's is one behind at revision 9. - At this point, - they independently decide to add measurement units - to the columns in moons.txt. - Wolfman is quicker off the mark this time; - he adds a line to the file: -

- -
-Name            Orbital Radius  Orbital Period  Mass            Radius
-                (10**3 km)      (days)          (10**20 kg)     (km)
-Amalthea        181.4           0.498179        0.075           131 x 73 x 67
-Io              421.6           1.769138        893.2           1821.6
-Europa          670.9           3.551181        480.0           1560.8
-Ganymede        1070.4          7.154553        1481.9          2631.2
-Callisto        1882.7          16.689018       1075.9          2410.3
-Himalia         11460           250.5662        0.095           85.0
-Elara           11740           259.6528        0.008           40.0
-
- -

- and commits it to create revision 11. - While he is doing this, - though, - Dracula inserts a different line at the top of the file: -

- -
-Name            Orbital Radius  Orbital Period  Mass            Radius
-                * 10^3 km       * days          * 10^20 kg      * km
-Amalthea        181.4           0.498179        0.075           131 x 73 x 67
-Io              421.6           1.769138        893.2           1821.6
-Europa          670.9           3.551181        480.0           1560.8
-Ganymede        1070.4          7.154553        1481.9          2631.2
-Callisto        1882.7          16.689018       1075.9          2410.3
-Himalia         11460           250.5662        0.095           85.0
-Elara           11740           259.6528        0.008           40.0
-
- -

- Once again, - when Dracula tries to commit, - Subversion tells him he can't. - But this time, - when Dracula updates his working copy, - he doesn't just get the line Wolfman added to create revision 11 - (Figure 11). -

- -
- Merge With Conflict -
Figure 11: Merge With Conflict
-
- -

- There is an actual conflict in the file, - so Subversion asks Dracula what he wants to do: -

- -
-$ svn update
-Conflict discovered in 'jupiter/moons.txt'.
-Select: (p) postpone, (df) diff-full, (e) edit,
-        (mc) mine-conflict, (tc) theirs-conflict,
-        (s) show all options:
-
- -

- Dracula chooses p for "postpone", - which tells Subversion that he'll deal with the problem later. - Once the update is finished, - he opens moons.txt in his editor and sees: -

- -
- Name            Orbital Radius  Orbital Period  Mass
-+<<<<<<< .mine
-+                * 10^3 km       * days         * 10^20 kg
-+=======
-+                (10**3 km)      (days)         (10**20 kg)
-+>>>>>>> .r11
- Amalthea        181.4           0.498179        0.075
- Io              421.6           1.769138        893.2
- Europa          670.9           3.551181        480.0
- Ganymede        1070.4          7.154553        1481.9
- Callisto        1882.7          16.689018       1075.9
-
- -

- As we can see, - Subversion has inserted - conflict markers - in moons.txt - wherever there is a conflict. - The line <<<<<<< .mine shows the start of the conflict, - and is followed by the lines from the local copy of the file. - The separator ======= is then - followed by the lines from the repository's file that are in conflict with that section, - while >>>>>>> .r11 marks the end of the conflict. -

- -
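The marker layout described above is regular enough to inspect with ordinary text tools. The following is a minimal sketch, assuming a POSIX shell with `awk` and `grep`; the temporary file is a hypothetical stand-in for moons.txt after the update, trimmed to a few lines.

```shell
# Sketch only: pull the two sides of a conflict out of a marked-up file.
# The temporary file is a hypothetical stand-in for moons.txt after 'svn update'.
conflicted=$(mktemp)
cat > "$conflicted" <<'EOF'
Name            Orbital Radius
<<<<<<< .mine
                * 10^3 km
=======
                (10**3 km)
>>>>>>> .r11
Amalthea        181.4
EOF

# side=1 while inside the local half, side=2 inside the repository half.
mine=$(awk '/^<<<<<<< /{side=1; next} /^=======$/{side=2; next}
            /^>>>>>>> /{side=0; next} side==1' "$conflicted")
theirs=$(awk '/^<<<<<<< /{side=1; next} /^=======$/{side=2; next}
              /^>>>>>>> /{side=0; next} side==2' "$conflicted")
printf 'mine:   %s\n' "$mine"
printf 'theirs: %s\n' "$theirs"

# Count the conflict regions still present in the file.
remaining=$(grep -c '^<<<<<<< ' "$conflicted")
echo "conflict regions remaining: $remaining"
```

A count greater than zero is a quick signal that the file still needs editing before it can be committed.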

- Before he can commit, - Dracula has to edit his copy of the file to get rid of those markers. - He changes it to: -

- -
-Name            Orbital Radius  Orbital Period  Mass            Radius
-                (10^3 km)       (days)          (10^20 kg)      (km)
-Amalthea        181.4           0.498179        0.075           131 x 73 x 67
-Io              421.6           1.769138        893.2           1821.6
-Europa          670.9           3.551181        480.0           1560.8
-Ganymede        1070.4          7.154553        1481.9          2631.2
-Callisto        1882.7          16.689018       1075.9          2410.3
-Himalia         11460           250.5662        0.095           85.0
-Elara           11740           259.6528        0.008           40.0
-
- -

- then uses the svn resolved command to tell Subversion that - he has fixed the problem. - Subversion will now let him commit to create revision 12. -

- -
-

Auxiliary Files

- -

- When Dracula did his update and Subversion detected the conflict in moons.txt, - it created three temporary files to help Dracula resolve it. - The first is called moons.txt.r9; - it is the file as it was in Dracula's local copy - before he started making changes, - i.e., the common ancestor for his work - and whatever he is in conflict with. -

- -

- The second file is moons.txt.r11. - This is the most up-to-date revision from the repository—the - file as it is including Wolfman's changes. - The third temporary file, moons.txt.mine, - is the file as it was in Dracula's working copy before he did the Subversion update. -

- -

- Subversion creates these auxiliary files primarily - to help people merge conflicts in binary files. - It wouldn't make sense to insert <<<<<<< - and >>>>>>> characters into an image file - (it would almost certainly result in a corrupted image). - The svn resolved command deletes these three extra files - as well as telling Subversion that the conflict has been taken care of. -

- -
- -

- Some power users prefer to work with interpolated conflict markers directly, - but for the rest of us, - there are several tools for displaying differences and helping to merge them, - including Diffuse and WinMerge. - If Dracula launches Diffuse, - it displays his file, - the common base that he and Wolfman were working from, - and Wolfman's file in a three-pane view - (Figure 12): -

- -
- A Difference Viewer -
Figure 12: A Difference Viewer
-
- -

- Dracula can use the buttons to merge changes from either of the edited versions - into the common ancestor, - or edit the central pane directly. - Again, - once he is done, - he uses svn resolved and svn commit - to create revision 12 of the repository. -

- -

- In this case, the conflict was small and easy to fix. - However, if two or more people on a team are repeatedly creating conflicts for one another, - it's usually a signal of deeper communication problems: - either they aren't talking as often as they should, or their responsibilities overlap. - If used properly, - the version control system can help the team find and fix these issues - so that it will be more productive in future. -

- -
-

Working With Multiple Files

- -

- As mentioned earlier, - every logical change to a project should result in a single commit, - and every commit should represent one logical change. - This is especially true when resolving conflicts: - the work done to reconcile one person's changes with another's is often complicated, - so it should be a single entry in the project's history, - with other, later, changes coming after it. -

- -
- -
-

Summary

-
    -
  • Conflicts must be resolved before a commit can be completed.
  • -
  • Subversion puts markers in text files to show regions of conflict.
  • -
  • For each conflicted file, Subversion creates auxiliary files containing the common parent, the master version, and the local version.
  • -
  • svn resolved files tells Subversion that conflicts have been resolved.
  • -
-
- -
-

Challenges

- -

- If you are working in a group, - partner with someone who has also written a biography for themselves - for the previous section's challenges. -

- -
    -
  1. - Both partners use svn update - to make sure their working copies are up to date - and that there are no local changes. -
  2. -
  3. - The first partner edits her biography and commits the changes. -
  4. -
  5. - The second partner edits her copy of the file - (without having updated to get the first partner's changes), - then tries to svn commit. -
  6. -
  7. - Once the second partner has resolved the conflict, - she commits her changes. -
  8. -
  9. - Repeat these four steps with roles reversed. -
  10. -
- -

- If you are working on your own, - you can simulate the steps above - by checking out a second copy of the project into a new directory. - (Remember, - this cannot overlap any existing checked-out copies.) - Edit your biography in one copy and commit those changes, - then switch to the other copy and edit the same file - before updating. -

-
- -
- -
-

Recovering Old Versions

- -
-

Learning Objectives

-
    -
  • Discard changes made to a working copy.
  • -
  • Recover an old version of a file.
  • -
  • Explain what branches are and when they are used.
  • -
-

- 20 minutes. -

-
- -

- Now that we have seen how to merge files and resolve conflicts, - we can look at how to use version control as an "infinite undo". - Suppose that when Wolfman starts work late one night, - his copy of explore is in sync with the head at revision 12. - He decides to edit the file moons.txt; - unfortunately, he forgot that there was a full moon, - so his changes don't make a lot of sense: -

- -
-Just one moon can make me growl
-Four would make me want to howl
-...
-
- -

- When he's back in human form the next day, - he wants to undo his changes. - Without version control, his choices would be grim: - he could try to edit them back into their original state by hand - (which for some reason hardly ever seems to work), - or ask his colleagues to send him their copies of the files - (which is almost as embarrassing as chasing the neighbor's cat when in wolf form). -

- -

- Since he's using Subversion, though, - and hasn't committed his work to the repository, - all he has to do is revert his local changes. - svn revert simply throws away local changes to files - and puts things back the way they were before those changes were made. - This is a purely local operation: - since Subversion keeps an unmodified copy of each file as of the last update inside every working copy, - Wolfman doesn't need to be connected to the network to do this. -

- -

- To start, - Wolfman uses svn diff without the -r HEAD flag - to take a look at the differences between his file - and the version he last received from the repository. - Since he doesn't want to keep his changes, - his next command is svn revert moons.txt. -

- -
-$ cd jupiter
-$ svn revert moons.txt
-Reverted   moons.txt
-
- -
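The reason a revert needs no network access can be illustrated with plain files. This is only a sketch of the idea: the `pristine` and `working` directory names are hypothetical and do not reflect Subversion's actual on-disk layout.

```shell
# Sketch only: the idea behind a purely local revert, using plain files.
# pristine/ plays the role of the copy saved at the last update.
sandbox=$(mktemp -d)
mkdir "$sandbox/pristine" "$sandbox/working"
printf '%s\n' 'Name Radius' 'Io 1821.6' > "$sandbox/pristine/moons.txt"
cp "$sandbox/pristine/moons.txt" "$sandbox/working/moons.txt"

# An unwanted local edit...
printf '%s\n' 'Just one moon can make me growl' > "$sandbox/working/moons.txt"

# ...is detected by comparing against the saved copy...
if ! cmp -s "$sandbox/pristine/moons.txt" "$sandbox/working/moons.txt"; then
    # ...and undone by copying the saved version back over it.
    cp "$sandbox/pristine/moons.txt" "$sandbox/working/moons.txt"
    echo "Reverted moons.txt"
fi
```

Everything the sketch needs is already on the local disk, which is why no server is involved.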

- What if someone has committed their changes, - but still wants to undo them? - For example, - suppose Dracula decides that the numbers in moons.txt would look better with commas. - He edits the file to put them in: -

- -
-Name            Orbital Radius  Orbital Period  Mass            Radius
-                (10^3 km)       (days)          (10^20 kg)      (km)
-Amalthea        181.4           0.498179          0.075      131 x 73 x 67
-Io              421.6           1.769138        893.2          1,821.6
-Europa          670.9           3.551181        480.0          1,560.8
-Ganymede      1,070.4           7.154553      1,481.9          2,631.2
-Callisto      1,882.7          16.689018      1,075.9          2,410.3
-Himalia      11,460           250.5662            0.095           85.0
-Elara        11,740           259.6528            0.008           40.0
-
- -

- then commits his changes to create revision 13. - A little while later, - the Mummy sees the change and orders Dracula to put things back the way they were. - What should Dracula do? -

- -

- We can draw the sequence of events leading up to revision 13 - as shown in Figure 13: -

- -
- Before Undoing -
Figure 13: Before Undoing
-
- -

- Dracula wants to erase revision 13 from the repository, - but he can't actually do that: - once a change is in the repository, - it's there forever. - What he can do instead is merge the old revision with the current revision - to create a new revision - (Figure 14). -

- -
- Merging History -
Figure 14: Merging History
-
- -

- This is exactly like merging changes made by two different people; - the only difference is that the "other person" is his past self. -

- -

- To undo his commas, - Dracula must merge revision 12 (the one before his change) - with revision 13 (the current head revision) - using svn merge: -

- -
-$ svn merge -r HEAD:12 moons.txt
---- Reverse-merging r13 into 'moons.txt':
-U  moons.txt
-
- -

- The -r flag specifies the range of revisions to merge: - to undo the changes from revision 12 to revision 13, - he uses either 13:12 or HEAD:12 - (since he is going backward in time from the most recent revision to revision 12). - This is called a reverse merge - because he's going backward in time. -

- -

- After he runs this command, - he must run svn commit to save the changes to the repository. - This creates a new revision, number 14, - rather than erasing revision 13. - That way, - the changes he made to create revision 13 are still there - if he can ever convince the Mummy that numbers should have commas. -

- -
-

Another Way to Do It

- -

- Another way to recover a particular version of a particular file - is to use the svn copy command. - If the URL of our repository is - https://universal.software-carpentry.org/explore, - then the command: -

- -
-$ svn copy https://universal.software-carpentry.org/explore/mission.txt@120 ./mission.txt
-
- -

- copies the file mission.txt as it was in revision 120 - into our working directory - (if a mission.txt is already there, - Subversion refuses to overwrite it, - so it has to be removed or renamed first). - What's more, - using svn copy brings along the file's history as well, - so that future svn log operations will show - how mission.txt was resurrected. -

-
- -

- Merging can be used to recover older revisions of files, - not just the most recent, - and to recover many files or directories at a time. - The most frequent use, though, - is to manage parallel streams of development in large projects. - This is outside the scope of this chapter, - but the basic idea is simple. -

- -

- Suppose that Universal Missions has just released a new program - for designing interplanetary voyages. - Dracula and Wolfman are supposed to add some features - that were left out of the first release because time ran short. - At the same time, - Frankenstein and the Mummy are doing technical support: - their job is to fix any bugs that users find. -

- -

- All sorts of things could go wrong - if both teams tried to work on the same code at the same time. - In particular, - Dracula and Wolfman might want to make large changes - to the structure of the code - in order to make it easier to add new features, - while Frankenstein and the Mummy want to make as few changes as possible - so as not to introduce new bugs while fixing old ones. -

- -

- The usual way to handle this situation is - to create a branch - in the repository for each major sub-project - (Figure 15). - While Wolfman and Dracula work on - the main line, - Frankenstein and the Mummy create a branch, - which is just another copy of the repository's files and directories - that is also under version control. - They can work in their branch without disturbing Wolfman and Dracula and vice versa: -

- -
- Branching and Merging -
Figure 15: Branching and Merging
-
- -

- Branches in version control repositories are often described as "parallel universes". - Each branch starts off as a clone of the project at some moment in time - (typically each time the software is released, - or whenever work starts on a major new feature). - Changes made to a branch only affect that branch, - just as changes made to the files in one directory don't affect files in other directories. - However, - the branch and the main line are both stored in the same repository, - so their revision numbers are always in step. -

- -

- If someone decides that a bug fix in one branch should also be made in another, - all they have to do is merge the files in question. - This is exactly like merging an old version of a file with the current one, - but instead of going backward in time, - the change is brought sideways from one branch to another. -

- -

- Branching helps projects scale up by letting sub-teams work independently, - but too many branches can cause as many problems as they solve. - Karl Fogel's excellent book - Producing Open Source Software, - and Laura Wingerd and Christopher Seiwald's paper - "High-level Best Practices in Software Configuration Management", - talk about branches in much more detail. - Projects usually don't need to do this until they have a dozen or more developers, - or until several versions of their software are in simultaneous use, - but using branches is a key part of switching from software carpentry to software engineering. -

- -
-

Summary

-
    -
  • Old versions of files can be recovered by merging their old state with their current state.
  • -
  • Recovering an old version of a file does not erase the intervening changes.
  • -
  • Use branches to support parallel independent development.
  • -
  • svn revert undoes local changes to files.
  • -
  • svn merge merges two revisions of a file.
  • -
-
- -
-

Challenges

- -
    -
  1. - Explain what the command: -
    -svn diff -r 240:261 fish.dat
    -
    - does, and when you might want to run it. -
  2. - -
  3. - Suppose that a file called mission.txt - existed in revision 90 of a repository, - but had been deleted in revision 91. - What two commands could we use to recover it? -
  4. - -
-
- -
- -
-

Setting Up a Repository

- -
-

Learning Objectives

-
    -
  • How to create a repository.
  • -
-

- 25 minutes - (mostly discussion about where to host repositories). -

-
- -

- It is finally time to see how to create a repository. - As a quick recap, - we will keep the master copy of our work in a repository - on a server that we can access from other machines on the internet. - That master copy consists of files and directories that no-one ever edits directly. - Instead, a copy of Subversion running on that machine - manages updates for us and watches for conflicts. - Our working copy is a mirror image of the master sitting on our computer. - When our Subversion client needs to communicate with the master, - it exchanges data with the copy of Subversion running on the server. -

- -

- To make this work, we need four things: -

- -
    - -
  1. - The repository itself. - It's not enough to create an empty directory and start filling it with files: - Subversion needs to create a lot of other structure - in order to keep track of old revisions, who made what changes, and so on. -
  2. - -
  3. - The full URL of the repository. - This includes the URL of the server - and the path to the repository on that machine. - (The second part is needed because a single server can, - and usually will, - host many repositories.) -
  4. - -
  5. - Permission to read or write the master copy. - Many open source projects give the whole world permission to read from their repository, - but very few allow strangers to write to it: - there are just too many possibilities for abuse. - Somehow, we have to set up a password or something like it - so that users can prove who they are. -
  6. - -
  7. - A working copy of the repository on our computer. - Once the first three things are in place, - this just means running the checkout command. -
  8. - -
- -

- To keep things simple, - we will start by creating a repository on the machine that we're working on. - This won't let us share our work with other people, - but it will allow us to save the history of our work as we go along. -

- -

- The command to create a repository is svnadmin create, - followed by the path to the repository. - If we want to create a repository called missions_repo - directly under our home directory, - we just cd to get home - and run svnadmin create missions_repo. - This command creates a directory called missions_repo to hold our repository, - and fills it with various files that Subversion uses - to keep track of the project's history: -

- -
-$ cd
-$ svnadmin create missions_repo
-$ ls -F missions_repo
-README.txt    conf/    db/    format    hooks/    locks/
-
- -

- We should never edit any of this directly, - since it will almost certainly make the repository unusable. - Instead, - we should use svn checkout - to get a working copy of this repository. - If our home directory is /users/mummy, - then the full path to the repository we just created is /users/mummy/missions_repo, - so we run svn checkout file:///users/mummy/missions_repo missions_working. -

- -

- Working backward, - the second argument, - missions_working, - specifies where the working copy is to be put. - The first argument is the URL of our repository, - and it has two parts. - /users/mummy/missions_repo is the path to the repository directory. - file:// specifies the protocol - that Subversion will use to communicate with the repository; in this case, - it says that the repository is part of the local machine's filesystem. - (Notice that the protocol ends in two slashes, - while the absolute path to the repository starts with a slash, - making three in total. - A very common mistake is to type only two, since that's what web URLs normally have.) -

- -
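The three-slash rule can be checked mechanically. The following is a minimal sketch, assuming a POSIX shell; `repo_url` is a hypothetical helper written for this illustration, not a Subversion command.

```shell
# Sketch only: build a file:// URL from an absolute repository path.
# repo_url is a hypothetical helper, not part of Subversion.
repo_url() {
    case "$1" in
        /*) printf 'file://%s\n' "$1" ;;   # "file://" plus "/abs/path" gives three slashes
        *)  echo "repo_url: need an absolute path" >&2; return 1 ;;
    esac
}

url=$(repo_url /users/mummy/missions_repo)
echo "$url"
```

The resulting string is what would be passed as the first argument to svn checkout, with the working copy name as the second.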

- When we're doing a checkout, - it is very important that we provide the second argument, - which specifies the name of the directory we want the working copy to be put in. - Without it, - Subversion will try to use the name of the repository, - missions_repo, - as the name of the working copy. - Since we're in the directory that contains the repository, - this means that Subversion will try to overwrite the repository with a working copy. - Again, - there isn't much risk of our sanity being torn to shreds, - but this could ruin our repository. -

- -

- To avoid this problem, - most people create a sub-directory in their account called something like repos, - and then create their repositories in that. - For example, - we could create our repository in /users/mummy/repos/missions, - then check out a working copy as /users/mummy/missions. - This practice makes both names easier to read. -

- -

- The obvious next step is to put our repository on a server, - rather than on our personal machine. - In fact, - we should always do this - so that we don't lose the history of our project - if our laptop is damaged or stolen. - A departmental server is also much more likely to be backed up regularly - than our personal machine… -

- -

- Creating a repository on a server is simple: - just log in and go through the steps described above. - Accessing that repository from another machine - is also straightforward. - If the machine's address is serv.euphoric.edu, - and our user ID is dracula, - the URL of the repository will be something like: -

- -
-svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions
-
- -

- Reading from left to right: -

- - - -
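The pieces of the example URL can be separated with shell parameter expansion. This is only a sketch, assuming a POSIX shell; the variable names are chosen for illustration. Here svn+ssh indicates the Subversion protocol tunneled over SSH, dracula is the user ID, serv.euphoric.edu is the server, and the remainder is the path to the repository on that machine.

```shell
# Sketch only: split the example URL into its parts.
url='svn+ssh://dracula@serv.euphoric.edu/home/dracula/repos/missions'

protocol=${url%%://*}     # text before "://"
rest=${url#*://}          # text after "://"
user=${rest%%@*}          # text before "@"
hostpath=${rest#*@}       # text after "@"
host=${hostpath%%/*}      # text before the first "/"
path=/${hostpath#*/}      # the first "/" and everything after it

printf 'protocol: %s\nuser:     %s\nhost:     %s\npath:     %s\n' \
    "$protocol" "$user" "$host" "$path"
```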

- That's fine if you are the only person using the repository, - but if you want to share it with others, - you need to worry about security. - As we discuss in the lesson on web programming, - as soon as you provide a service on the internet, - there's the possibility that someone may try to attack your system through it. - Rather than trying to learn enough system administration skills - to set things up safely, - it is usually easier to: -

- - - -

- If you choose the second or third option, - please check with whoever handles intellectual property at your institution - to make sure that putting your work on a commercially-operated machine - that is probably in some other legal jurisdiction - isn't going to cause trouble. - Many people assume that it's "just OK", - while others act as if not having asked will be an acceptable defence later on. - Unfortunately, - neither is true… -

- -
-

Summary

-
    -
  • svnadmin create name creates a new repository.
  • -
  • Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.
  • -
-
- -
-

Challenges

- -
    - -
  1. - Create a Subversion repository called trials_repo - in your home directory. - Check out a working copy in a directory called trials_working - (also in your home directory). - Add a couple of text files, - commit the changes, - and then use svn info trials_working - to see what Subversion tells you about your working copy. -
  2. - -
  3. - We said above that - you might be the only person using a particular repository. - When and why is version control worth using - if no-one else is working on a project with you? -
  4. - -
  5. - There are many ways to organize repositories. - Some of the most common are to create one repository for: -
      -
    • each person
    • -
    • each paper
    • -
    • all the work done on one grant
    • -
    • all the work done on one project
    • -
    • the entire lab (which is shared by everyone in the lab)
    • -
    • the entire department (typically with a top-level directory for each person or project in the department)
    • -
    - What activities does each one make easy or hard? - Which of these would you prefer, and why? -
  6. - -
-
- -
- -
-

Provenance

- -
-

Learning Objectives

-
    -
  • What data provenance is.
  • -
  • How to embed version numbers and other information in files managed by version control.
  • -
  • How to record version information about a program in its output.
  • -
-

- 20 minutes - (without a practical exercise). -

-
- -

- In art, - the provenance of a work - is the history of who owned it, when, and where. - In science, - it's the record of how a particular result came to be: - what raw data was processed by what version of what program to create which intermediate files, - what was used to turn those files into which figures of which papers, - and so on. -

- -

- One of the big benefits of using version control is that - it lets us track the provenance of scientific data automatically. - To start, - suppose we have a text file combustion.dat in a Subversion repository. - Run the following two commands: -

- -
-$ svn propset svn:keywords Revision combustion.dat
-$ svn commit -m "Turning on the 'Revision' keyword" combustion.dat
-
- -

- This does nothing by itself, - but now open the file in an editor - and add the following line somewhere near the top: -

- -
-$Revision:$
-
- -

- The $Revision:$ string means something special to Subversion. - Save the file, and commit the change: -

- -
-$ svn commit -m "Inserting the 'Revision' keyword" combustion.dat
-
- -

- When we open the file again, - we'll see that Subversion has changed that line to something like: -

- -
-$Revision: 143 $
-
- -

- i.e., it has inserted the version number - after the colon and before the closing $. - If we edit the file again—e.g., add a couple of lines with random numbers—and - commit once more, - the line is updated again to: -

- -
-$Revision: 144 $
-
- -

- Here's what just happened. - First, Subversion allows us to add - properties - to files and directories. - These properties aren't stored in the files or directories themselves, - but in Subversion's database. - One of those properties, - svn:keywords, - tells Subversion to look in files that are being changed - for strings of the form $propertyname: …$, - where propertyname is a string like Revision or Author. - (About half a dozen such strings are supported.) -

- -

- If it sees such a string, - Subversion rewrites it as the commit is taking place, - filling in the current version number, - the name of the person making the change, - or whatever else the keyword's name calls for. - We only have to add the string to the file once; - after that, - Subversion updates it for us every time the file changes. -

- -

- Putting the version number in the file this way can be pretty handy. - If you copy the file to another machine, - for example, - it carries its version number with it, - so you can tell which version you have even if it's outside version control. - We'll see some more useful things we can do with this information later. -

- -
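Reading the number back out of a copied file is a one-line job. The following is a minimal sketch, assuming a POSIX shell with `sed`; the temporary file is a hypothetical copy of combustion.dat that has left version control, and the pattern tolerates a space before the closing dollar sign.

```shell
# Sketch only: read the revision number back out of an expanded keyword line.
# The temporary file stands in for a copy of combustion.dat on another machine.
datafile=$(mktemp)
printf '%s\n' '# $Revision: 144 $' '1.0 2.5' '1.1 2.7' > "$datafile"

# Keep only the digits between "$Revision:" and the closing "$".
revision=$(sed -n 's/.*\$Revision: *\([0-9][0-9]*\) *\$.*/\1/p' "$datafile" | head -n 1)
echo "$revision"
```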

- We can use this trick with shell scripts too, - or with almost any other kind of program. - Let's go back to Nelle Nemo's data processing from - the lesson on the shell. - Suppose she writes a shell script called gooclean - to tidy up data files. - Her first version looks like this: -

- -
-# gooclean: clean up a single data file
-goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 > cleaned-$1
-
- -

- i.e., - it runs goonorm and then goofilter with some fixed parameters - and creates an output file called cleaned-something.dat - (if the input file's name was something.dat). - Assuming that '#' is the comment character for her output files, - she could instead write: -

- -
-# gooclean: clean up a single data file
-echo '# gooclean $Revision:$' > cleaned-$1
-goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 >> cleaned-$1
-
- -

- then set the svn:keywords property - and commit the file to insert the revision number, - making it: -

- -
-# gooclean: clean up a single data file
-echo '# gooclean $Revision: 487 $' > cleaned-$1
-goonorm -b 0 100 < $1 | goofilter -x --enlarge 2.0 >> cleaned-$1
-
- -

- Now, - each time this script is run it will: -

- - - -

- In other words, - the output of this shell script will always record - exactly what version of the script produced it. - This isn't enough to reproduce the output—we would need to record - the version numbers of the input files and the goonorm and goofilter programs, - and the values of the parameters those programs used - in order to do that—but it's an important and useful first step. -

- -
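The extra details mentioned above can be recorded in the same header. This is only a sketch under stated assumptions: `write_header` is a hypothetical helper, `sort -n` stands in for the goonorm and goofilter pipeline (which is not available here), and a `cksum` value stands in for a version number of the input file.

```shell
# Sketch only: record script revision, parameters, and an input fingerprint.
# write_header is a hypothetical helper; cksum is a stand-in for a real
# version number of the input file.
write_header() {
    input=$1; shift
    # Single quotes keep the shell from treating $Revision as a variable.
    echo '# gooclean $Revision: 487 $'
    echo "# parameters: $*"
    echo "# input: $input cksum $(cksum < "$input" | cut -d ' ' -f 1)"
}

cd "$(mktemp -d)" || exit 1
printf '%s\n' 3 1 2 > something.dat
write_header something.dat -b 0 100 > cleaned-something.dat
sort -n something.dat >> cleaned-something.dat   # stand-in for the real pipeline
head -n 3 cleaned-something.dat
```

The header lines all start with the comment character, so downstream tools that skip comments can still read the data.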
-

Summary

-
    -
  • $Keyword: …$ in a file can be filled in with a property value each time the file is committed.
  • Put version numbers in programs' output to establish provenance for data.
  • svn propset svn:keywords property files tells Subversion to start filling in property values.
-
- -
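Once a program's output carries a version number, as the second key point recommends, the number can be read back out of a data file. This is a minimal sketch, assuming the header is the first line of the file and contains an expanded keyword such as `$Revision: 487$`; the function name `revision_of` is hypothetical.

```shell
#!/bin/sh
# Print the revision number recorded in the first line of a data file.
# Prints nothing if that line holds no expanded Revision keyword.
revision_of() {
    sed -n '1s/.*\$Revision: \([0-9][0-9]*\) *\$.*/\1/p' "$1"
}
```

The pattern tolerates an optional space before the closing `$`, since keyword expansions may be written with or without one.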
-

Challenges

- -
    - -
  1. - Add $Id:$ to a file, - use svn propset to set the corresponding property, - and then commit a change to the file. - What value does Subversion fill in for this keyword? - When would you use this rather than Revision or Author? -
  2. - What does the svn:ignore property do when applied to a directory? - When would you use it? -
- -
- -
- -
-

Summing Up

- -

- In 2006, - McCullough, McGeary, and Harrison - analyzed several years of - the data and code archive of the Journal of Money, Credit, and Banking, - a prestigious journal with a mandatory archiving policy. - Of 266 articles published during that time, - 193 were empirical and should have had data and code deposited in the archive. - Of those, - only 69 actually had anything in the archive. - Excluding eleven articles that only had data, - and seven that required software or other resources they did not have, - McCullough et al. were only able to replicate 14 of the remaining 186 articles. - This doesn't mean that the other 92% were wrong, - but it does mean there is no practical way to tell. -

- -

- By itself, - version control doesn't make computational research reproducible. - It does help, - though, - and also eliminates the frustration and wasted time caused by - trying to figure out which emailed copy of a file, - or which of a dozen directories or USB drives, - is the most recent. - And while correlation doesn't imply causality, - there is certainly a strong correlation between - knowing enough about good computational practices to use version control - and knowing how to do other things right as well. -

- -
-{% endblock content %}
diff --git a/version-control/README.md b/version-control/README.md
new file mode 100644
index 0000000..becae9b
--- /dev/null
+++ b/version-control/README.md
@@ -0,0 +1,135 @@
+Basic use
+=========
+
+Learning objectives
+-------------------
+
+* Draw a diagram showing the places version control stores
+  information.
+* Check out a working copy of a repository.
+* View the history of changes to a project.
+* Explain why working copies of different projects should not overlap.
+* Add files to a project.
+* Commit changes made to a working copy to a repository.
+* Update a working copy to get changes from the repository.
+* Compare the current state of a working copy to the last update from
+  the repository, and to the current state of the repository.
+* Explain what "version 123 of `xyz.txt`" actually means.
+
+Key points
+----------
+
+* Version control is a better way to manage shared files than email or
+  shared folders.
+* The master copy is stored in a repository.
+* Nobody ever edits the master directory: instead, each person edits a
+  local working copy.
+* People share changes by committing them to the master or updating
+  their local copy from the master.
+* The version control system prevents people from overwriting each
+  other's work by forcing them to merge concurrent changes before
+  committing.
+* It also keeps a complete history of changes made to the master so
+  that old versions can be recovered reliably.
+* Version control systems work best with text files, but can also
+  handle binary files such as images and Word documents.
+* Every repository is identified by a URL.
+* Working copies of different repositories may not overlap.
+* Each change to the master copy is identified by a unique revision
+  number.
+* Revisions identify snapshots of the entire repository, not changes
+  to individual files.
+* Each change should be commented to make the history more readable.
+* Commits are transactions: either all changes are successfully
+  committed, or none are.
+* The basic workflow for version control is update-change-commit.
+* `svn add $THINGS` tells Subversion to start managing
+  particular files or directories.
+* `svn checkout $URL` checks out a working copy of a repository.
+* `svn commit -m "$MESSAGE" $THINGS` sends changes to the repository.
+* `svn diff` compares the current state of a working copy to the state
+  after the most recent update.
+* `svn diff -r HEAD` compares the current state of a working copy to
+  the state of the master copy.
+* `svn log` shows the history of a working copy.
+* `svn status` shows the status of a working copy.
+* `svn update` updates a working copy from the repository.
+
+Merging conflicts
+=================
+
+Learning objectives
+-------------------
+
+* Explain what causes conflicts to occur and how to tell when one has
+  occurred.
+* Resolve a conflict.
+* Identify the auxiliary files created when a conflict occurs.
+
+Key points
+----------
+
+* Conflicts must be resolved before a commit can be completed.
+* Subversion puts markers in text files to show regions of conflict.
+* For each conflicted file, Subversion creates auxiliary files
+  containing the common parent, the master version, and the local
+  version.
+* `svn resolve $FILES` tells Subversion that conflicts have been
+  resolved.
+
+Recovering old versions
+=======================
+
+Learning objectives
+-------------------
+
+* Discard changes made to a working copy.
+* Recover an old version of a file.
+* Explain what branches are and when they are used.
+
+Key points
+----------
+
+* Old versions of files can be recovered by merging their old state
+  with their current state.
+* Recovering an old version of a file does not erase the intervening
+  changes.
+* Use branches to support parallel independent development.
+* `svn revert` undoes local changes to files.
+* `svn merge` merges two revisions of a file.
+
+Setting up a repository
+=======================
+
+Learning objectives
+-------------------
+
+* How to create a repository.
+
+Key points
+----------
+
+* `svnadmin create $NAME` creates a new repository.
+* Repositories can be hosted locally, on local (departmental) servers,
+  on hosting services, or on their owners' own domains.
+
+Provenance
+==========
+
+Learning objectives
+-------------------
+
+* What data provenance is.
+* How to embed version numbers and other information in files managed
+  by version control.
+* How to record version information about a program in its output.
+
+Key points
+----------
+
+* `$Keyword: …$` in a file can be filled in with a property value each
+  time the file is committed.
+* Put version numbers in programs' output to establish provenance for
+  data.
+* `svn propset svn:keywords $PROPERTY $FILES` tells Subversion to
+  start filling in property values.
diff --git a/version-control/instructor.md b/version-control/instructor.md
new file mode 100644
index 0000000..9fe2757
--- /dev/null
+++ b/version-control/instructor.md
@@ -0,0 +1,70 @@
+Version control is the most important practical skill we introduce.
+As the last paragraph of the introduction above says, the workflow
+matters more than the ins and outs of any particular tool. By the end
+of 90 minutes, the instructor should be able to get learners to chant,
+"Update, edit, merge, commit," in unison, and have them understand
+what those terms mean and why that's a good way to structure their
+working day.
+
+Provided there aren't network problems, this entire lesson can be
+covered in 90 minutes. The example at the end showing how to use
+Subversion keywords to track provenance is the "ah ha!" moment for
+many learners. If time is short, skip the material on recovering old
+versions of files in order to get to this section instead. (The fact
+that provenance is harder in Git, both mechanically and conceptually,
+is one reason to keep teaching Subversion.)
+
+Prerequisites
+-------------
+
+* Basic shell concepts and skills (`ls`, `cd`, `mkdir`, editing
+  files).
+* Basic shell scripting (for the discussion of provenance).
+
+Teaching notes
+--------------
+
+* Make sure the network is working *before* starting this lesson.
+
+* Give learners a ten-minute overview of what version control does for
+  them before diving into the watch-and-do practicals. Most of them
+  will have tried to co-author papers by emailing files back and
+  forth, or will have biked into the office only to realize that the
+  USB key with last night's work is still on the kitchen table.
+  Instructors can also make jokes about directories with names like
+  "final version", "final version revised", "final version with
+  reviewer three's corrections", "really final version", and, "come on
+  this really has to be the last version" to motivate version control
+  as a better way to collaborate and as a better way to back work up.
+
+* Version control is typically taught after the shell, so collect
+  learners' names during that session and create a repository for them
+  to share with their names as both their IDs and their passwords.
+  The easiest way to create the repository is to use a server managed
+  by an ISP such as Dreamhost, or on SourceForge, Google Code, or some
+  other "forge" site, all of which provide web interfaces for
+  repository creation and management. If your learners are advanced
+  enough to be using SSH, you can instead create it on any server they
+  can access, and connect with the `svn+ssh` protocol instead of
+  HTTPS.
+
+* Be very clear what files learners are to edit and what user IDs they
+  are to use when giving instructions. It is common for them to edit
+  the instructor's biography, or to use the instructor's user ID and
+  password when committing.
Be equally clear *when* they are to edit
+  things: it's also common for someone to edit the file the instructor
+  is editing and commit changes while the instructor is explaining
+  what's going on, so that a conflict occurs when the instructor comes
+  to commit the file.
+
+* Learners could do most exercises with repositories on their own
+  machines, but it's hard for them to see how version control helps
+  collaboration unless they're sharing a repository with other
+  learners. In particular, showing learners who changed what using
+  `svn blame` is only compelling if a file has been edited by at least
+  two people.
+
+* If some learners are using Windows, there will inevitably be issues
+  merging files with different line endings. `svn diff -x -w` is
+  supposed to suppress differences in whitespace, but we have found
+  that it doesn't always work as advertised.
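
One way to check whether two copies of a file differ only in their line endings is to strip carriage returns before comparing them. This is a minimal sketch using standard Unix tools rather than any Subversion feature; the function name `same_ignoring_cr` is hypothetical.

```shell
#!/bin/sh
# Succeed (exit status 0) if two text files are identical once
# carriage returns are removed, i.e. they differ at most in
# CRLF versus LF line endings.
same_ignoring_cr() {
    a=$(mktemp) && b=$(mktemp) || return 2
    tr -d '\r' < "$1" > "$a"
    tr -d '\r' < "$2" > "$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example, `same_ignoring_cr mine.txt theirs.txt && echo "only line endings differ"`. Setting Subversion's `svn:eol-style` property on text files is another option worth trying before the workshop.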