allocation growing API

From: Junio C Hamano Date: Sat, 15 Dec 2007 08:40:54 +0000 (+0000) Subject: Autogenerated HTML docs for v1.5.4-rc0-36-g7680 X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=3dac5045e9f35540f547a9f1a79be3daf1271cf6;p=git.git Autogenerated HTML docs for v1.5.4-rc0-36-g7680 --- diff --git a/RelNotes-1.5.4.txt b/RelNotes-1.5.4.txt index d6fd3ddd1..89e6fe32b 100644 --- a/RelNotes-1.5.4.txt +++ b/RelNotes-1.5.4.txt @@ -7,6 +7,9 @@ Removal * "git svnimport" was removed in favor of "git svn". It is still there in the source tree (contrib/examples) but unsupported. + * As git-commit and git-status have been rewritten, "git runstatus" + helper script lost all its users and has been removed. + Deprecation notices ------------------- @@ -261,6 +264,9 @@ Updates since v1.5.3 between svn and git; a new representation that is much more compact for this information has been introduced to correct this. + * "git svn" left temporary index files it used without cleaning them + up; this was corrected. + * "git status" from a subdirectory now shows relative paths, which makes copy-and-pasting for git-checkout/git-add/git-rm easier. The traditional behaviour to show the full path relative to the top of @@ -297,6 +303,9 @@ this release, unless otherwise noted. These fixes are only in v1.5.4 and not backported to v1.5.3 maintenance series. + * The way "git diff --check" behaves is much more consistent with the way + "git apply --whitespace=warn" works. + * "git svn" talking with the SVN over http will correctly quote branch and project names. @@ -305,6 +314,6 @@ series. 
-- exec >/var/tmp/1 -O=v1.5.4-rc0 +O=v1.5.4-rc0-35-g530e741 echo O=`git describe refs/heads/master` git shortlog --no-merges $O..refs/heads/master ^refs/heads/maint diff --git a/cmds-ancillaryinterrogators.txt b/cmds-ancillaryinterrogators.txt index 8dfde9856..cefbea6ca 100644 --- a/cmds-ancillaryinterrogators.txt +++ b/cmds-ancillaryinterrogators.txt @@ -31,9 +31,6 @@ gitlink:git-rerere[1]:: gitlink:git-rev-parse[1]:: Pick out and massage parameters. -gitlink:git-runstatus[1]:: - A helper for git-status and git-commit. - gitlink:git-show-branch[1]:: Show branches and their commits. diff --git a/diff-options.txt b/diff-options.txt index 5d22b7b58..9ecc1d7bc 100644 --- a/diff-options.txt +++ b/diff-options.txt @@ -93,7 +93,9 @@ endif::git-format-patch[] --check:: Warn if changes introduce trailing whitespace - or an indent that uses a space before a tab. + or an indent that uses a space before a tab. Exits with + non-zero status if problems are found. Not compatible with + --exit-code. --full-index:: Instead of the first handful characters, show full diff --git a/git-diff-files.html b/git-diff-files.html index 8662cceaf..2a1b53dc1 100644 --- a/git-diff-files.html +++ b/git-diff-files.html @@ -452,7 +452,9 @@ same as "git-diff-index" and "git-diff-tree".

Warn if changes introduce trailing whitespace - or an indent that uses a space before a tab. + or an indent that uses a space before a tab. Exits with + non-zero status if problems are found. Not compatible with + --exit-code.

@@ -1235,7 +1237,7 @@ the pathname, but if that is NUL, the record will show two paths.

diff --git a/git-diff-index.html b/git-diff-index.html index 1ed884ea4..01e0d3fb0 100644 --- a/git-diff-index.html +++ b/git-diff-index.html @@ -453,7 +453,9 @@ entries in the index are compared.

@@ -1317,7 +1319,7 @@ always have the special all-zero sha1. diff --git a/git-diff-tree.html b/git-diff-tree.html index d0fc7a44d..45a07b9e7 100644 --- a/git-diff-tree.html +++ b/git-diff-tree.html @@ -455,7 +455,9 @@ git-diff-tree(1) Manual Page

@@ -1705,7 +1707,7 @@ the pathname, but if that is NUL, the record will show two paths.

diff --git a/git-diff.html b/git-diff.html index 75a995599..de3151fd0 100644 --- a/git-diff.html +++ b/git-diff.html @@ -531,7 +531,9 @@ and the range notations ("<commit>..<commit>" and

@@ -1432,7 +1434,7 @@ Output diff in reverse. diff --git a/git-format-patch.html b/git-format-patch.html index 418908e49..a7732ffd5 100644 --- a/git-format-patch.html +++ b/git-format-patch.html @@ -499,7 +499,9 @@ reference.

@@ -991,7 +993,7 @@ git-format-patch -3 diff --git a/git-help.html b/git-help.html index b58cd0deb..97c10270c 100644 --- a/git-help.html +++ b/git-help.html @@ -272,7 +272,7 @@ git-help(1) Manual Page

SYNOPSIS

git help [-a|--all|-i|--info|-w|--web] [COMMAND]

git help [-a|--all|-i|--info|-m|--man|-w|--web] [COMMAND]

DESCRIPTION

@@ -283,7 +283,7 @@ on the standard output.

printed on the standard output.

If a git command is named, a manual page for that command is brought up. The man program is used by default for this purpose, but this -can be overridden by other options.

+can be overridden by other options or configuration variables.

Note that git --help … is identical to git help … because the former is internally converted into the latter.

@@ -309,6 +309,16 @@ former is internally converted into the latter.

+-m|--man +

+ Use the man program to display the manual page. This may be + used to override a value set in the help.format + configuration variable. +

-w|--web

@@ -318,18 +328,55 @@ former is internally converted into the latter.

The web browser can be specified using the configuration variable help.browser, or web.browser if the former is not set. If none of -these config variables is set, the git-browse-help script (called by -git-help) will pick a suitable default.

+these config variables is set, the git-browse--help helper script +(called by git-help) will pick a suitable default.

You can explicitly provide a full path to your preferred browser by setting the configuration variable browser.<tool>.path. For example, you can configure the absolute path to firefox by setting -browser.firefox.path. Otherwise, git-browse-help assumes the tool +browser.firefox.path. Otherwise, git-browse--help assumes the tool is available in PATH.

Note that the script tries, as much as possible, to display the HTML page in a new tab on an already opened browser.

CONFIGURATION VARIABLES

If no command line option is passed, the help.format configuration +variable will be checked. The following values are supported for this +variable; they make git-help behave as their corresponding command +line option:

+
+"man" corresponds to -m|--man, +
+
+
+"info" corresponds to -i|--info, +
+
+
+"web" or "html" correspond to -w|--web, +
+

The help.browser, web.browser and browser.<tool>.path variables will also +be checked if the web format is chosen (either by command line +option or configuration variable). See -w|--web in the OPTIONS +section above.

Note that these configuration variables should probably be set using +the --global flag, for example like this:

$ git config --global help.format web
+$ git config --global web.browser firefox

as they are probably more user specific than repository specific. +See git-config(1) for more information about this.

Author

Written by Junio C Hamano <gitster@pobox.com> and the git-list @@ -347,7 +394,7 @@ little. Maintenance is done by the git-list <git@vger.kernel.org>.

diff --git a/git-help.txt b/git-help.txt index ac9e15d77..8cd69e712 100644 --- a/git-help.txt +++ b/git-help.txt @@ -7,7 +7,7 @@ git-help - display help information about git SYNOPSIS -------- -'git help' [-a|--all|-i|--info|-w|--web] [COMMAND] +'git help' [-a|--all|-i|--info|-m|--man|-w|--web] [COMMAND] DESCRIPTION ----------- @@ -21,7 +21,7 @@ printed on the standard output. If a git command is named, a manual page for that command is brought up. The 'man' program is used by default for this purpose, but this -can be overriden by other options. +can be overriden by other options or configuration variables. Note that 'git --help ...' is identical as 'git help ...' because the former is internally converted into the latter. @@ -36,24 +36,57 @@ OPTIONS Use the 'info' program to display the manual page, instead of the 'man' program that is used by default. +-m|--man:: + Use the 'man' program to display the manual page. This may be + used to override a value set in the 'help.format' + configuration variable. + -w|--web:: Use a web browser to display the HTML manual page, instead of the 'man' program that is used by default. + The web browser can be specified using the configuration variable 'help.browser', or 'web.browser' if the former is not set. If none of -these config variables is set, the 'git-browse-help' script (called by -'git-help') will pick a suitable default. +these config variables is set, the 'git-browse--help' helper script +(called by 'git-help') will pick a suitable default. + You can explicitly provide a full path to your prefered browser by setting the configuration variable 'browser..path'. For example, you can configure the absolute path to firefox by setting -'browser.firefox.path'. Otherwise, 'git-browse-help' assumes the tool +'browser.firefox.path'. Otherwise, 'git-browse--help' assumes the tool is available in PATH. + Note that the script tries, as much as possible, to display the HTML page in a new tab on an already opened browser. 
+CONFIGURATION VARIABLES +----------------------- + +If no command line option is passed, the 'help.format' configuration +variable will be checked. The following values are supported for this +variable; they make 'git-help' behave as their corresponding command +line option: + +* "man" corresponds to '-m|--man', +* "info" corresponds to '-i|--info', +* "web" or "html" correspond to '-w|--web', + +The 'help.browser', 'web.browser' and 'browser..path' will also +be checked if the 'web' format is choosen (either by command line +option or configuration variable). See '-w|--web' in the OPTIONS +section above. + +Note that these configuration variables should probably be set using +the '--global' flag, for example like this: + +------------------------------------------------ +$ git config --global help.format web +$ git config --global web.browser firefox +------------------------------------------------ + +as they are probably more user specific than repository specific. +See gitlink:git-config[1] for more information about this. + Author ------ Written by Junio C Hamano and the git-list diff --git a/git-log.html b/git-log.html index f22ae219d..02f89d0dd 100644 --- a/git-log.html +++ b/git-log.html @@ -492,7 +492,9 @@ people using 80-column terminals.

@@ -1447,7 +1449,7 @@ reversible operation.

diff --git a/git-runstatus.txt b/git-runstatus.txt deleted file mode 100644 index dee5d0da9..000000000 --- a/git-runstatus.txt +++ /dev/null @@ -1,68 +0,0 @@ -git-runstatus(1) -================ - -NAME ----- -git-runstatus - A helper for git-status and git-commit - - -SYNOPSIS --------- -'git-runstatus' [--color|--nocolor] [--amend] [--verbose] [--untracked] - - -DESCRIPTION ------------ -Examines paths in the working tree that has changes unrecorded -to the index file, and changes between the index file and the -current HEAD commit. The former paths are what you _could_ -commit by running 'git add' (or 'git rm' if you are deleting) before running 'git -commit', and the latter paths are what you _would_ commit by -running 'git commit'. - -If there is no path that is different between the index file and -the current HEAD commit, the command exits with non-zero status. - -Note that this is _not_ the user level command you would want to -run from the command line. Use 'git-status' instead. - - -OPTIONS -------- ---color:: - Show colored status, highlighting modified file names. - ---nocolor:: - Turn off coloring. - ---amend:: - Show status based on HEAD^1, not HEAD, i.e. show what - 'git-commit --amend' would do. - ---verbose:: - Show unified diff of all file changes. - ---untracked:: - Show files in untracked directories, too. Without this - option only its name and a trailing slash are displayed - for each untracked directory. - - -OUTPUT ------- -The output from this command is designed to be used as a commit -template comments, and all the output lines are prefixed with '#'. - - -Author ------- -Originally written by Linus Torvalds as part -of git-commit, and later rewritten in C by Jeff King. - -Documentation --------------- -Documentation by David Greaves, Junio C Hamano and the git-list . 
- -GIT ---- -Part of the gitlink:git[7] suite diff --git a/git.html b/git.html index 98342c845..23033ebe6 100644 --- a/git.html +++ b/git.html @@ -391,6 +391,7 @@ user-manual and the Core tutorial both prov introductions to the underlying git architecture.

See also the howto documents for some useful examples.

The internals are documented here.

GIT COMMANDS

@@ -883,14 +884,6 @@ ancillary user utilities.

-git-runstatus(1) -

- A helper for git-status and git-commit. -

git-show-branch(1)

@@ -1945,7 +1938,7 @@ contributors on the git-list <git@vger.kernel.org>.

diff --git a/git.txt b/git.txt index a29b634e7..e0f9a4490 100644 --- a/git.txt +++ b/git.txt @@ -153,6 +153,8 @@ introductions to the underlying git architecture. See also the link:howto-index.html[howto] documents for some useful examples. +The internals are documented link:technical/api-index.html[here]. + GIT COMMANDS ------------ diff --git a/technical/api-allocation-growing.html b/technical/api-allocation-growing.html new file mode 100644 index 000000000..b147af9f2 --- /dev/null +++ b/technical/api-allocation-growing.html @@ -0,0 +1,313 @@ + + + + + + +allocation growing API + + + +

Dynamically growing an array using realloc() is error prone and boring.

Define your array with:

+
+a pointer (ary) that points at the array, initialized to NULL; +
+
+
+an integer variable (alloc) that keeps track of how big the current + allocation is, initialized to 0; +
+
+
+another integer variable (nr) to keep track of how many elements the + array currently has, initialized to 0. +
+

Then before adding the nth element to the array, call ALLOC_GROW(ary, n, +alloc). This ensures that the array can hold at least n elements by +calling realloc(3) and adjusting the alloc variable.

sometype *ary;
+size_t nr;
+size_t alloc;
+
+for (i = 0; i < nr; i++)
+        if (we like ary[i] already)
+                return;
+
+/* we did not like any existing one, so add one */
+ALLOC_GROW(ary, nr + 1, alloc);
+ary[nr++] = value you like;

You are responsible for updating the nr variable.
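
The pattern above can be sketched as a small self-contained program fragment. `SKETCH_ALLOC_GROW`, `struct int_set` and `int_set_add()` are names invented for this illustration, and the growth factor is an assumption; the macro is a stand-in written from the description above, not a copy of git's own `ALLOC_GROW`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Stand-in for ALLOC_GROW(), written for illustration only (an
 * assumption, not git's definition).  Ensures that `ary` can hold at
 * least `n` elements and records the new capacity in `alloc`.
 */
#define SKETCH_ALLOC_GROW(ary, n, alloc) \
	do { \
		if ((n) > (alloc)) { \
			size_t new_alloc_ = ((alloc) + 16) * 3 / 2; \
			void *new_ary_; \
			if (new_alloc_ < (n)) \
				new_alloc_ = (n); \
			new_ary_ = realloc((ary), new_alloc_ * sizeof(*(ary))); \
			if (!new_ary_) { \
				perror("realloc"); \
				exit(1); \
			} \
			(ary) = new_ary_; \
			(alloc) = new_alloc_; \
		} \
	} while (0)

/* Hypothetical container following the ary/nr/alloc convention. */
struct int_set {
	int *ary;     /* the array; NULL until the first growth */
	size_t nr;    /* number of elements currently in use */
	size_t alloc; /* number of elements currently allocated */
};

/* Append `value` unless it is already present; returns 1 if it was added. */
int int_set_add(struct int_set *set, int value)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (set->ary[i] == value)
			return 0;

	/* we did not find it, so make room for one more and add it */
	SKETCH_ALLOC_GROW(set->ary, set->nr + 1, set->alloc);
	set->ary[set->nr++] = value;
	return 1;
}
```

A caller would start from `struct int_set set = { NULL, 0, 0 };` and call `int_set_add(&set, value)` repeatedly. Note that the macro only adjusts `alloc`; the caller increments `nr` itself, as the text says.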

+ + + diff --git a/technical/api-allocation-growing.txt b/technical/api-allocation-growing.txt new file mode 100644 index 000000000..43dbe09f7 --- /dev/null +++ b/technical/api-allocation-growing.txt @@ -0,0 +1,34 @@ +allocation growing API +====================== + +Dynamically growing an array using realloc() is error prone and boring. + +Define your array with: + +* a pointer (`ary`) that points at the array, initialized to `NULL`; + +* an integer variable (`alloc`) that keeps track of how big the current + allocation is, initialized to `0`; + +* another integer variable (`nr`) to keep track of how many elements the + array currently has, initialized to `0`. + +Then before adding `n`th element to the array, call `ALLOC_GROW(ary, n, +alloc)`. This ensures that the array can hold at least `n` elements by +calling `realloc(3)` and adjusting `alloc` variable. + +------------ +sometype *ary; +size_t nr; +size_t alloc + +for (i = 0; i < nr; i++) + if (we like ary[i] already) + return; + +/* we did not like any existing one, so add one */ +ALLOC_GROW(ary, nr + 1, alloc); +ary[nr++] = value you like; +------------ + +You are responsible for updating the `nr` variable. diff --git a/technical/api-builtin.html b/technical/api-builtin.html new file mode 100644 index 000000000..4bedd40dc --- /dev/null +++ b/technical/api-builtin.html @@ -0,0 +1,366 @@ + + + + + + +builtin API + + + +

Adding a new built-in

There are 4 things to do to add a built-in command implementation to +git:

+
+Define the implementation of the built-in command foo with + signature: +
+
+
+
```
int cmd_foo(int argc, const char **argv, const char *prefix);
```
+
+
+
+Add the external declaration for the function to builtin.h. +
+
+
+Add the command to commands[] table in handle_internal_command(), + defined in git.c. The entry should look like: +
+
+
+
```
{ "foo", cmd_foo, <options> },
```
+
+
+
+
where <options> is the bitwise-or of:
+
+
+
+RUN_SETUP +
+
+
+ Make sure there is a git directory to work on, and if there is a + work tree, chdir to the top of it if the command was invoked + in a subdirectory. If there is no work tree, no chdir() is + done. +
+
+
+USE_PAGER +
+
+
+ If the standard output is connected to a tty, spawn a pager and + feed our output to it. +
+
+
+
+
+Add builtin-foo.o to BUILTIN_OBJS in Makefile. +
+

Additionally, if foo is a new command, there are 3 more things to do:

+
+Add tests to t/ directory. +
+
+
+Write documentation in Documentation/git-foo.txt. +
+
+
+Add an entry for git-foo to the list at the end of + Documentation/cmd-list.perl. +
+

How a built-in is called

The implementation cmd_foo() takes three parameters, argc, argv, +and prefix. The first two are similar to what main() of a +standalone command would be called with.

When RUN_SETUP is specified in the commands[] table, and when you +were started from a subdirectory of the work tree, cmd_foo() is called +after chdir(2) to the top of the work tree, and prefix gets the path +to the subdirectory the command started from. This allows you to +convert a user-supplied pathname (typically relative to that directory) +to a pathname relative to the top of the work tree.

The return value from cmd_foo() becomes the exit status of the +command.
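
As a rough illustration of how the pieces fit together, the sketch below defines a `cmd_foo()` with the documented signature and a tiny table-driven dispatcher. The `sketch_` names, the struct layout and the option bit values are assumptions made for this example and are not taken from `git.c`; the sketch also assumes that `prefix` ends with a slash, which the text above does not state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option bits; the text above does not give the real values. */
#define SKETCH_RUN_SETUP (1 << 0)
#define SKETCH_USE_PAGER (1 << 1)

/* Hypothetical table entry: name, implementation, option bits. */
struct sketch_cmd {
	const char *name;
	int (*fn)(int argc, const char **argv, const char *prefix);
	int option;
};

/* A built-in with the documented signature. */
int cmd_foo(int argc, const char **argv, const char *prefix)
{
	int i;

	/*
	 * Turn each user-supplied path into one relative to the top of the
	 * work tree by prepending the prefix (assumed to end in '/').
	 */
	for (i = 1; i < argc; i++)
		printf("%s%s\n", prefix ? prefix : "", argv[i]);
	return 0;
}

static const struct sketch_cmd sketch_commands[] = {
	{ "foo", cmd_foo, SKETCH_RUN_SETUP },
};

/*
 * Look up `name` in the table and run it.  Returns the command's return
 * value (which would become the exit status), or -1 if it is unknown.
 */
int sketch_run_builtin(const char *name, int argc, const char **argv,
		       const char *prefix)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_commands) / sizeof(sketch_commands[0]); i++) {
		const struct sketch_cmd *p = &sketch_commands[i];

		if (strcmp(p->name, name))
			continue;
		/* A real dispatcher would act on p->option before this call. */
		return p->fn(argc, argv, prefix);
	}
	return -1;
}
```

In this sketch the dispatcher simply forwards the return value of `cmd_foo()`, mirroring the statement that it becomes the exit status of the command.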

+ + + diff --git a/technical/api-builtin.txt b/technical/api-builtin.txt new file mode 100644 index 000000000..52cdb4c52 --- /dev/null +++ b/technical/api-builtin.txt @@ -0,0 +1,63 @@ +builtin API +=========== + +Adding a new built-in +--------------------- + +There are 4 things to do to add a bulit-in command implementation to +git: + +. Define the implementation of the built-in command `foo` with + signature: + + int cmd_foo(int argc, const char **argv, const char *prefix); + +. Add the external declaration for the function to `builtin.h`. + +. Add the command to `commands[]` table in `handle_internal_command()`, + defined in `git.c`. The entry should look like: + + { "foo", cmd_foo, }, + + where options is the bitwise-or of: + +`RUN_SETUP`:: + + Make sure there is a git directory to work on, and if there is a + work tree, chdir to the top of it if the command was invoked + in a subdirectory. If there is no work tree, no chdir() is + done. + +`USE_PAGER`:: + + If the standard output is connected to a tty, spawn a pager and + feed our output to it. + +. Add `builtin-foo.o` to `BUILTIN_OBJS` in `Makefile`. + +Additionally, if `foo` is a new command, there are 3 more things to do: + +. Add tests to `t/` directory. + +. Write documentation in `Documentation/git-foo.txt`. + +. Add an entry for `git-foo` to the list at the end of + `Documentation/cmd-list.perl`. + + +How a built-in is called +------------------------ + +The implementation `cmd_foo()` takes three parameters, `argc`, `argv, +and `prefix`. The first two are similar to what `main()` of a +standalone command would be called with. + +When `RUN_SETUP` is specified in the `commands[]` table, and when you +were started from a subdirectory of the work tree, `cmd_foo()` is called +after chdir(2) to the top of the work tree, and `prefix` gets the path +to the subdirectory the command started from. 
This allows you to +convert a user-supplied pathname (typically relative to that directory) +to a pathname relative to the top of the work tree. + +The return value from `cmd_foo()` becomes the exit status of the +command. diff --git a/technical/api-decorate.html b/technical/api-decorate.html new file mode 100644 index 000000000..aba2a48c9 --- /dev/null +++ b/technical/api-decorate.html @@ -0,0 +1,276 @@ + + + + + + +decorate API + + + +

Talk about <decorate.h>

(Linus)

+ + + diff --git a/technical/api-decorate.txt b/technical/api-decorate.txt new file mode 100644 index 000000000..1d52a6ce1 --- /dev/null +++ b/technical/api-decorate.txt @@ -0,0 +1,6 @@ +decorate API +============ + +Talk about + +(Linus) diff --git a/technical/api-diff.html b/technical/api-diff.html new file mode 100644 index 000000000..f0d67027a --- /dev/null +++ b/technical/api-diff.html @@ -0,0 +1,589 @@ + + + + + + +diff API + + + +

The diff API is for programs that compare two sets of files (e.g. two +trees, one tree and the index) and present the found difference in +various ways. The calling program is responsible for feeding the API +pairs of files, one from the "old" set and the corresponding one from +"new" set, that are different. The library called through this API is +called diffcore, and is responsible for two things.

+
+finding total rewrites (-B), renames (-M) and copies (-C), and + changes that touch a string (-S), as specified by the caller. +
+
+
+outputting the differences in various formats, as specified by the + caller. +
+

Calling sequence

+
+Prepare struct diff_options to record the set of diff options, and + then call diff_setup() to initialize this structure. This sets up + the vanilla default. +
+
+
+Fill in the options structure to specify desired output format, rename + detection, etc. diff_opt_parse() can be used to parse options given + from the command line in a way consistent with existing git-diff + family of programs. +
+
+
+Call diff_setup_done(); this inspects the options set up so far for + internal consistency and makes any necessary tweaks to them (e.g. if + textual patch output was asked for, recursive behaviour is turned on). +
+
+
+As you find different pairs of files, call diff_change() to feed + modified files, diff_addremove() to feed created or deleted files, + or diff_unmerged() to feed a file whose state is unmerged to the + API. These are thin wrappers to a lower-level diff_queue() function + that is flexible enough to record any of these kinds of changes. +
+
+
+Once you finish feeding the pairs of files, call diffcore_std(). + This will tell the diffcore library to go ahead and do its work. +
+
+
+Calling diff_flush() will produce the output. +
+

Data structures

+
+struct diff_filespec +
+

This is the internal representation for a single file (blob). It +records the blob object name (if known; for a work tree file it +is typically the null SHA-1), filemode and pathname. This is what +diff_addremove(), diff_change() and diff_unmerged() synthesize and +feed to the diff_queue() function.

+
+struct diff_filepair +
+

This records a pair of struct diff_filespec; the filespec for a file +in the "old" set (i.e. preimage) is called one, and the filespec for a +file in the "new" set (i.e. postimage) is called two. A change that +represents file creation has NULL in one, and file deletion has NULL +in two.

A filepair starts pointing at one and two that are from the same +filename, but diffcore_std() can break pairs and match component +filespecs with other filespecs from a different filepair to form a new +filepair. This is called rename detection.

+
+struct diff_queue +
+

This is a collection of filepairs. Notable members are:

+queue +

+ An array of pointers to struct diff_filepair. This + dynamically grows as you add filepairs; +

+alloc +

+ The allocated size of the queue array; +

+nr +

+ The number of elements in the queue array. +
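
The three members described above follow the usual pointer/allocated/used convention. A minimal sketch, using simplified stand-in structures (the `sketch_` names and the reduced member lists are assumptions; the real structures carry more fields and may use different integer types):

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins; the real structures carry more members. */
struct sketch_filespec {
	const char *path;
	unsigned mode;
};

struct sketch_filepair {
	struct sketch_filespec *one; /* preimage; NULL for a file creation */
	struct sketch_filespec *two; /* postimage; NULL for a file deletion */
};

struct sketch_queue {
	struct sketch_filepair **queue; /* array of pointers to filepairs */
	size_t alloc;                   /* allocated size of the queue array */
	size_t nr;                      /* number of elements in the queue array */
};

/* Append a filepair, growing the array when it is full; 0 on success. */
int sketch_queue_add(struct sketch_queue *q, struct sketch_filepair *pair)
{
	if (q->nr == q->alloc) {
		size_t new_alloc = q->alloc ? q->alloc * 2 : 8;
		struct sketch_filepair **grown =
			realloc(q->queue, new_alloc * sizeof(*grown));

		if (!grown)
			return -1;
		q->queue = grown;
		q->alloc = new_alloc;
	}
	q->queue[q->nr++] = pair;
	return 0;
}

/* Count the queued pairs that represent file creation (no preimage). */
size_t sketch_count_created(const struct sketch_queue *q)
{
	size_t i, count = 0;

	for (i = 0; i < q->nr; i++)
		if (!q->queue[i]->one)
			count++;
	return count;
}
```

The doubling growth policy here is only one possible choice; the point is that `nr` never exceeds `alloc`, and that a creation or deletion is visible from which side of the pair is NULL.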

+
+struct diff_options +
+

This describes the set of options the calling program wants to affect +the operation of diffcore library with.

Notable members are:

+output_format +

+ The output format used when diff_flush() is run. +

+context +

+ Number of context lines to generate in patch output. +

+break_opt, detect_rename, rename_score, rename_limit +

+ Affect the way the detection logic for complete rewrites, renames + and copies works. +

+abbrev +

+ Number of hex digits to abbreviate raw format output to. +

+pickaxe +

+ A constant string (it can, and typically does, contain newlines to + look for a block of text, not just a single line) used to filter out + of the diff_queue the filepairs whose preimage and postimage contain + the same number of occurrences of the string. +

+flags +

+ This is mostly a collection of boolean options that affect the + operation, but some do not have anything to do with the diffcore + library. +

+BINARY, TEXT +: +
+ Affects how a file that is seemingly binary is treated. +
+
+FULL_INDEX +: +
+ Tells the patch output format not to use abbreviated object + names on the "index" lines. +
+
+FIND_COPIES_HARDER +: +
+ Tells the diffcore library that the caller is feeding unchanged + filepairs to allow copies from unmodified files to be detected. +
+
+COLOR_DIFF +: +
+ Output should be colored. +
+
+COLOR_DIFF_WORDS +: +
+ Output is a colored word-diff. +
+
+NO_INDEX +: +
+ Tells diff-files that the input is not tracked files but files + in random locations on the filesystem. +
+
+ALLOW_EXTERNAL +: +
+ Tells output routine that it is Ok to call user specified patch + output routine. Plumbing disables this to ensure stable output. +
+
+QUIET +: +
+ Do not show any output. +
+
+REVERSE_DIFF +: +
+ Tells the library that the calling program is feeding the + filepairs reversed; one is two, and two is one. +
+
+EXIT_WITH_STATUS +: +
+ For communication between the calling program and the options + parser; tells the calling program to signal the presence of + differences using the program exit code. +
+
+HAS_CHANGES +: +
+ Internal; used for optimization to see if there is any change. +
+
+SILENT_ON_REMOVE +: +
+ Affects if diff-files shows removed files. +
+
+RECURSIVE, TREE_IN_RECURSIVE +: +
+ Tells if tree traversal done by tree-diff should recursively + descend into a tree object pair that are different in preimage + and postimage set. +
+
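
Because `flags` is a collection of boolean options, it can be pictured as a bit mask. The sketch below is only an illustration: the `SKETCH_` names and bit positions are invented, and the way `EXIT_WITH_STATUS` and `HAS_CHANGES` are combined into an exit code is an assumption based on the descriptions above, not something the text specifies.

```c
#include <assert.h>

/* Hypothetical bit values; only the flag names come from the text. */
#define SKETCH_OPT_QUIET            (1u << 0)
#define SKETCH_OPT_REVERSE_DIFF     (1u << 1)
#define SKETCH_OPT_EXIT_WITH_STATUS (1u << 2)
#define SKETCH_OPT_HAS_CHANGES      (1u << 3)

/* Stand-in for the options structure; only the flags member is modeled. */
struct sketch_diff_options {
	unsigned flags;
};

/* Turn a flag on. */
void sketch_opt_set(struct sketch_diff_options *opts, unsigned flag)
{
	opts->flags |= flag;
}

/* Turn a flag off. */
void sketch_opt_clr(struct sketch_diff_options *opts, unsigned flag)
{
	opts->flags &= ~flag;
}

/* Return 1 if the flag is set, 0 otherwise. */
int sketch_opt_tst(const struct sketch_diff_options *opts, unsigned flag)
{
	return (opts->flags & flag) != 0;
}

/*
 * Assumed convention: report 1 only when the caller asked for an exit
 * status and a change was recorded; otherwise report 0.
 */
int sketch_exit_code(const struct sketch_diff_options *opts)
{
	return sketch_opt_tst(opts, SKETCH_OPT_EXIT_WITH_STATUS) &&
	       sketch_opt_tst(opts, SKETCH_OPT_HAS_CHANGES);
}
```

Keeping the options in one word makes it cheap to pass them around and to test several of them together, which is one plausible reason for grouping them this way.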

(JC)

+ + + diff --git a/technical/api-diff.txt b/technical/api-diff.txt new file mode 100644 index 000000000..822609bcd --- /dev/null +++ b/technical/api-diff.txt @@ -0,0 +1,166 @@ +diff API +======== + +The diff API is for programs that compare two sets of files (e.g. two +trees, one tree and the index) and present the found difference in +various ways. The calling program is responsible for feeding the API +pairs of files, one from the "old" set and the corresponding one from +"new" set, that are different. The library called through this API is +called diffcore, and is responsible for two things. + +* finding total rewrites (`-B`), renames (`-M`) and copies (`-C`), and + changes that touch a string (`-S`), as specified by the caller. + +* outputting the differences in various formats, as specified by the + caller. + +Calling sequence +---------------- + +* Prepare `struct diff_options` to record the set of diff options, and + then call `diff_setup()` to initialize this structure. This sets up + the vanilla default. + +* Fill in the options structure to specify desired output format, rename + detection, etc. `diff_opt_parse()` can be used to parse options given + from the command line in a way consistent with existing git-diff + family of programs. + +* Call `diff_setup_done()`; this inspects the options set up so far for + internal consistency and make necessary tweaking to it (e.g. if + textual patch output was asked, recursive behaviour is turned on). + +* As you find different pairs of files, call `diff_change()` to feed + modified files, `diff_addremove()` to feed created or deleted files, + or `diff_unmerged()` to feed a file whose state is 'unmerged' to the + API. These are thin wrappers to a lower-level `diff_queue()` function + that is flexible enough to record any of these kinds of changes. + +* Once you finish feeding the pairs of files, call `diffcore_std()`. + This will tell the diffcore library to go ahead and do its work. 
+ +* Calling `diff_flush()` will produce the output. + + +Data structures +--------------- + +* `struct diff_filespec` + +This is the internal representation for a single file (blob). It +records the blob object name (if known -- for a work tree file it +is typically the null SHA-1), filemode and pathname. This is what +`diff_addremove()`, `diff_change()` and `diff_unmerged()` synthesize and +feed to the `diff_queue()` function. + +* `struct diff_filepair` + +This records a pair of `struct diff_filespec`; the filespec for a file +in the "old" set (i.e. preimage) is called `one`, and the filespec for a +file in the "new" set (i.e. postimage) is called `two`. A change that +represents file creation has NULL in `one`, and file deletion has NULL +in `two`. + +A `filepair` starts pointing at `one` and `two` that are from the same +filename, but `diffcore_std()` can break pairs and match component +filespecs with other filespecs from a different filepair to form a new +filepair. This is called 'rename detection'. + +* `struct diff_queue` + +This is a collection of filepairs. Notable members are: + +`queue`:: + + An array of pointers to `struct diff_filepair`. This + dynamically grows as you add filepairs; + +`alloc`:: + + The allocated size of the `queue` array; + +`nr`:: + + The number of elements in the `queue` array. + + +* `struct diff_options` + +This describes the set of options the calling program wants to affect +the operation of diffcore library with. + +Notable members are: + +`output_format`:: + The output format used when `diff_flush()` is run. + +`context`:: + Number of context lines to generate in patch output. + +`break_opt`, `detect_rename`, `rename_score`, `rename_limit`:: + Affect the way the detection logic for complete rewrites, renames + and copies works. + +`abbrev`:: + Number of hex digits to abbreviate raw format output to. 
+ +`pickaxe`:: + A constant string (it can, and typically does, contain newlines to + look for a block of text, not just a single line) used to filter out + of the diff_queue the filepairs whose preimage and postimage contain + the same number of occurrences of the string. + +`flags`:: + This is mostly a collection of boolean options that affect the + operation, but some do not have anything to do with the diffcore + library. + +BINARY, TEXT;; + Affects how a file that is seemingly binary is treated. + +FULL_INDEX;; + Tells the patch output format not to use abbreviated object + names on the "index" lines. + +FIND_COPIES_HARDER;; + Tells the diffcore library that the caller is feeding unchanged + filepairs to allow copies from unmodified files to be detected. + +COLOR_DIFF;; + Output should be colored. + +COLOR_DIFF_WORDS;; + Output is a colored word-diff. + +NO_INDEX;; + Tells diff-files that the input is not tracked files but files + in random locations on the filesystem. + +ALLOW_EXTERNAL;; + Tells output routine that it is Ok to call user specified patch + output routine. Plumbing disables this to ensure stable output. + +QUIET;; + Do not show any output. + +REVERSE_DIFF;; + Tells the library that the calling program is feeding the + filepairs reversed; `one` is two, and `two` is one. + +EXIT_WITH_STATUS;; + For communication between the calling program and the options + parser; tells the calling program to signal the presence of + differences using the program exit code. + +HAS_CHANGES;; + Internal; used for optimization to see if there is any change. + +SILENT_ON_REMOVE;; + Affects if diff-files shows removed files. + +RECURSIVE, TREE_IN_RECURSIVE;; + Tells if tree traversal done by tree-diff should recursively + descend into a tree object pair that are different in preimage + and postimage set. 
+ +(JC) diff --git a/technical/api-directory-listing.html b/technical/api-directory-listing.html new file mode 100644 index 000000000..80de35a95 --- /dev/null +++ b/technical/api-directory-listing.html @@ -0,0 +1,400 @@ + + + + + + +directory listing API + + + +

The directory listing API is used to enumerate paths in the work tree, +optionally taking .git/info/exclude and .gitignore files per +directory into account.

Data structure

The struct dir_struct structure is used to pass directory traversal +options to the library and to record the paths discovered. The notable +options are:

+exclude_per_dir +: +
+ The name of the file to be read in each directory for excluded + files (typically .gitignore). +
+
+collect_ignored +: +
+ Include paths that are to be excluded in the result. +
+
+show_ignored +: +
+ The traversal is for finding just ignored files, not unignored + files. +
+
+show_other_directories +: +
+ Include a directory that is not tracked. +
+
+hide_empty_directories +: +
+ Do not include a directory that is not tracked and is empty. +
+
+no_gitlinks +: +
+ If set, recurse into a directory that looks like a git + directory. Otherwise it is shown as a directory. +
+
+The result of the enumeration is left in these fields: +
+entries[] +: +
+ An array of struct dir_entry, each element of which describes + a path. +
+
+nr +: +
+ The number of members in entries[] array. +
+
+alloc +: +
+ Internal use; keeps track of allocation of entries[] array. +
+
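The nr and alloc fields follow a common growable-array pattern. The sketch below is a toy model of that pattern and not git's code: toy_entry, toy_dir, toy_add_entry() and toy_clear() are hypothetical names, and the doubling policy is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT git's struct dir_entry / dir_struct. */
struct toy_entry {
	char *name;
};

struct toy_dir {
	struct toy_entry **entries; /* the recorded paths */
	int nr;                     /* number of slots in use */
	int alloc;                  /* number of slots allocated */
};

/*
 * Append a copy of name, growing entries[] when it is full.
 * Returns 0 on success and -1 if an allocation fails.
 */
int toy_add_entry(struct toy_dir *dir, const char *name)
{
	struct toy_entry *ent;

	assert(dir && name);
	if (dir->nr == dir->alloc) {
		/* Assumed growth policy: start at 4 slots, then double. */
		int new_alloc = dir->alloc ? dir->alloc * 2 : 4;
		struct toy_entry **grown =
			realloc(dir->entries, new_alloc * sizeof(*grown));
		if (!grown)
			return -1;
		dir->entries = grown;
		dir->alloc = new_alloc;
	}
	ent = malloc(sizeof(*ent));
	if (!ent)
		return -1;
	ent->name = malloc(strlen(name) + 1);
	if (!ent->name) {
		free(ent);
		return -1;
	}
	strcpy(ent->name, name);
	dir->entries[dir->nr++] = ent;
	return 0;
}

/* Release everything and return the structure to its zeroed state. */
void toy_clear(struct toy_dir *dir)
{
	int i;

	assert(dir);
	for (i = 0; i < dir->nr; i++) {
		free(dir->entries[i]->name);
		free(dir->entries[i]);
	}
	free(dir->entries);
	memset(dir, 0, sizeof(*dir));
}
```

A caller would zero a struct toy_dir with memset(), call toy_add_entry() once per path, and then walk entries[0] through entries[nr - 1]; alloc is bookkeeping only and is never read by the caller.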

Calling sequence

+
+Prepare struct dir_struct dir and clear it with memset(&dir, 0, + sizeof(dir)). +
+
+
+Call add_exclude() to add a single exclude pattern, + add_excludes_from_file() to add patterns from a file + (e.g. .git/info/exclude), and/or set dir.exclude_per_dir. A + short-hand function setup_standard_excludes() can be used to set up + the standard set of exclude settings. +
+
+
+Set options described in the Data Structure section above. +
+
+
+Call read_directory(). +
+
+
+Use dir.entries[]. +
+

(JC)

+ + + diff --git a/technical/api-directory-listing.txt b/technical/api-directory-listing.txt new file mode 100644 index 000000000..5bbd18f02 --- /dev/null +++ b/technical/api-directory-listing.txt @@ -0,0 +1,76 @@ +directory listing API +===================== + +The directory listing API is used to enumerate paths in the work tree, +optionally taking `.git/info/exclude` and `.gitignore` files per +directory into account. + +Data structure +-------------- + +`struct dir_struct` structure is used to pass directory traversal +options to the library and to record the paths discovered. The notable +options are: + +`exclude_per_dir`:: + + The name of the file to be read in each directory for excluded + files (typically `.gitignore`). + +`collect_ignored`:: + + Include paths that are to be excluded in the result. + +`show_ignored`:: + + The traversal is for finding just ignored files, not unignored + files. + +`show_other_directories`:: + + Include a directory that is not tracked. + +`hide_empty_directories`:: + + Do not include a directory that is not tracked and is empty. + +`no_gitlinks`:: + + If set, recurse into a directory that looks like a git + directory. Otherwise it is shown as a directory. + +The result of the enumeration is left in these fields:: + +`entries[]`:: + + An array of `struct dir_entry`, each element of which describes + a path. + +`nr`:: + + The number of members in `entries[]` array. + +`alloc`:: + + Internal use; keeps track of allocation of `entries[]` array. + + +Calling sequence +---------------- + +* Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0, + sizeof(dir))`. + +* Call `add_exclude()` to add single exclude pattern, + `add_excludes_from_file()` to add patterns from a file + (e.g. `.git/info/exclude`), and/or set `dir.exclude_per_dir`. A + short-hand function `setup_standard_excludes()` can be used to set up + the standard set of exclude settings. + +* Set options described in the Data Structure section above. 
+ +* Call `read_directory()`. + +* Use `dir.entries[]`. + +(JC) diff --git a/technical/api-gitattributes.html b/technical/api-gitattributes.html new file mode 100644 index 000000000..d453853f9 --- /dev/null +++ b/technical/api-gitattributes.html @@ -0,0 +1,424 @@ + + + + + + +gitattributes API + + + +

The gitattributes mechanism gives a uniform way to associate various +attributes with a set of paths.

Data Structure

+struct git_attr +: +
+ An attribute is an opaque object that is identified by its name. + Pass the name and its length to git_attr() function to obtain + the object of this type. The internal representation of this + structure is of no interest to the calling programs. +
+
+struct git_attr_check +: +
+ This structure represents a set of attributes to check in a call + to git_checkattr() function, and receives the results. +
+

Calling Sequence

+
+Prepare an array of struct git_attr_check to define the list of + attributes you would want to check. To populate this array, you would + need to define necessary attributes by calling git_attr() function. +
+
+
+Call git_checkattr() to check the attributes for the path. +
+
+
+Inspect git_attr_check structure to see how each of the attributes in + the array is defined for the path. +
+

Attribute Values

An attribute for a path can be in one of four states: Set, Unset, +Unspecified or set to a string, and .value member of struct +git_attr_check records it. There are three macros to check these:

+ATTR_TRUE() +: +
+ Returns true if the attribute is Set for the path. +
+
+ATTR_FALSE() +: +
+ Returns true if the attribute is Unset for the path. +
+
+ATTR_UNSET() +: +
+ Returns true if the attribute is Unspecified for the path. +
+

If none of the above returns true, .value member points at a string +value of the attribute for the path.
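One way a single const char * can carry all four states is to reserve two sentinel addresses plus NULL and treat anything else as an ordinary string. The sketch below illustrates that idea under stated assumptions: the toy_ names and TOY_ macros are hypothetical, and nothing here claims to be how the ATTR_TRUE(), ATTR_FALSE() and ATTR_UNSET() macros are actually defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinels: only their addresses matter, never their contents. */
static const char toy_attr_true[] = "(toy)true";
static const char toy_attr_false[] = "(toy)false";

#define TOY_ATTR_TRUE(v)  ((v) == toy_attr_true)
#define TOY_ATTR_FALSE(v) ((v) == toy_attr_false)
#define TOY_ATTR_UNSET(v) ((v) == NULL)

/* Classify a value into one of the four states described above. */
const char *toy_describe(const char *value)
{
	if (TOY_ATTR_TRUE(value))
		return "set";
	if (TOY_ATTR_FALSE(value))
		return "unset";
	if (TOY_ATTR_UNSET(value))
		return "unspecified";
	/* Only now is it safe to treat value as a string. */
	assert(strlen(value) < 1024);
	return "string";
}
```

The order of the checks matters in this sketch: the three sentinel tests are pointer comparisons and must come first, because a string function such as strcmp() must never see NULL.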

Example

To see how attributes "crlf" and "ident" are set for different paths:

+
+Prepare an array of struct git_attr_check with two elements (because + we are checking two attributes). Initialize their attr member with + pointers to struct git_attr obtained by calling git_attr(): +
+

static struct git_attr_check check[2];
+static void setup_check(void)
+{
+        if (check[0].attr)
+                return; /* already done */
+        check[0].attr = git_attr("crlf", 4);
+        check[1].attr = git_attr("ident", 5);
+}

+
+Call git_checkattr() with the prepared array of struct git_attr_check: +
+

        const char *path;
+
+        setup_check();
+        git_checkattr(path, ARRAY_SIZE(check), check);

+
+Act on .value member of the result, left in check[]: +
+

        const char *value = check[0].value;
+
+        if (ATTR_TRUE(value)) {
+                /* The attribute is Set, by listing only the name of the
+                 * attribute in the gitattributes file for the path. */
+        } else if (ATTR_FALSE(value)) {
+                /* The attribute is Unset, by listing the name of the
+                 * attribute prefixed with a dash - for the path. */
+        } else if (ATTR_UNSET(value)) {
+                /* The attribute is neither set nor unset for the path. */
+        } else if (!strcmp(value, "input")) {
+                /* If none of ATTR_TRUE(), ATTR_FALSE(), or ATTR_UNSET() is
+                 * true, the value is a string set in the gitattributes
+                 * file for the path by saying "attr=value". */
+        } else {
+                /* ... other checks using value as a string ... */
+        }

(JC)

+ + + diff --git a/technical/api-gitattributes.txt b/technical/api-gitattributes.txt new file mode 100644 index 000000000..9d97eaa9d --- /dev/null +++ b/technical/api-gitattributes.txt @@ -0,0 +1,111 @@ +gitattributes API +================= + +The gitattributes mechanism gives a uniform way to associate various +attributes with a set of paths. + + +Data Structure +-------------- + +`struct git_attr`:: + + An attribute is an opaque object that is identified by its name. + Pass the name and its length to `git_attr()` function to obtain + the object of this type. The internal representation of this + structure is of no interest to the calling programs. + +`struct git_attr_check`:: + + This structure represents a set of attributes to check in a call + to `git_checkattr()` function, and receives the results. + + +Calling Sequence +---------------- + +* Prepare an array of `struct git_attr_check` to define the list of + attributes you would want to check. To populate this array, you would + need to define necessary attributes by calling `git_attr()` function. + +* Call git_checkattr() to check the attributes for the path. + +* Inspect `git_attr_check` structure to see how each of the attributes in + the array is defined for the path. + + +Attribute Values +---------------- + +An attribute for a path can be in one of four states: Set, Unset, +Unspecified or set to a string, and `.value` member of `struct +git_attr_check` records it. There are three macros to check these: + +`ATTR_TRUE()`:: + + Returns true if the attribute is Set for the path. + +`ATTR_FALSE()`:: + + Returns true if the attribute is Unset for the path. + +`ATTR_UNSET()`:: + + Returns true if the attribute is Unspecified for the path. + +If none of the above returns true, `.value` member points at a string +value of the attribute for the path. + + +Example +------- + +To see how attributes "crlf" and "ident" are set for different paths: + +. 
Prepare an array of `struct git_attr_check` with two elements (because + we are checking two attributes). Initialize their `attr` member with + pointers to `struct git_attr` obtained by calling `git_attr()`: + +------------ +static struct git_attr_check check[2]; +static void setup_check(void) +{ + if (check[0].attr) + return; /* already done */ + check[0].attr = git_attr("crlf", 4); + check[1].attr = git_attr("ident", 5); +} +------------ + +. Call `git_checkattr()` with the prepared array of `struct git_attr_check`: + +------------ + const char *path; + + setup_check(); + git_checkattr(path, ARRAY_SIZE(check), check); +------------ + +. Act on `.value` member of the result, left in `check[]`: + +------------ + const char *value = check[0].value; + + if (ATTR_TRUE(value)) { + The attribute is Set, by listing only the name of the + attribute in the gitattributes file for the path. + } else if (ATTR_FALSE(value)) { + The attribute is Unset, by listing the name of the + attribute prefixed with a dash - for the path. + } else if (ATTR_UNSET(value)) { + The attribute is not set nor unset for the path. + } else if (!strcmp(value, "input")) { + If none of ATTR_TRUE(), ATTR_FALSE(), or ATTR_UNSET() is + true, the value is a string set in the gitattributes + file for the path by saying "attr=value". + } else if (... other check using value as string ...) { + ... + } +------------ + +(JC) diff --git a/technical/api-grep.html b/technical/api-grep.html new file mode 100644 index 000000000..533f095b4 --- /dev/null +++ b/technical/api-grep.html @@ -0,0 +1,283 @@ + + + + + + +grep API + + + +

Talk about <grep.h>, things like:

+
+grep_buffer() +
+

(JC)

+ + + diff --git a/technical/api-grep.txt b/technical/api-grep.txt new file mode 100644 index 000000000..a69cc8964 --- /dev/null +++ b/technical/api-grep.txt @@ -0,0 +1,8 @@ +grep API +======== + +Talk about , things like: + +* grep_buffer() + +(JC) diff --git a/technical/api-hash.html b/technical/api-hash.html new file mode 100644 index 000000000..ec52689ac --- /dev/null +++ b/technical/api-hash.html @@ -0,0 +1,276 @@ + + + + + + +hash API + + + +

Talk about <hash.h>

(Linus)

+ + + diff --git a/technical/api-hash.txt b/technical/api-hash.txt new file mode 100644 index 000000000..c784d3edc --- /dev/null +++ b/technical/api-hash.txt @@ -0,0 +1,6 @@ +hash API +======== + +Talk about + +(Linus) diff --git a/technical/api-in-core-index.html b/technical/api-in-core-index.html new file mode 100644 index 000000000..7770538c6 --- /dev/null +++ b/technical/api-in-core-index.html @@ -0,0 +1,344 @@ + + + + + + +in-core index API + + + +

Talk about <read-cache.c> and <cache-tree.c>, things like:

+
+cache -> the_index macros +
+
+
+read_index() +
+
+
+write_index() +
+
+
+ie_match_stat() and ie_modified(); how they are different and when to + use which. +
+
+
+index_name_pos() +
+
+
+remove_index_entry_at() +
+
+
+remove_file_from_index() +
+
+
+add_file_to_index() +
+
+
+add_index_entry() +
+
+
+refresh_index() +
+
+
+discard_index() +
+
+
+cache_tree_invalidate_path() +
+
+
+cache_tree_update() +
+

(JC, Linus)

+ + + diff --git a/technical/api-in-core-index.txt b/technical/api-in-core-index.txt new file mode 100644 index 000000000..adbdbf5d7 --- /dev/null +++ b/technical/api-in-core-index.txt @@ -0,0 +1,21 @@ +in-core index API +================= + +Talk about and , things like: + +* cache -> the_index macros +* read_index() +* write_index() +* ie_match_stat() and ie_modified(); how they are different and when to + use which. +* index_name_pos() +* remove_index_entry_at() +* remove_file_from_index() +* add_file_to_index() +* add_index_entry() +* refresh_index() +* discard_index() +* cache_tree_invalidate_path() +* cache_tree_update() + +(JC, Linus) diff --git a/technical/api-index-skel.txt b/technical/api-index-skel.txt new file mode 100644 index 000000000..af7cc2e39 --- /dev/null +++ b/technical/api-index-skel.txt @@ -0,0 +1,15 @@ +GIT API Documents +================= + +GIT has grown a set of internal API over time. This collection +documents them. + +//////////////////////////////////////////////////////////////// +// table of contents begin +//////////////////////////////////////////////////////////////// + +//////////////////////////////////////////////////////////////// +// table of contents end +//////////////////////////////////////////////////////////////// + +2007-11-24 diff --git a/technical/api-index.html b/technical/api-index.html new file mode 100644 index 000000000..e9cf63d20 --- /dev/null +++ b/technical/api-index.html @@ -0,0 +1,379 @@ + + + + + + +GIT API Documents + + + +

GIT has grown a set of internal API over time. This collection +documents them.

+
+allocation growing API +
+
+
+builtin API +
+
+
+decorate API +
+
+
+diff API +
+
+
+directory listing API +
+
+
+gitattributes API +
+
+
+grep API +
+
+
+hash API +
+
+
+in-core index API +
+
+
+lockfile API +
+
+
+object access API +
+
+
+parse-options API +
+
+
+path-list API +
+
+
+quote API +
+
+
+revision walking API +
+
+
+run-command API +
+
+
+setup API +
+
+
+strbuf API +
+
+
+tree walking API +
+
+
+xdiff interface API +
+

2007-11-24

+ + + diff --git a/technical/api-index.txt b/technical/api-index.txt new file mode 100644 index 000000000..bc9c190a9 --- /dev/null +++ b/technical/api-index.txt @@ -0,0 +1,34 @@ +GIT API Documents +================= + +GIT has grown a set of internal API over time. This collection +documents them. + +//////////////////////////////////////////////////////////////// +// table of contents begin +//////////////////////////////////////////////////////////////// +* link:api-allocation-growing.html[allocation growing API] +* link:api-builtin.html[builtin API] +* link:api-decorate.html[decorate API] +* link:api-diff.html[diff API] +* link:api-directory-listing.html[directory listing API] +* link:api-gitattributes.html[gitattributes API] +* link:api-grep.html[grep API] +* link:api-hash.html[hash API] +* link:api-in-core-index.html[in-core index API] +* link:api-lockfile.html[lockfile API] +* link:api-object-access.html[object access API] +* link:api-parse-options.html[parse-options API] +* link:api-path-list.html[path-list API] +* link:api-quote.html[quote API] +* link:api-revision-walking.html[revision walking API] +* link:api-run-command.html[run-command API] +* link:api-setup.html[setup API] +* link:api-strbuf.html[strbuf API] +* link:api-tree-walking.html[tree walking API] +* link:api-xdiff-interface.html[xdiff interface API] +//////////////////////////////////////////////////////////////// +// table of contents end +//////////////////////////////////////////////////////////////// + +2007-11-24 diff --git a/technical/api-lockfile.html b/technical/api-lockfile.html new file mode 100644 index 000000000..2a0bc1a15 --- /dev/null +++ b/technical/api-lockfile.html @@ -0,0 +1,299 @@ + + + + + + +lockfile API + + + +

Talk about <lockfile.c>, things like:

+
+lockfile lifetime — atexit(3) looks at them, do not put them on the + stack; +
+
+
+hold_lock_file_for_update() +
+
+
+commit_lock_file() +
+
+
+rollback_lock_file() +
+

(JC, Dscho, Shawn)

+ + + diff --git a/technical/api-lockfile.txt b/technical/api-lockfile.txt new file mode 100644 index 000000000..73ac1025f --- /dev/null +++ b/technical/api-lockfile.txt @@ -0,0 +1,12 @@ +lockfile API +============ + +Talk about <lockfile.c>, things like: + +* lockfile lifetime -- atexit(3) looks at them, do not put them on the + stack; +* hold_lock_file_for_update() +* commit_lock_file() +* rollback_lock_file() + +(JC, Dscho, Shawn) diff --git a/technical/api-object-access.html b/technical/api-object-access.html new file mode 100644 index 000000000..207ae6fd7 --- /dev/null +++ b/technical/api-object-access.html @@ -0,0 +1,318 @@ + + + + + + +object access API + + + +

Talk about <sha1_file.c> and <object.h> family, things like

+
+read_sha1_file() +
+
+
+read_object_with_reference() +
+
+
+has_sha1_file() +
+
+
+write_sha1_file() +
+
+
+pretend_sha1_file() +
+
+
+lookup_{object,commit,tag,blob,tree} +
+
+
+parse_{object,commit,tag,blob,tree} +
+
+
+Use of object flags +
+

(JC, Shawn, Daniel, Dscho, Linus)

+ + + diff --git a/technical/api-object-access.txt b/technical/api-object-access.txt new file mode 100644 index 000000000..03bb0e950 --- /dev/null +++ b/technical/api-object-access.txt @@ -0,0 +1,15 @@ +object access API +================= + +Talk about and family, things like + +* read_sha1_file() +* read_object_with_reference() +* has_sha1_file() +* write_sha1_file() +* pretend_sha1_file() +* lookup_{object,commit,tag,blob,tree} +* parse_{object,commit,tag,blob,tree} +* Use of object flags + +(JC, Shawn, Daniel, Dscho, Linus) diff --git a/technical/api-parse-options.html b/technical/api-parse-options.html new file mode 100644 index 000000000..70ca90aaf --- /dev/null +++ b/technical/api-parse-options.html @@ -0,0 +1,276 @@ + + + + + + +parse-options API + + + +

Talk about <parse-options.h>

(Pierre)

+ + + diff --git a/technical/api-parse-options.txt b/technical/api-parse-options.txt new file mode 100644 index 000000000..b7cda94f5 --- /dev/null +++ b/technical/api-parse-options.txt @@ -0,0 +1,6 @@ +parse-options API +================= + +Talk about + +(Pierre) diff --git a/technical/api-path-list.html b/technical/api-path-list.html new file mode 100644 index 000000000..9c865e4f2 --- /dev/null +++ b/technical/api-path-list.html @@ -0,0 +1,288 @@ + + + + + + +path-list API + + + +

Talk about <path-list.h>, things like

+
+it is not just paths but strings in general; +
+
+
+the calling sequence. +
+

(Dscho)

+ + + diff --git a/technical/api-path-list.txt b/technical/api-path-list.txt new file mode 100644 index 000000000..d07768317 --- /dev/null +++ b/technical/api-path-list.txt @@ -0,0 +1,9 @@ +path-list API +============= + +Talk about , things like + +* it is not just paths but strings in general; +* the calling sequence. + +(Dscho) diff --git a/technical/api-quote.html b/technical/api-quote.html new file mode 100644 index 000000000..104ff5a17 --- /dev/null +++ b/technical/api-quote.html @@ -0,0 +1,293 @@ + + + + + + +quote API + + + +

Talk about <quote.h>, things like

+
+sq_quote and unquote +
+
+
+c_style quote and unquote +
+
+
+quoting for foreign languages +
+

(JC)

+ + + diff --git a/technical/api-quote.txt b/technical/api-quote.txt new file mode 100644 index 000000000..e8a1bce94 --- /dev/null +++ b/technical/api-quote.txt @@ -0,0 +1,10 @@ +quote API +========= + +Talk about , things like + +* sq_quote and unquote +* c_style quote and unquote +* quoting for foreign languages + +(JC) diff --git a/technical/api-revision-walking.html b/technical/api-revision-walking.html new file mode 100644 index 000000000..db126508e --- /dev/null +++ b/technical/api-revision-walking.html @@ -0,0 +1,288 @@ + + + + + + +revision walking API + + + +

Talk about <revision.h>, things like:

+
+two diff_options, one for path limiting, another for output; +
+
+
+calling sequence: init_revisions(), setup_revisions(), get_revision(); +
+

(Linus, JC, Dscho)

+ + + diff --git a/technical/api-revision-walking.txt b/technical/api-revision-walking.txt new file mode 100644 index 000000000..01a24551a --- /dev/null +++ b/technical/api-revision-walking.txt @@ -0,0 +1,9 @@ +revision walking API +==================== + +Talk about <revision.h>, things like: + +* two diff_options, one for path limiting, another for output; +* calling sequence: init_revisions(), setup_revisions(), get_revision(); + +(Linus, JC, Dscho) diff --git a/technical/api-run-command.html b/technical/api-run-command.html new file mode 100644 index 000000000..00b10fb4b --- /dev/null +++ b/technical/api-run-command.html @@ -0,0 +1,293 @@ + + + + + + +run-command API + + + +

Talk about <run-command.h>, and things like:

+
+Environment the command runs with (e.g. GIT_DIR); +
+
+
+File descriptors and pipes; +
+
+
+Exit status; +
+

(Hannes, Dscho, Shawn)

+ + + diff --git a/technical/api-run-command.txt b/technical/api-run-command.txt new file mode 100644 index 000000000..19d2f64f7 --- /dev/null +++ b/technical/api-run-command.txt @@ -0,0 +1,10 @@ +run-command API +=============== + +Talk about , and things like: + +* Environment the command runs with (e.g. GIT_DIR); +* File descriptors and pipes; +* Exit status; + +(Hannes, Dscho, Shawn) diff --git a/technical/api-setup.html b/technical/api-setup.html new file mode 100644 index 000000000..38b50838f --- /dev/null +++ b/technical/api-setup.html @@ -0,0 +1,308 @@ + + + + + + +setup API + + + +

Talk about

+
+setup_git_directory() +
+
+
+setup_git_directory_gently() +
+
+
+is_inside_git_dir() +
+
+
+is_inside_work_tree() +
+
+
+setup_work_tree() +
+
+
+get_pathspec() +
+

(Dscho)

+ + + diff --git a/technical/api-setup.txt b/technical/api-setup.txt new file mode 100644 index 000000000..4f63a04d7 --- /dev/null +++ b/technical/api-setup.txt @@ -0,0 +1,13 @@ +setup API +========= + +Talk about + +* setup_git_directory() +* setup_git_directory_gently() +* is_inside_git_dir() +* is_inside_work_tree() +* setup_work_tree() +* get_pathspec() + +(Dscho) diff --git a/technical/api-strbuf.html b/technical/api-strbuf.html new file mode 100644 index 000000000..e75fba926 --- /dev/null +++ b/technical/api-strbuf.html @@ -0,0 +1,276 @@ + + + + + + +strbuf API + + + +

Talk about <strbuf.h>

(Pierre, JC)

+ + + diff --git a/technical/api-strbuf.txt b/technical/api-strbuf.txt new file mode 100644 index 000000000..a52e4f36d --- /dev/null +++ b/technical/api-strbuf.txt @@ -0,0 +1,6 @@ +strbuf API +========== + +Talk about + +(Pierre, JC) diff --git a/technical/api-tree-walking.html b/technical/api-tree-walking.html new file mode 100644 index 000000000..e541585bc --- /dev/null +++ b/technical/api-tree-walking.html @@ -0,0 +1,303 @@ + + + + + + +tree walking API + + + +

Talk about <tree-walk.h>, things like

+
+struct tree_desc +
+
+
+init_tree_desc +
+
+
+tree_entry_extract +
+
+
+update_tree_entry +
+
+
+get_tree_entry +
+

(JC, Linus)

+ + + diff --git a/technical/api-tree-walking.txt b/technical/api-tree-walking.txt new file mode 100644 index 000000000..e3ddf9128 --- /dev/null +++ b/technical/api-tree-walking.txt @@ -0,0 +1,12 @@ +tree walking API +================ + +Talk about , things like + +* struct tree_desc +* init_tree_desc +* tree_entry_extract +* update_tree_entry +* get_tree_entry + +(JC, Linus) diff --git a/technical/api-xdiff-interface.html b/technical/api-xdiff-interface.html new file mode 100644 index 000000000..63aced77e --- /dev/null +++ b/technical/api-xdiff-interface.html @@ -0,0 +1,277 @@ + + + + + + +xdiff interface API + + + +

Talk about our calling convention to xdiff library, including +xdiff_emit_consume_fn.

(Dscho, JC)

+ + + diff --git a/technical/api-xdiff-interface.txt b/technical/api-xdiff-interface.txt new file mode 100644 index 000000000..6296ecad1 --- /dev/null +++ b/technical/api-xdiff-interface.txt @@ -0,0 +1,7 @@ +xdiff interface API +=================== + +Talk about our calling convention to xdiff library, including +xdiff_emit_consume_fn. + +(Dscho, JC) diff --git a/technical/pack-format.txt b/technical/pack-format.txt new file mode 100644 index 000000000..a80baa438 --- /dev/null +++ b/technical/pack-format.txt @@ -0,0 +1,146 @@ +GIT pack format +=============== + += pack-*.pack files have the following format: + + - A header appears at the beginning and consists of the following: + + 4-byte signature: + The signature is: {'P', 'A', 'C', 'K'} + + 4-byte version number (network byte order): + GIT currently accepts version number 2 or 3 but + generates version 2 only. + + 4-byte number of objects contained in the pack (network byte order) + + Observation: we cannot have more than 4G versions ;-) and + more than 4G objects in a pack. + + - The header is followed by number of object entries, each of + which looks like this: + + (undeltified representation) + n-byte type and length (3-bit type, (n-1)*7+4-bit length) + compressed data + + (deltified representation) + n-byte type and length (3-bit type, (n-1)*7+4-bit length) + 20-byte base object name + compressed delta data + + Observation: length of each object is encoded in a variable + length format and is not constrained to 32-bit or anything. + + - The trailer records 20-byte SHA1 checksum of all of the above. + += Original (version 1) pack-*.idx files have the following format: + + - The header consists of 256 4-byte network byte order + integers. N-th entry of this table records the number of + objects in the corresponding pack, the first byte of whose + object name is less than or equal to N. This is called the + 'first-level fan-out' table. 
+ + - The header is followed by sorted 24-byte entries, one entry + per object in the pack. Each entry is: + + 4-byte network byte order integer, recording where the + object is stored in the packfile as the offset from the + beginning. + + 20-byte object name. + + - The file is concluded with a trailer: + + A copy of the 20-byte SHA1 checksum at the end of + corresponding packfile. + + 20-byte SHA1-checksum of all of the above. + +Pack Idx file: + + -- +--------------------------------+ +fanout | fanout[0] = 2 (for example) |-. +table +--------------------------------+ | + | fanout[1] | | + +--------------------------------+ | + | fanout[2] | | + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | + | fanout[255] = total objects |---. + -- +--------------------------------+ | | +main | offset | | | +index | object name 00XXXXXXXXXXXXXXXX | | | +table +--------------------------------+ | | + | offset | | | + | object name 00XXXXXXXXXXXXXXXX | | | + +--------------------------------+<+ | + .-| offset | | + | | object name 01XXXXXXXXXXXXXXXX | | + | +--------------------------------+ | + | | offset | | + | | object name 01XXXXXXXXXXXXXXXX | | + | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | + | | offset | | + | | object name FFXXXXXXXXXXXXXXXX | | + --| +--------------------------------+<--+ +trailer | | packfile checksum | + | +--------------------------------+ + | | idxfile checksum | + | +--------------------------------+ + .-------. + | +Pack file entry: <+ + + packed object header: + 1-byte size extension bit (MSB) + type (next 3 bit) + size0 (lower 4-bit) + n-byte sizeN (as long as MSB is set, each 7-bit) + size0..sizeN form 4+7+7+..+7 bit integer, size0 + is the least significant part, and sizeN is the + most significant part. + packed object data: + If it is not DELTA, then deflated bytes (the size above + is the size before compression). + If it is DELTA, then + 20-byte base object name SHA1 (the size above is the + size of the delta data that follows). + delta data, deflated. 
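The fan-out table and the packed object header described above can both be read with a few lines of C. What follows is a minimal sketch under stated assumptions, not code taken from git: toy_be32(), toy_fanout_range() and toy_decode_entry_header() are hypothetical names, the fan-out table is assumed to be available as 1024 raw bytes, and sizes that need more than 60 bits are rejected to keep the arithmetic simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 4-byte integer stored in network (big-endian) byte order. */
static uint32_t toy_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * fanout points at the 256 * 4 raw bytes of a version 1 fan-out table.
 * Objects whose name starts with first_byte occupy the sorted index
 * entries numbered *lo up to, but not including, *hi.
 */
void toy_fanout_range(const unsigned char *fanout, unsigned char first_byte,
		      uint32_t *lo, uint32_t *hi)
{
	assert(fanout && lo && hi);
	*lo = first_byte ? toy_be32(fanout + 4 * (size_t)(first_byte - 1)) : 0;
	*hi = toy_be32(fanout + 4 * (size_t)first_byte);
}

/*
 * Decode the packed object header at buf (len bytes available).
 * Returns the number of header bytes consumed, or 0 if the input is
 * truncated or the size needs more than 60 bits.
 */
size_t toy_decode_entry_header(const unsigned char *buf, size_t len,
			       unsigned *type, uint64_t *size)
{
	size_t used = 0;
	unsigned shift = 4;
	unsigned char c;

	assert(buf && type && size);
	if (len == 0)
		return 0;
	c = buf[used++];
	*type = (c >> 4) & 7; /* the 3 bits after the extension bit */
	*size = c & 15;       /* size0: the least significant 4 bits */
	while (c & 0x80) {    /* MSB set: another 7-bit group follows */
		if (used >= len || shift > 57)
			return 0;
		c = buf[used++];
		*size |= (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}
	return used;
}
```

For instance, the two header bytes 0x95 0x0a carry type 1 and size 5 + (10 << 4) = 165. The range returned by toy_fanout_range() is what a binary search over the sorted index entries would be limited to.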
+ + += Version 2 pack-*.idx files support packs larger than 4 GiB, and + have some other reorganizations. They have the format: + + - A 4-byte magic number '\377tOc' which is an unreasonable + fanout[0] value. + + - A 4-byte version number (= 2) + + - A 256-entry fan-out table just like v1. + + - A table of sorted 20-byte SHA1 object names. These are + packed together without offset values to reduce the cache + footprint of the binary search for a specific object name. + + - A table of 4-byte CRC32 values of the packed object data. + This is new in v2 so compressed data can be copied directly + from pack to pack during repacking without undetected + data corruption. + + - A table of 4-byte offset values (in network byte order). + These are usually 31-bit pack file offsets, but large + offsets are encoded as an index into the next table with + the msbit set. + + - A table of 8-byte offset entries (empty for pack files less + than 2 GiB). Pack files are organized with heavily used + objects toward the front, so most object references should + not need to refer to this table. + + - The same trailer as a v1 pack file: + + A copy of the 20-byte SHA1 checksum at the end of + corresponding packfile. + + 20-byte SHA1-checksum of all of the above. diff --git a/technical/pack-heuristics.txt b/technical/pack-heuristics.txt new file mode 100644 index 000000000..103eb5d98 --- /dev/null +++ b/technical/pack-heuristics.txt @@ -0,0 +1,466 @@ + Concerning Git's Packing Heuristics + =================================== + + Oh, here's a really stupid question: + + Where do I go + to learn the details + of git's packing heuristics? + +Be careful what you ask! + +Followers of the git, please open the git IRC Log and turn to +February 10, 2006. + +It's a rare occasion, and we are joined by the King Git Himself, +Linus Torvalds (linus). Nathaniel Smith, (njs`), has the floor +and seeks enlightenment. Others are present, but silent. + +Let's listen in! 
+ + Oh, here's a really stupid question -- where do I go to + learn the details of git's packing heuristics? google avails + me not, reading the source didn't help a lot, and wading + through the whole mailing list seems less efficient than any + of that. + +It is a bold start! A plea for help combined with a simultaneous +tri-part attack on some of the tried and true mainstays in the quest +for enlightenment. Brash accusations of google being useless. Hubris! +Maligning the source. Heresy! Disdain for the mailing list archives. +Woe. + + yes, the packing-related delta stuff is somewhat + mysterious even for me ;) + +Ah! Modesty after all. + + njs, I don't think the docs exist. That's something where + I don't think anybody else than me even really got involved. + Most of the rest of git others have been busy with (especially + Junio), but packing nobody touched after I did it. + +It's cryptic, yet vague. Linus in style for sure. Wise men +interpret this as an apology. A few argue it is merely a +statement of fact. + + I guess the next step is "read the source again", but I + have to build up a certain level of gumption first :-) + +Indeed! On both points. + + The packing heuristic is actually really really simple. + +Bait... + + But strange. + +And switch. That ought to do it! + + Remember: git really doesn't follow files. So what it does is + - generate a list of all objects + - sort the list according to magic heuristics + - walk the list, using a sliding window, seeing if an object + can be diffed against another object in the window + - write out the list in recency order + +The traditional understatement: + + I suspect that what I'm missing is the precise definition of + the word "magic" + +The traditional insight: + + yes + +And Babel-like confusion flowed. + + oh, hmm, and I'm not sure what this sliding window means either + + iirc, it appeared to me to be just the sha1 of the object + when reading the code casually ... + + ... 
which simply doesn't sound as a very good heuristics, though ;) + + .....and recency order. okay, I think it's clear I didn't + even realize how much I wasn't realizing :-) + +Ah, grasshopper! And thus the enlightenment begins anew. + + The "magic" is actually in theory totally arbitrary. + ANY order will give you a working pack, but no, it's not + ordered by SHA1. + + Before talking about the ordering for the sliding delta + window, let's talk about the recency order. That's more + important in one way. + + Right, but if all you want is a working way to pack things + together, you could just use cat and save yourself some + trouble... + +Waaait for it.... + + The recency ordering (which is basically: put objects + _physically_ into the pack in the order that they are + "reachable" from the head) is important. + + okay + + It's important because that's the thing that gives packs + good locality. It keeps the objects close to the head (whether + they are old or new, but they are _reachable_ from the head) + at the head of the pack. So packs actually have absolutely + _wonderful_ IO patterns. + +Read that again, because it is important. + + But recency ordering is totally useless for deciding how + to actually generate the deltas, so the delta ordering is + something else. + + The delta ordering is (wait for it): + - first sort by the "basename" of the object, as defined by + the name the object was _first_ reached through when + generating the object list + - within the same basename, sort by size of the object + - but always sort different types separately (commits first). + + That's not exactly it, but it's very close. + + The "_first_ reached" thing is not too important, just you + need some way to break ties since the same objects may be + reachable many ways, yes? 
+ +And as if to clarify: + + The point is that it's all really just any random + heuristic, and the ordering is totally unimportant for + correctness, but it helps a lot if the heuristic gives + "clumping" for things that are likely to delta well against + each other. + +It is an important point, so secretly, I did my own research and have +included my results below. To be fair, it has changed some over time. +And through the magic of Revisionistic History, I draw upon this entry +from The Git IRC Logs on my father's birthday, March 1: + + The quote from the above linus should be rewritten a + bit (wait for it): + - first sort by type. Different objects never delta with + each other. + - then sort by filename/dirname. hash of the basename + occupies the top BITS_PER_INT-DIR_BITS bits, and bottom + DIR_BITS are for the hash of leading path elements. + - then if we are doing "thin" pack, the objects we are _not_ + going to pack but we know about are sorted earlier than + other objects. + - and finally sort by size, larger to smaller. + +In one swell-foop, clarification and obscurification! Nonetheless, +authoritative. Cryptic, yet concise. It even solicits notions of +quotes from The Source Code. Clearly, more study is needed. + + That's the sort order. What this means is: + - we do not delta different object types. + - we prefer to delta the objects with the same full path, but + allow files with the same name from different directories. + - we always prefer to delta against objects we are not going + to send, if there are some. + - we prefer to delta against larger objects, so that we have + lots of removals. + + The penultimate rule is for "thin" packs. It is used when + the other side is known to have such objects. + +There it is again. "Thin" packs. I'm thinking to myself, "What +is a 'thin' pack?" So I ask: + + What is a "thin" pack? + + Use of --objects-edge to rev-list as the upstream of + pack-objects. The pack transfer protocol negotiates that. + +Woo hoo! 
Cleared that _right_ up! + + There are two directions - push and fetch. + +There! Did you see it? It is not '"push" and "pull"'! How often the +confusion has started here. So casually mentioned, too! + + For push, git-send-pack invokes git-receive-pack on the + other end. The receive-pack says "I have up to these commits". + send-pack looks at them, and computes what are missing from + the other end. So "thin" could be the default there. + + In the other direction, fetch, git-fetch-pack and + git-clone-pack invokes git-upload-pack on the other end + (via ssh or by talking to the daemon). + + There are two cases: fetch-pack with -k and clone-pack is one, + fetch-pack without -k is the other. clone-pack and fetch-pack + with -k will keep the downloaded packfile without expanded, so + we do not use thin pack transfer. Otherwise, the generated + pack will have delta without base object in the same pack. + + But fetch-pack without -k will explode the received pack into + individual objects, so we automatically ask upload-pack to + give us a thin pack if upload-pack supports it. + +OK then. + +Uh. + +Let's return to the previous conversation still in progress. + + and "basename" means something like "the tail of end of + path of file objects and dir objects, as per basename(3), and + we just declare all commit and tag objects to have the same + basename" or something? + +Luckily, that too is a point that gitster clarified for us! + +If I might add, the trick is to make files that _might_ be similar be +located close to each other in the hash buckets based on their file +names. It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and +"Makefile" all landed in the same bucket due to their common basename, +"Makefile". However, now they land in "close" buckets. + +The algorithm allows not just for the _same_ bucket, but for _close_ +buckets to be considered delta candidates. 
The rationale is +essentially that files, like Makefiles, often have very similar +content no matter what directory they live in. + + I played around with different delta algorithms, and with + making the "delta window" bigger, but having too big of a + sliding window makes it very expensive to generate the pack: + you need to compare every object with a _ton_ of other objects. + + There are a number of other trivial heuristics too, which + basically boil down to "don't bother even trying to delta this + pair" if we can tell before-hand that the delta isn't worth it + (due to size differences, where we can take a previous delta + result into account to decide that "ok, no point in trying + that one, it will be worse"). + + End result: packing is actually very size efficient. It's + somewhat CPU-wasteful, but on the other hand, since you're + really only supposed to do it maybe once a month (and you can + do it during the night), nobody really seems to care. + +Nice Engineering Touch, there. Find when it doesn't matter, and +proclaim it a non-issue. Good style too! + + So, just to repeat to see if I'm following, we start by + getting a list of the objects we want to pack, we sort it by + this heuristic (basically lexicographically on the tuple + (type, basename, size)). + + Then we walk through this list, and calculate a delta of + each object against the last n (tunable parameter) objects, + and pick the smallest of these deltas. + +Vastly simplified, but the essence is there! + + Correct. + + And then once we have picked a delta or fulltext to + represent each object, we re-sort by recency, and write them + out in that order. + + Yup. Some other small details: + +And of course there is the "Other Shoe" Factor too. 
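Before the small details, the loop njs has just summarized (sort, slide a window, keep the smallest delta) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `Obj`, `delta_cost` and `choose_bases` are hypothetical names, the cost function is a crude stand-in built on `difflib` rather than git's real delta code, and the sort key is the simplified (type, basename, size) tuple from the conversation, with larger objects first as gitster describes. The `max_depth` cap is the delta depth limit Linus mentions next.

```python
import difflib
import posixpath
from dataclasses import dataclass


@dataclass
class Obj:
    name: str    # hypothetical short object id
    type: str    # "blob", "tree", "commit", ...
    path: str    # name the object was first reached through
    data: bytes


def delta_cost(base: bytes, target: bytes) -> int:
    """Rough size of a copy/insert delta that rebuilds target from base.

    Assumption: a copy instruction costs a small fixed amount (4 here),
    while added bytes must be spelled out in full.
    """
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    cost = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            cost += 4
        elif tag in ("replace", "insert"):
            cost += 1 + (j2 - j1)
        # "delete": bytes dropped from the base cost nothing
    return cost


def choose_bases(objects, window=10, max_depth=10):
    """Return {object name: base name, or None if stored as full text}."""
    order = sorted(
        objects,
        key=lambda o: (o.type, posixpath.basename(o.path), -len(o.data)),
    )
    base_of, depth = {}, {}
    for i, obj in enumerate(order):
        # Storing the full text is the fallback a delta has to beat.
        best, best_cost = None, len(obj.data)
        for cand in order[max(0, i - window):i]:
            # Different types never delta; over-deep chains are skipped.
            if cand.type != obj.type or depth[cand.name] >= max_depth:
                continue
            cost = delta_cost(cand.data, obj.data)
            if cost < best_cost:
                best, best_cost = cand, cost
        base_of[obj.name] = best.name if best else None
        depth[obj.name] = depth[best.name] + 1 if best else 0
    return base_of
```

Fed three growing versions of one Makefile, this sketch keeps the largest version whole and turns the smaller ones into deltas against a larger one, which is the "backwards delta" effect discussed further on. The recency re-sort before writing is left out.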
+ + - We limit the delta depth to another magic value (right + now both the window and delta depth magic values are just "10") + + Hrm, my intuition is that you'd end up with really _bad_ IO + patterns, because the things you want are near by, but to + actually reconstruct them you may have to jump all over in + random ways. + + - When we write out a delta, and we haven't yet written + out the object it is a delta against, we write out the base + object first. And no, when we reconstruct them, we actually + get nice IO patterns, because: + - larger objects tend to be "more recent" (Linus' law: files grow) + - we actively try to generate deltas from a larger object to a + smaller one + - this means that the top-of-tree very seldom has deltas + (i.e. deltas in _practice_ are "backwards deltas") + +Again, we should reread that whole paragraph. Not just because +Linus has slipped Linus's Law in there on us, but because it is +important. Let's make sure we clarify some of the points here: + + So the point is just that in practice, delta order and + recency order match each other quite well. + + Yes. There's another nice side to this (and yes, it was + designed that way ;): + - the reason we generate deltas against the larger object is + actually a big space saver too! + + Hmm, but your last comment (if "we haven't yet written out + the object it is a delta against, we write out the base object + first"), seems like it would make these facts mostly + irrelevant because even if in practice you would not have to + wander around much, in fact you just brute-force say that in + the cases where you might have to wander, don't do that :-) + + Yes and no. Notice the rule: we only write out the base + object first if the delta against it was more recent. That + means that you can actually have deltas that refer to a base + object that is _not_ close to the delta object, but that only + happens when the delta is needed to generate an _old_ object. + + See? + +Yeah, no. 
I missed that on the first two or three readings myself. + + This keeps the front of the pack dense. The front of the + pack never contains data that isn't relevant to a "recent" + object. The size optimization comes from our use of xdelta + (but is true for many other delta algorithms): removing data + is cheaper (in size) than adding data. + + When you remove data, you only need to say "copy bytes n--m". + In contrast, in a delta that _adds_ data, you have to say "add + these bytes: 'actual data goes here'" + + *** njs` has quit: Read error: 104 (Connection reset by peer) + + Uhhuh. I hope I didn't blow njs` mind. + + *** njs` has joined channel #git + + :) + +The silent observers are amused. Of course. + +And as if njs` was expected to be omniscient: + + njs - did you miss anything? + +OK, I'll spell it out. That's Geek Humor. If njs` was not actually +connected for a little bit there, how would he know if he missed anything +while he was disconnected? He's a benevolent dictator with a sense of +humor! Well noted! + + Stupid router. Or gremlins, or whatever. + +It's a cheap shot at Cisco. Take 'em when you can. + + Yes and no. Notice the rule: we only write out the base + object first if the delta against it was more recent. + + I'm getting lost in all these orders, let me re-read :-) + So the write-out order is from most recent to least recent? + (Conceivably it could be the opposite way too, I'm not sure if + we've said) though my connection back at home is logging, so I + can just read what you said there :-) + +And for those of you paying attention, the Omniscient Trick has just +been detailed! + + Yes, we always write out most recent first + +For the other record: + + njs`: http://pastebin.com/547965 + +The 'net never forgets, so that should be good until the end of time. + + And, yeah, I got the part about deeper-in-history stuff + having worse IO characteristics, one sort of doesn't care. 
+ + With the caveat that if the "most recent" needs an older + object to delta against (hey, shrinking sometimes does + happen), we write out the old object with the delta. + + (if only it happened more...) + + Anyway, the pack-file could easily be denser still, but + because it's used both for streaming (the git protocol) and + for on-disk, it has a few pessimizations. + +Actually, it is a made-up word. But it is a made-up word being +used as setup for a later optimization, which is a real word: + + In particular, while the pack-file is then compressed, + it's compressed just one object at a time, so the actual + compression factor is less than it could be in theory. But it + means that it's all nice random-access with a simple index to + do "object name->location in packfile" translation. + + I'm assuming the real win for delta-ing large->small is + more homogeneous statistics for gzip to run over? + + (You have to put the bytes in one place or another, but + putting them in a larger blob wins on compression) + + Actually, what is the compression strategy -- each delta + individually gzipped, the whole file gzipped, somewhere in + between, no compression at all, ....? + + Right. + +Reality IRC sets in. For example: + + I'll read the rest in the morning, I really have to go + sleep or there's no hope whatsoever for me at the today's + exam... g'nite all. + +Heh. + + pasky: g'nite + + pasky: 'luck + + Right: large->small matters exactly because of compression + behaviour. If it was non-compressed, it probably wouldn't make + any difference. + + yeah + + Anyway: I'm not even trying to claim that the pack-files + are perfect, but they do tend to have a nice balance of + density vs ease-of use. + +Gasp! OK, saved. That's a fair Engineering trade off. Close call! +In fact, Linus reflects on some Basic Engineering Fundamentals, +design options, etc. 
+ + More importantly, they allow git to still _conceptually_ + never deal with deltas at all, and be a "whole object" store. + + Which has some problems (we discussed bad huge-file + behaviour on the git lists the other day), but it does mean + that the basic git concepts are really really simple and + straightforward. + + It's all been quite stable. + + Which I think is very much a result of having very simple + basic ideas, so that there's never any confusion about what's + going on. + + Bugs happen, but they are "simple" bugs. And bugs that + actually get some object store detail wrong are almost always + so obvious that they never go anywhere. + + Yeah. + +Nuff said. + + Anyway. I'm off for bed. It's not 6AM here, but I've got + three kids, and have to get up early in the morning to send + them off. I need my beauty sleep. + + :-) + + appreciate the infodump, I really was failing to find the + details on git packs :-) + +And now you know the rest of the story. diff --git a/technical/pack-protocol.txt b/technical/pack-protocol.txt new file mode 100644 index 000000000..9cd48b485 --- /dev/null +++ b/technical/pack-protocol.txt @@ -0,0 +1,41 @@ +Pack transfer protocols +======================= + +There are two Pack push-pull protocols. + +upload-pack (S) | fetch/clone-pack (C) protocol: + + # Tell the puller what commits we have and what their names are + S: SHA1 name + S: ... + S: SHA1 name + S: # flush -- it's your turn + # Tell the pusher what commits we want, and what we have + C: want name + C: .. + C: want name + C: have SHA1 + C: have SHA1 + C: ... + C: # flush -- occasionally ask "had enough?" + S: NAK + C: have SHA1 + C: ... + C: have SHA1 + S: ACK + C: done + S: XXXXXXX -- packfile contents. + +send-pack | receive-pack protocol. + + # Tell the pusher what commits we have and what their names are + C: SHA1 name + C: ... 
+ C: SHA1 name + C: # flush -- it's your turn + # Tell the puller what the pusher has + S: old-SHA1 new-SHA1 name + S: old-SHA1 new-SHA1 name + S: ... + S: # flush -- done with the list + S: XXXXXXX --- packfile contents. diff --git a/technical/racy-git.txt b/technical/racy-git.txt new file mode 100644 index 000000000..5030d9f2f --- /dev/null +++ b/technical/racy-git.txt @@ -0,0 +1,195 @@ +Use of index and Racy git problem +================================= + +Background +---------- + +The index is one of the most important data structures in git. +It represents a virtual working tree state by recording list of +paths and their object names and serves as a staging area to +write out the next tree object to be committed. The state is +"virtual" in the sense that it does not necessarily have to, and +often does not, match the files in the working tree. + +There are cases git needs to examine the differences between the +virtual working tree state in the index and the files in the +working tree. The most obvious case is when the user asks `git +diff` (or its low level implementation, `git diff-files`) or +`git-ls-files --modified`. In addition, git internally checks +if the files in the working tree are different from what are +recorded in the index to avoid stomping on local changes in them +during patch application, switching branches, and merging. + +In order to speed up this comparison between the files in the +working tree and the index entries, the index entries record the +information obtained from the filesystem via `lstat(2)` system +call when they were last updated. When checking if they differ, +git first runs `lstat(2)` on the files and compares the result +with this information (this is what was originally done by the +`ce_match_stat()` function, but the current code does it in +`ce_match_stat_basic()` function). 
If some of these "cached +stat information" fields do not match, git can tell that the +files are modified without even looking at their contents. + +Note: not all members in `struct stat` obtained via `lstat(2)` +are used for this comparison. For example, `st_atime` obviously +is not useful. Currently, git compares the file type (regular +files vs symbolic links) and executable bits (only for regular +files) from `st_mode` member, `st_mtime` and `st_ctime` +timestamps, `st_uid`, `st_gid`, `st_ino`, and `st_size` members. +With a `USE_STDEV` compile-time option, `st_dev` is also +compared, but this is not enabled by default because this member +is not stable on network filesystems. With `USE_NSEC` +compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec` +members are also compared, but this is not enabled by default +because the value of this member becomes meaningless once the +inode is evicted from the inode cache on filesystems that do not +store it on disk. + + +Racy git +-------- + +There is one slight problem with the optimization based on the +cached stat information. Consider this sequence: + + : modify 'foo' + $ git update-index 'foo' + : modify 'foo' again, in-place, without changing its size + +The first `update-index` computes the object name of the +contents of file `foo` and updates the index entry for `foo` +along with the `struct stat` information. If the modification +that follows it happens very fast so that the file's `st_mtime` +timestamp does not change, after this sequence, the cached stat +information the index entry records still exactly match what you +would see in the filesystem, even though the file `foo` is now +different. +This way, git can incorrectly think files in the working tree +are unmodified even though they actually are. This is called +the "racy git" problem (discovered by Pasky), and the entries +that appear clean when they may not be because of this problem +are called "racily clean". 
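The sequence above can be reproduced outside git with a small Python sketch. It is a simplification under stated assumptions: `cached_stat`, `content_id` and `racy_demo` are hypothetical helpers, only `st_mtime` (in whole seconds) and `st_size` are cached instead of the full field list from the Note, a plain SHA-1 of the bytes stands in for the object name, and `os.utime` pins the timestamp to simulate a second edit that lands within the same second.

```python
import hashlib
import os
import tempfile


def cached_stat(path):
    """The two fields this sketch keeps; git's index records more."""
    st = os.lstat(path)
    return (int(st.st_mtime), st.st_size)


def content_id(path):
    """Stand-in for the object name: a plain SHA-1 of the file bytes."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def racy_demo():
    with tempfile.TemporaryDirectory() as tmp:
        foo = os.path.join(tmp, "foo")
        with open(foo, "wb") as f:
            f.write(b"version 1\n")
        stamp = int(os.lstat(foo).st_mtime)
        os.utime(foo, (stamp, stamp))

        # Plays the role of "git update-index foo":
        # remember the stat data and the content id.
        index_stat, index_id = cached_stat(foo), content_id(foo)

        # Modify 'foo' again, in place, without changing its size ...
        with open(foo, "r+b") as f:
            f.write(b"version 2\n")
        # ... and pin the timestamp, as if the edit were very fast.
        os.utime(foo, (stamp, stamp))

        stat_says_clean = cached_stat(foo) == index_stat
        really_clean = content_id(foo) == index_id
        return stat_says_clean, really_clean
```

In this setup the cached stat data still matches although the contents differ, which is exactly the state the text calls racily clean.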
+ +To avoid this problem, git does two things: + +. When the cached stat information says the file has not been + modified, and the `st_mtime` is the same as (or newer than) + the timestamp of the index file itself (which is the time `git + update-index foo` finished running in the above example), it + also compares the contents with the object registered in the + index entry to make sure they match. + +. When the index file is updated that contains racily clean + entries, cached `st_size` information is truncated to zero + before writing a new version of the index file. + +Because the index file itself is written after collecting all +the stat information from updated paths, `st_mtime` timestamp of +it is usually the same as or newer than any of the paths the +index contains. And no matter how quick the modification that +follows `git update-index foo` finishes, the resulting +`st_mtime` timestamp on `foo` cannot get a value earlier +than the index file. Therefore, index entries that can be +racily clean are limited to the ones that have the same +timestamp as the index file itself. + +The callers that want to check if an index entry matches the +corresponding file in the working tree continue to call +`ce_match_stat()`, but with this change, `ce_match_stat()` uses +`ce_modified_check_fs()` to see if racily clean ones are +actually clean after comparing the cached stat information using +`ce_match_stat_basic()`. + +The problem the latter solves is this sequence: + + $ git update-index 'foo' + : modify 'foo' in-place without changing its size + : wait for enough time + $ git update-index 'bar' + +Without the latter, the timestamp of the index file gets a newer +value, and falsely clean entry `foo` would not be caught by the +timestamp comparison check done with the former logic anymore. 
+The latter makes sure that the cached stat information for `foo` +would never match with the file in the working tree, so later +checks by `ce_match_stat_basic()` would report that the index entry +does not match the file and git does not have to fall back on more +expensive `ce_modified_check_fs()`. + + +Runtime penalty +--------------- + +The runtime penalty of falling back to `ce_modified_check_fs()` +from `ce_match_stat()` can be very expensive when there are many +racily clean entries. An obvious way to artificially create +this situation is to give the same timestamp to all the files in +the working tree in a large project, run `git update-index` on +them, and give the same timestamp to the index file: + + $ date >.datestamp + $ git ls-files | xargs touch -r .datestamp + $ git ls-files | git update-index --stdin + $ touch -r .datestamp .git/index + +This will make all index entries racily clean. In the linux-2.6 +project, for example, there are over 20,000 files in the working +tree. On my Athlon 64 X2 3800+, after the above: + + $ /usr/bin/time git diff-files + 1.68user 0.54system 0:02.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k + 0inputs+0outputs (0major+67111minor)pagefaults 0swaps + $ git update-index MAINTAINERS + $ /usr/bin/time git diff-files + 0.02user 0.12system 0:00.14elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k + 0inputs+0outputs (0major+935minor)pagefaults 0swaps + +Running `git update-index` in the middle checked the racily +clean entries, and left the cached `st_mtime` for all the paths +intact because they were actually clean (so this step took about +the same amount of time as the first `git diff-files`). After +that, they are not racily clean anymore but are truly clean, so +the second invocation of `git diff-files` fully took advantage +of the cached stat information. 
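The two safeguards from the "Racy git" section, and the reason the fallback is the costly part, can be modeled in a few lines of Python. This is a sketch, not git's code: `Entry`, `match_stat_basic`, `is_modified` and `smudge_racily_clean` are hypothetical names that only echo the C functions mentioned above, timestamps are plain integers, the cached blob bytes stand in for the object the entry names, and the smudging step zeroes the size only for racily clean entries whose contents really differ, which is one reading of the description.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    mtime: int    # cached st_mtime, whole seconds
    size: int     # cached st_size
    blob: bytes   # stands in for the object the entry names


def match_stat_basic(entry, file_mtime, file_size):
    """Cheap check: do the cached stat fields still match?"""
    return entry.mtime == file_mtime and entry.size == file_size


def is_modified(entry, file_mtime, file_data, index_mtime):
    """First safeguard: distrust stat data as new as the index file."""
    if not match_stat_basic(entry, file_mtime, len(file_data)):
        return True                      # cheap, no content read needed
    if entry.mtime >= index_mtime:       # racily clean
        return file_data != entry.blob   # expensive content comparison
    return False


def smudge_racily_clean(entries, worktree, old_index_mtime):
    """Second safeguard: run before a new index file is written.

    worktree maps path -> (mtime, data). Racily clean entries whose
    contents differ get their cached size zeroed, so the cheap check
    fails from now on, whatever timestamp the new index file gets.
    """
    for path, entry in entries.items():
        mtime, data = worktree[path]
        if (match_stat_basic(entry, mtime, len(data))
                and entry.mtime >= old_index_mtime
                and data != entry.blob):
            entry.size = 0
```

With an entry cached at time 100 and the file rewritten in place during that same second, `is_modified` catches the change while the index timestamp is still 100. Once the index is rewritten at a later time, only the zeroed size keeps the change visible. When every entry is racily clean, every call takes the content-comparison branch, which is the penalty measured above.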
+ + +Avoiding runtime penalty +------------------------ + +In order to avoid the above runtime penalty, post 1.4.2 git used +to have code that made sure the index file +got a timestamp newer than the youngest files in the index when +there were many young files with the same timestamp as the +resulting index file would otherwise have, by waiting +before finishing writing the index file out. + +I suspected that in practice the situation where many paths in the +index are all racily clean was quite rare. The only code paths +that can record a recent timestamp for a large number of paths are: + +. Initial `git add .` of a large project. + +. `git checkout` of a large project from an empty index into an + unpopulated working tree. + +Note: switching branches with `git checkout` keeps the cached +stat information of existing working tree files that are the +same between the current branch and the new branch, which are +all older than the resulting index file, and they will not +become racily clean. Only the files that are actually checked +out can become racily clean. + +In a large project where raciness avoidance cost really matters, +however, the initial computation of all object names in the +index takes more than one second, and the index file is written +out after all that happens. Therefore the timestamp of the +index file will be more than one second later than the +youngest file in the working tree. This means that in these +cases there actually will not be any racily clean entry in +the resulting index. + +Based on this discussion, the current code does not use the +"workaround" to avoid the runtime penalty that does not exist in +practice anymore. This was done with commit 0fc82cff on Aug 15, +2006. 
diff --git a/technical/send-pack-pipeline.txt b/technical/send-pack-pipeline.txt new file mode 100644 index 000000000..681efe421 --- /dev/null +++ b/technical/send-pack-pipeline.txt @@ -0,0 +1,63 @@ +git-send-pack +============= + +Overall operation +----------------- + +. Connects to the remote side and invokes git-receive-pack. + +. Learns what refs the remote has and what commit they point at. + Matches them to the refspecs we are pushing. + +. Checks if there are non-fast-forwards. Unlike fetch-pack, + the repository send-pack runs in is supposed to be a superset + of the recipient in fast-forward cases, so there is no need + for want/have exchanges, and fast-forward check can be done + locally. Tell the result to the other end. + +. Calls pack_objects() which generates a packfile and sends it + over to the other end. + +. If the remote side is new enough (v1.1.0 or later), wait for + the unpack and hook status from the other end. + +. Exit with appropriate error codes. + + +Pack_objects pipeline +--------------------- + +This function gets one file descriptor (`fd`) which is either a +socket (over the network) or a pipe (local). What's written to +this fd goes to git-receive-pack to be unpacked. + + send-pack ---> fd ---> receive-pack + +The function pack_objects creates a pipe and then forks. The +forked child execs pack-objects with --revs to receive revision +parameters from its standard input. This process will write the +packfile to the other end. + + send-pack + | + pack_objects() ---> fd ---> receive-pack + | ^ (pipe) + v | + (child) + +The child dup2's to arrange its standard output to go back to +the other end, and read its standard input to come from the +pipe. After that it exec's pack-objects. On the other hand, +the parent process, before starting to feed the child pipeline, +closes the reading side of the pipe and fd to receive-pack. 
+ + send-pack + | + pack_objects(parent) + | + v [0] + pack-objects [0] ---> receive-pack + + +[jc: the pipeline was much more complex and needed documentation before + I understood an earlier bug, but now it is trivial and straightforward.] diff --git a/technical/shallow.txt b/technical/shallow.txt new file mode 100644 index 000000000..559263af4 --- /dev/null +++ b/technical/shallow.txt @@ -0,0 +1,49 @@ +Def.: Shallow commits do have parents, but not in the shallow +repo, and therefore grafts are introduced pretending that +these commits have no parents. + +The basic idea is to write the SHA1s of shallow commits into +$GIT_DIR/shallow, and handle its contents like the contents +of $GIT_DIR/info/grafts (with the difference that shallow +cannot contain parent information). + +This information is stored in a new file instead of grafts, or +even the config, since the user should not touch that file +at all (even throughout development of the shallow clone, it +was never manually edited!). + +Each line contains exactly one SHA1. When read, a commit_graft +will be constructed, which has nr_parent < 0 to make it easier +to discern from user provided grafts. + +Since fsck-objects relies on the library to read the objects, +it honours shallow commits automatically. + +There are some unfinished ends of the whole shallow business: + +- maybe we have to force non-thin packs when fetching into a + shallow repo (ATM they are forced non-thin). + +- A special handling of a shallow upstream is needed. At some + stage, upload-pack has to check if it sends a shallow commit, + and it should send that information early (or fail, if the + client does not support shallow repositories). There is no + support at all for this in this patch series. + +- Instead of locking $GIT_DIR/shallow at the start, just + the timestamp of it is noted, and when it comes to writing it, + a check is performed if the mtime is still the same, dying if + it is not. 
+ +- It is unclear how "push into/from a shallow repo" should behave. + +- If you deepen a history, you'd want to get the tags of the + newly stored (but older!) commits. This does not work right now. + +To make a shallow clone, you can call "git-clone --depth 20 repo". +The result contains only commit chains with a length of at most 20. +It also writes an appropriate $GIT_DIR/shallow. + +You can deepen a shallow repository with "git-fetch --depth 20 +repo branch", which will fetch branch from repo, but stop at depth +20, updating $GIT_DIR/shallow. diff --git a/technical/trivial-merge.txt b/technical/trivial-merge.txt new file mode 100644 index 000000000..24c84100b --- /dev/null +++ b/technical/trivial-merge.txt @@ -0,0 +1,121 @@ +Trivial merge rules +=================== + +This document describes the outcomes of the trivial merge logic in read-tree. + +One-way merge +------------- + +This replaces the index with a different tree, keeping the stat info +for entries that don't change, and allowing -u to make the minimum +required changes to the working tree to have it match. + +Entries marked '+' have stat information. Spaces marked '*' don't +affect the result. + + index tree result + ----------------------- + * (empty) (empty) + (empty) tree tree + index+ tree tree + index+ index index+ + +Two-way merge +------------- + +It is permitted for the index to lack an entry; this does not prevent +any case from applying. + +If the index exists, it is an error for it not to match either the old +or the result. + +If multiple cases apply, the one used is listed first. + +A result which changes the index is an error if the index is not empty +and not up-to-date. + +Entries marked '+' have stat information. Spaces marked '*' don't +affect the result. 
+ + case index old new result + ------------------------------------- + 0/2 (empty) * (empty) (empty) + 1/3 (empty) * new new + 4/5 index+ (empty) (empty) index+ + 6/7 index+ (empty) index index+ + 10 index+ index (empty) (empty) + 14/15 index+ old old index+ + 18/19 index+ old index index+ + 20 index+ index new new + +Three-way merge +--------------- + +It is permitted for the index to lack an entry; this does not prevent +any case from applying. + +If the index exists, it is an error for it not to match either the +head or (if the merge is trivial) the result. + +If multiple cases apply, the one used is listed first. + +A result of "no merge" means that index is left in stage 0, ancest in +stage 1, head in stage 2, and remote in stage 3 (if any of these are +empty, no entry is left for that stage). Otherwise, the given entry is +left in stage 0, and there are no other entries. + +A result of "no merge" is an error if the index is not empty and not +up-to-date. + +*empty* means that the tree must not have a directory-file conflict + with the entry. + +For multiple ancestors, a '+' means that this case applies even if +only one ancestor or remote fits; a '^' means all of the ancestors +must be the same. + +case ancest head remote result +---------------------------------------- +1 (empty)+ (empty) (empty) (empty) +2ALT (empty)+ *empty* remote remote +2 (empty)^ (empty) remote no merge +3ALT (empty)+ head *empty* head +3 (empty)^ head (empty) no merge +4 (empty)^ head remote no merge +5ALT * head head head +6 ancest+ (empty) (empty) no merge +8 ancest^ (empty) ancest no merge +7 ancest+ (empty) remote no merge +10 ancest^ ancest (empty) no merge +9 ancest+ head (empty) no merge +16 anc1/anc2 anc1 anc2 no merge +13 ancest+ head ancest head +14 ancest+ ancest remote remote +11 ancest+ head remote no merge + +Only #2ALT and #3ALT use *empty*, because these are the only cases +where there can be conflicts that didn't exist before. 
Note that we +allow directory-file conflicts between things in different stages +after the trivial merge. + +A possible alternative for #6 is (empty), which would make it like +#1. This is not used, due to the likelihood that it arises due to +moving the file to multiple different locations or moving and deleting +it in different branches. + +Case #1 is included for completeness, and also in case we decide to +put on '+' markings; any path that is never mentioned at all isn't +handled. + +Note that #16 is when both #13 and #14 apply; in this case, we refuse +the trivial merge, because we can't tell from this data which is +right. This is a case of a reverted patch (in some direction, maybe +multiple times), and the right answer depends on looking at crossings +of history or common ancestors of the ancestors. + +Note that, between #6, #7, #9, and #11, all cases not otherwise +covered are handled in this table. + +For #8 and #10, there is alternative behavior, not currently +implemented, where the result is (empty). As currently implemented, +the automatic merge will generally give this effect.
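As a reading aid, the three-way table can be condensed into a small Python function for the single-ancestor case. This is a sketch under assumptions, not read-tree's implementation: `trivial_three_way` and `NO_MERGE` are hypothetical names, entries are compared as opaque object names with `None` standing for (empty), differently named columns in the table are taken to mean differing contents, and the index state, directory-file conflicts (*empty*) and multiple ancestors (#16, the '+' and '^' markings) are all ignored.

```python
NO_MERGE = object()   # sentinel: leave the stages for a real merge


def trivial_three_way(ancest, head, remote):
    """Return the stage-0 entry, None for (empty), or NO_MERGE."""
    if ancest is None and (head is None or remote is None):
        # 1, 2ALT, 3ALT: nothing in the ancestor, at most one side added it
        return head if head is not None else remote
    if head is not None and head == remote:
        return head            # 5ALT: both sides agree
    if ancest is None:
        return NO_MERGE        # 4: both sides added different contents
    if head is None or remote is None:
        return NO_MERGE        # 6, 7, 8, 9, 10: a deletion is involved
    if remote == ancest:
        return head            # 13: only head changed
    if head == ancest:
        return remote          # 14: only remote changed
    return NO_MERGE            # 11: both changed, differently
```

For example, `trivial_three_way("a", "h", "a")` picks `"h"` (case #13), while `trivial_three_way("a", None, "a")` refuses to merge (case #8) even though the alternative behavior mentioned above would give (empty).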