From 1c097891e49042cf7ee4628a58836625fb65016d Mon Sep 17 00:00:00 2001
From: "J. Bruce Fields"
Date: Mon, 3 Sep 2007 12:59:55 -0400
Subject: [PATCH] user-manual: rewrite index discussion

Add an example using git-ls-files, standardize on the new "index"
terminology (as opposed to "cache"), attempt to clarify discussion and
make it a little shorter, avoid some unnecessary jargon ("write-back
cache").

Signed-off-by: J. Bruce Fields
---
 Documentation/user-manual.txt | 104 ++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 49 deletions(-)

diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index e613ba8b8..065b1cc41 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -2911,57 +2911,63 @@ gitlink:git-verify-tag[1].
 
 
 [[the-index]]
-The "index" aka "Current Directory Cache"
------------------------------------------
+The index
+-----------
+
+The index is a binary file (generally kept in .git/index) containing a
+sorted list of path names, each with permissions and the SHA1 of a blob
+object; gitlink:git-ls-files[1] can show you the contents of the index:
 
-The index is a simple binary file, which contains an efficient
-representation of the contents of a virtual directory. It
-does so by a simple array that associates a set of names, dates,
-permissions and content (aka "blob") objects together. The cache is
-always kept ordered by name, and names are unique (with a few very
-specific rules) at any point in time, but the cache has no long-term
-meaning, and can be partially updated at any time.
-
-In particular, the index certainly does not need to be consistent with
-the current directory contents (in fact, most operations will depend on
-different ways to make the index 'not' be consistent with the directory
-hierarchy), but it has three very important attributes:
-
-'(a) it can re-generate the full state it caches (not just the
-directory structure: it contains pointers to the "blob" objects so
-that it can regenerate the data too)'
-
-As a special case, there is a clear and unambiguous one-way mapping
-from a current directory cache to a "tree object", which can be
-efficiently created from just the current directory cache without
-actually looking at any other data. So a directory cache at any one
-time uniquely specifies one and only one "tree" object (but has
-additional data to make it easy to match up that tree object with what
-has happened in the directory)
-
-'(b) it has efficient methods for finding inconsistencies between that
-cached state ("tree object waiting to be instantiated") and the
-current state.'
-
-'(c) it can additionally efficiently represent information about merge
-conflicts between different tree objects, allowing each pathname to be
+-------------------------------------------------
+$ git ls-files --stage
+100644 63c918c667fa005ff12ad89437f2fdc80926e21c 0 .gitignore
+100644 5529b198e8d14decbe4ad99db3f7fb632de0439d 0 .mailmap
+100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 0 COPYING
+100644 a37b2152bd26be2c2289e1f57a292534a51a93c7 0 Documentation/.gitignore
+100644 fbefe9a45b00a54b58d94d06eca48b03d40a50e0 0 Documentation/Makefile
+...
+100644 2511aef8d89ab52be5ec6a5e46236b4b6bcd07ea 0 xdiff/xtypes.h
+100644 2ade97b2574a9f77e7ae4002a4e07a6a38e46d07 0 xdiff/xutils.c
+100644 d5de8292e05e7c36c4b68857c1cf9855e3d2f70a 0 xdiff/xutils.h
+-------------------------------------------------
+
+Note that in older documentation you may see the index called the
+"current directory cache" or just the "cache". It has three important
+properties:
+
+1. The index contains all the information necessary to generate a single
+(uniquely determined) tree object.
++
+For example, running gitlink:git-commit[1] generates this tree object
+from the index, stores it in the object database, and uses it as the
+tree object associated with the new commit.
+
+2. The index enables fast comparisons between the tree object it defines
+and the working tree.
++
+It does this by storing some additional data for each entry (such as
+the last modified time). This data is not displayed above, and is not
+stored in the created tree object, but it can be used to determine
+quickly which files in the working directory differ from what was
+stored in the index, and thus save git from having to read all of the
+data from such files to look for changes.
+
+3. It can efficiently represent information about merge
+conflicts between different tree objects, allowing each pathname to be
 associated with sufficient information about the trees involved that
-you can create a three-way merge between them.'
-
-Those are the ONLY three things that the directory cache does. It's a
-cache, and the normal operation is to re-generate it completely from a
-known tree object, or update/compare it with a live tree that is being
-developed. If you blow the directory cache away entirely, you generally
-haven't lost any information as long as you have the name of the tree
-that it described.
-
-At the same time, the index is also the staging area for creating
-new trees, and creating a new tree always involves a controlled
-modification of the index file. In particular, the index file can
-have the representation of an intermediate tree that has not yet been
-instantiated. So the index can be thought of as a write-back cache,
-which can contain dirty information that has not yet been written back
-to the backing store.
+you can create a three-way merge between them.
++
+We saw in <> that during a merge the index can
+store multiple versions of a single file (called "stages"). The third
+column in the gitlink:git-ls-files[1] output above is the stage
+number, and will take on values other than 0 for files with merge
+conflicts.
+
+The index is thus a sort of temporary staging area, which is filled with
+a tree which you are in the process of working on.
+
+If you blow the index away entirely, you generally haven't lost any
+information as long as you have the name of the tree that it described.
 
 [[low-level-operations]]
 Low-level git operations
-- 
2.26.2
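
To illustrate the listing format that the rewritten section relies on, here is a minimal Python sketch that parses text shaped like the `git ls-files --stage` output quoted in the patch and picks out paths whose stage number is not 0. It is only an illustration: the names `IndexEntry`, `parse_stage_listing` and `conflicted_paths` are hypothetical, it assumes every line has the four fields mode, SHA1, stage and path, and it works on captured text rather than reading .git/index itself.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    """One line of a `git ls-files --stage` style listing (hypothetical type)."""
    mode: str   # file permissions, e.g. "100644"
    sha1: str   # SHA1 of the blob object
    stage: int  # 0 normally; other values appear during a merge conflict
    path: str   # path name


def parse_stage_listing(text: str) -> List[IndexEntry]:
    """Parse listing text into entries.

    Assumption: each non-empty line is "<mode> <sha1> <stage> <path>",
    separated by whitespace. Blank lines and a literal "..." (used in the
    manual to abbreviate the listing) are skipped. Quoted or unusual path
    names are not handled by this sketch.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.strip() == "...":
            continue
        # Split into at most four fields so spaces inside the path survive.
        mode, sha1, stage, path = line.split(None, 3)
        entries.append(IndexEntry(mode, sha1, int(stage), path))
    return entries


def conflicted_paths(entries: List[IndexEntry]) -> List[str]:
    """Return the sorted, de-duplicated paths that have a non-zero stage."""
    return sorted({e.path for e in entries if e.stage != 0})


# A few of the sample lines from the listing in the patch.
SAMPLE = """\
100644 63c918c667fa005ff12ad89437f2fdc80926e21c 0 .gitignore
100644 5529b198e8d14decbe4ad99db3f7fb632de0439d 0 .mailmap
...
100644 d5de8292e05e7c36c4b68857c1cf9855e3d2f70a 0 xdiff/xutils.h
"""

if __name__ == "__main__":
    parsed = parse_stage_listing(SAMPLE)
    for entry in parsed:
        print(entry.stage, entry.path)
    print("conflicted:", conflicted_paths(parsed))
```

Since every sample entry is at stage 0, the conflicted list is empty here; during an unresolved merge the same path would show up more than once with different non-zero stage numbers.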