From: Linus Torvalds Date: Thu, 25 Oct 2007 18:24:47 +0000 (-0700) Subject: Do exact rename detection regardless of rename limits X-Git-Tag: v1.5.4-rc0~301^2~2 X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=17559a643ecef94834d930790498c6babe3e89a8;p=git.git Do exact rename detection regardless of rename limits Now that the exact rename detection is linear-time (with a very small constant factor to boot), there is no longer any reason to limit it by the number of files involved. In some trivial testing, I created a repository with a directory that had a hundred thousand files in it (all with different contents), and then moved that directory to show the effects of renaming 100,000 files. With the new code, that resulted in [torvalds@woody big-rename]$ time ~/git/git show -C | wc -l 400006 real 0m2.071s user 0m1.520s sys 0m0.576s ie the code can correctly detect the hundred thousand renames in about 2 seconds (the number "400006" comes from four lines for each rename: diff --git a/really-big-dir/file-1-1-1-1-1 b/moved-big-dir/file-1-1-1-1-1 similarity index 100% rename from really-big-dir/file-1-1-1-1-1 rename to moved-big-dir/file-1-1-1-1-1 and the extra six lines is from a one-liner commit message and all the commit information and spacing). Most of those two seconds weren't even really the rename detection, it's really all the other stuff needed to get there. With the old code, this wouldn't have been practically possible. Doing a pairwise check of the ten billion possible pairs would have been prohibitively expensive. In fact, even with the rename limiter in place, the old code would waste a lot of time just on the diff_filespec checks, and despite not even trying to find renames, it used to look like: [torvalds@woody big-rename]$ time git show -C | wc -l 1400006 real 0m12.337s user 0m12.285s sys 0m0.192s ie we used to take 12 seconds for this load and not even do any rename detection! (The number 1400006 comes from fourteen lines per file moved: seven lines each for the delete and the create of a one-liner file, and the same extra six lines of commit information). Signed-off-by: Linus Torvalds Signed-off-by: Junio C Hamano --- diff --git a/diffcore-rename.c b/diffcore-rename.c index e7e370b2c..394693222 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -428,6 +428,12 @@ void diffcore_rename(struct diff_options *options) if (rename_dst_nr == 0 || rename_src_nr == 0) goto cleanup; /* nothing to do */ + /* + * We really want to cull the candidates list early + * with cheap tests in order to avoid doing deltas. + */ + rename_count = find_exact_renames(); + /* * This basically does a test for the rename matrix not * growing larger than a "rename_limit" square matrix, ie: @@ -444,12 +450,6 @@ void diffcore_rename(struct diff_options *options) if (rename_dst_nr * rename_src_nr > rename_limit * rename_limit) goto cleanup; - /* - * We really want to cull the candidates list early - * with cheap tests in order to avoid doing deltas. - */ - rename_count = find_exact_renames(); - /* Have we run out the created file pool? If so we can avoid * doing the delta matrix altogether. */