Return-Path: <stewart@flamingspork.com>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
    by olra.theworths.org (Postfix) with ESMTP id 14EF7431FBC
    for <notmuch@notmuchmail.org>; Wed, 17 Feb 2010 02:07:32 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=-0.703 tagged_above=-999 required=5
    tests=[AWL=-0.704, BAYES_50=0.001] autolearn=ham
Received: from olra.theworths.org ([127.0.0.1])
    by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id lOf9puRcGlLg for <notmuch@notmuchmail.org>;
    Wed, 17 Feb 2010 02:07:31 -0800 (PST)
Received: from kaylee.flamingspork.com (kaylee.flamingspork.com
    by olra.theworths.org (Postfix) with ESMTP id 25CBD431FAE
    for <notmuch@notmuchmail.org>; Wed, 17 Feb 2010 02:07:31 -0800 (PST)
Received: from willster (localhost [127.0.0.1])
    by kaylee.flamingspork.com (Postfix) with ESMTPS id 80F626396;
    Wed, 17 Feb 2010 10:04:26 +0000 (UTC)
Received: by willster (Postfix, from userid 1000)
    id 431AA10FB47A; Wed, 17 Feb 2010 21:07:28 +1100 (EST)
From: Stewart Smith <stewart@flamingspork.com>
To: Ben Gamari <bgamari@gmail.com>, notmuch <notmuch@notmuchmail.org>
In-Reply-To: <87ocjok8yo.fsf@willster.local.flamingspork.com>
References: <20100215002914.GA22402@flamingspork.com>
    <1266347128-sup-7796@ben-laptop>
    <87ocjok8yo.fsf@willster.local.flamingspork.com>
Date: Wed, 17 Feb 2010 21:07:28 +1100
Message-ID: <87mxz8jhun.fsf@willster.local.flamingspork.com>
Content-Type: multipart/mixed; boundary="=-=-="
Subject: Re: [notmuch] Mail in git
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
    <notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Feb 2010 10:07:32 -0000

On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stewart@flamingspork.com> wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
>
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
>
> I'm going to play with this....

good news... on my mailstore (which, as I've previously mentioned, takes
about 10 minutes to run 'du' over, about the same time as 'notmuch new'

using the (attached) evenless.pl to create a single commit with

Down from a whopping 14-15GB!!!

My previous effort (git-write-object, create pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB.

This took only 108 minutes.

In both cases, I was creating the repository on another spindle (USB 2.0
disk attached to my laptop).

git-ls-tree and git-cat-file both work for listing and getting objects.

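A minimal sketch of that read path, assuming the repository is bare and the import went to `master` (as evenless.pl does). The function names `list_mail` and `show_mail` are made up for illustration; `git ls-tree -r --name-only` and `git cat-file blob` are the standard plumbing commands, and neither needs a working tree.

```shell
#!/bin/sh
# list_mail REPO [REV]: print the path of every stored message.
# REV defaults to "master" (assumption: the branch the import wrote to).
list_mail() {
    git --git-dir="$1" ls-tree -r --name-only "${2:-master}"
}

# show_mail REPO PATH [REV]: write one stored message to stdout.
# PATH is relative to the imported maildir root, e.g. cur/<filename>.
show_mail() {
    git --git-dir="$1" cat-file blob "${3:-master}:$2"
}
```

For example, `list_mail ~/mail.git | wc -l` would count the stored messages; `~/mail.git` is a hypothetical location.
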
The next thing to think about is adding objects as they come
in... creating a new commit with just an added file should be pretty
simple and easy... but this means we get to keep a "revision history" of
the mailstore, which is *possibly* not ideal in terms of storage
efficiency (I'll do a trial with mine of doing one message at a time and
seeing what the end size is).

However... commit per added mail (or mails) does give us the advantage
of a really well documented and tested backup system :)

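One way the commit-per-mail idea could look, as a sketch rather than a tested design: emit a small fast-import stream for each incoming message. The function name `emit_add_commit` and the commit message are assumptions; the committer identity is borrowed from evenless.pl. `from refs/heads/master^0` is the form the git-fast-import documentation gives for restarting an incremental import on an existing branch, and the date uses fast-import's default raw format (seconds since the epoch plus a timezone offset).

```shell
#!/bin/sh
# emit_add_commit FILE PATH
# Print a fast-import stream that adds FILE to the mailstore at PATH
# as one new commit on top of the current tip of master.
# Assumption: PATH is a plain maildir-style name that does not begin
# with a double quote (fast-import would need C-style quoting for that).
emit_add_commit() {
    file=$1
    path=$2
    msg="add $path"
    # $(( ... )) strips the padding some wc implementations emit
    size=$(( $(wc -c < "$file") ))
    msglen=$(( $(printf '%s' "$msg" | wc -c) ))

    printf 'commit refs/heads/master\n'
    printf 'committer EvenLess <evenless@evenless> %s +0000\n' "$(date +%s)"
    printf 'data %s\n%s\n' "$msglen" "$msg"
    # build on the existing branch tip instead of starting a new history
    printf 'from refs/heads/master^0\n'
    printf 'M 644 inline %s\n' "$path"
    printf 'data %s\n' "$size"
    cat "$file"
    printf '\n'
}
```

Piping the output into `git --git-dir=<repo> fast-import` should then add one commit on top of the existing history; `<repo>` stands for wherever the bare repository lives.
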
Deleting could be hard... if we actually want the objects to go away in a
"permanent" way (not just no longer be referenced).

for the stats nerds:

$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX

git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
Memory total:        182780 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  388496447 /  388496447
---------------------------------------------------------------------

Content-Type: text/x-perl
Content-Disposition: inline; filename=evenless.pl
Content-Description: evenless.pl: maildir to git using fast-import

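use strict;
use warnings;
use File::stat;   # stat() must return an object for $sb->size

my $stripdir= $ARGV[0];
my $mark= 0;      # mark number of the most recent blob
my $FILES= '';    # "M" (filemodify) lines for the single commit

sub fastimport_blobs ($);
sub fastimport_blobs ($)
{
    my $dirname= shift @_;

    opendir (my $dirhandle, $dirname) or die "opendir $dirname: $!";
    foreach (readdir $dirhandle)
    {
        next if $_ eq '.' or $_ eq '..';
        # skip index and metadata files left behind by mail clients
        next if /\.cmeta$/;
        next if /\.ibex\.index$/;
        next if /\.ibex\.index\.data$/;
        next if /\.ev-summary$/;
        next if /\.ev-summary-meta$/;
        next if /\.notmuch$/;

        if (-d $dirname.'/'.$_)
        {
            print STDERR "Recursing into $_/ ";
            fastimport_blobs($dirname.'/'.$_);
        }
        else
        {
            # one blob per message, each with its own mark
            $mark++;
            my $sb= stat("$dirname/$_") or die "stat $dirname/$_: $!";
            print FASTIMPORT "blob\n";
            print FASTIMPORT "mark :$mark\n";
            print FASTIMPORT "data ".($sb->size)."\n";
            open FILEIN, '<', "$dirname/$_" or die "open $dirname/$_: $!";
            my $content= '';
            my $got= sysread FILEIN, $content, $sb->size;
            close FILEIN;
            # a short read would corrupt the stream, so stop instead
            die "short read on $dirname/$_" unless defined $got and $got == $sb->size;
            print FASTIMPORT $content;
            my $storedir= "$dirname/$_";
            # \Q..\E: treat the maildir path literally, not as a regex
            $storedir=~ s/^\Q$stripdir\E//;
            $storedir=~ s/^\///;
            $FILES.="M 0644 :$mark $storedir\n";
        }
    }
    closedir $dirhandle;
}

open FASTIMPORT, "| git fast-import --date-format=rfc2822" or die "cannot start git fast-import: $!";

fastimport_blobs($ARGV[0]);

# a single commit that references every blob by its mark
print FASTIMPORT "commit refs/heads/master\n";
print FASTIMPORT "committer EvenLess <evenless\@evenless> ".`date -R`;
print FASTIMPORT "data 11\n";
print FASTIMPORT "mail commit\n";
print FASTIMPORT $FILES;
print FASTIMPORT "\n";
# flush the stream and wait for git fast-import to finish
close FASTIMPORT or die "git fast-import failed";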