Date: Fri, 7 Nov 2014 11:03:21 -0800
From: "W. Trevor King" <wking@tremily.us>
To: notmuch@notmuchmail.org
Subject: Mail archives in Git using ssoma
Message-ID: <20141107190321.GL23609@odin.tremily.us>
Cc: Eric Wong <e@80x24.org>
\r
I like Git, so when folks suggest storing things in Git, I'm usually
excited ;). Eric Wong has been working on some tools to store email
in a Git repository, and his client-side code is ssoma [1]. I wanted
a bit more metadata than the stock ssoma-mda [2], and ended up just
writing a ssoma-mda in Python [3]. It needs Python ≥3.4 and pygit2.
I had pygit2 already installed for Python 3.3 (which gave me a local
libgit2), so I used pip to install it for 3.4:

  $ python3.4 -m ensurepip --user
  $ pip3.4 install --user pygit2

Then I grabbed the archives, and pulled them into Git:

  $ wget http://notmuchmail.org/archives/notmuch.mbox
  $ git init --bare notmuch-archives.git
  $ cd notmuch-archives.git
\r
  >>> import email.utils
  >>> import mailbox
  >>> import ssoma_mda
  >>> mbox = mailbox.mbox('../notmuch.mbox', factory=None, create=False)
  >>> messages = sorted(mbox, key=lambda m: email.utils.mktime_tz(
  ...     email.utils.parsedate_tz(m['date'])))
  >>> for message in messages:
  ...     # Skip the duplicate copies (same Message-ID, picked by received date).
  ...     # (The final received date is truncated in the archived message.)
  ...     if ((message['message-id'] == '<m2k4gmyjer.fsf@ecocode.net>' and
  ...          message['X-List-Received-Date'] == 'Sat, 26 Feb 2011 14:23:34 -0000') or
  ...         (message['message-id'] == '<4EDF728E.3050204@gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Wed, 07 Dec 2011 14:05:16 -0000') or
  ...         (message['message-id'] == '<4FE369F2.5080804@gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Thu, 21 Jun 2012 18:38:07 -0000') or
  ...         (message['message-id'] == '<5122353D.4060601@gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Mon, 18 Feb 2013 14:06:12 -0000') or
  ...         (message['message-id'] == '<CA+eQo_1hMsTD4+6ifqgEQXW0_qYXGOdfkO6tBuGQKV+W7OSaKA@mail.gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Wed, 24 Apr 2013 18:09:55 -0000') or
  ...         (message['message-id'] == '<527B9E8C.5000001@krugs.de>' and
  ...          message['X-List-Received-Date'] == 'Thu, 07 Nov 2013 14:07:32 -0000') or
  ...         (message['message-id'] == '<1399645162-8653-1-git-send-email-wael.nasreddine@gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Fri, 09 May 2014 14:19:36 -0000') or
  ...         (message['message-id'] == '<m2mw9xkyvg.fsf@krugs.de>' and
  ...          message['X-List-Received-Date'] == 'Thu, 18 Sep 2014 10:27:35 -0000') or
  ...         (message['message-id'] == '<cover.1411379395.git.jani@nikula.org>' and
  ...          message['X-List-Received-Date'] != 'Mon, 22 Sep 2014 09[...]')):
  ...         continue
  ...     ssoma_mda.deliver(message=message, once=True)
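The same filter can be written more compactly with a set of (Message-ID, received date) pairs. This is only a sketch, not the code that was actually run: `SKIP`, `sort_key`, and `messages_to_deliver` are hypothetical names, and only two of the pairs from the session above are filled in.

```python
import email.utils

# Hypothetical skip list: (Message-ID, X-List-Received-Date) pairs to drop.
# Only two of the pairs from the session above are shown here.
SKIP = {
    ('<m2k4gmyjer.fsf@ecocode.net>', 'Sat, 26 Feb 2011 14:23:34 -0000'),
    ('<5122353D.4060601@gmail.com>', 'Mon, 18 Feb 2013 14:06:12 -0000'),
}


def sort_key(message):
    """Return the message's Date header as seconds since the epoch."""
    return email.utils.mktime_tz(email.utils.parsedate_tz(message['date']))


def messages_to_deliver(messages, skip=SKIP):
    """Yield messages in Date order, leaving out the pairs listed in skip."""
    for message in sorted(messages, key=sort_key):
        pair = (message['message-id'], message['X-List-Received-Date'])
        if pair not in skip:
            yield message
```

Any iterable of messages works as input, including an mbox opened with mailbox.mbox. Each yielded message could then be handed to ssoma_mda.deliver(message=message, once=True) as in the session above. A "keep only this one copy" rule like the last condition in the session would still need separate handling.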
\r
On my 1.1GHz Intel Celeron 847 Sandy Bridge netbook, that took about
half an hour. The initial repository was large:

But packing it up made it small:

  $ git gc --aggressive

With slightly fewer messages than the mbox:

  $ git log --oneline | wc -l

Compared with 19660 messages in the mbox at 107 MB (160 MB for the
associated Maildir).
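For anyone who wants to repeat that comparison from Python rather than the shell, here is a minimal sketch. It assumes one commit per delivered message (which is what counting `git log --oneline` lines relies on), and `count_mbox_messages` and `count_commits` are hypothetical helper names, not part of ssoma.

```python
import mailbox
import subprocess


def count_mbox_messages(path):
    """Count the messages stored in the mbox file at path."""
    mbox = mailbox.mbox(path, factory=None, create=False)
    try:
        return len(mbox)
    finally:
        mbox.close()


def count_commits(repo):
    """Count the commits reachable from HEAD in the Git repository at repo."""
    output = subprocess.check_output(
        ['git', '-C', repo, 'rev-list', '--count', 'HEAD'])
    return int(output.decode().strip())
```

With the paths used above, that would be count_mbox_messages('../notmuch.mbox') against count_commits('.') from inside notmuch-archives.git.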
\r
Dropping those messages removed the duplicate Message-IDs:

* id:m2k4gmyjer.fsf@ecocode.net had different received dates:

    -X-List-Received-Date: Sat, 26 Feb 2011 14:12:20 -0000
    +X-List-Received-Date: Sat, 26 Feb 2011 14:23:34 -0000

  but no significant differences.
\r
* id:4EDF728E.3050204@gmail.com had a real address in the
  first-to-arrive version:

    -X-List-Received-Date: Wed, 07 Dec 2011 14:10:13 -0000
    -> <4winter@informatik.uni-hamburg.de>

  and an obfuscated one in the second-to-arrive version:

    +X-List-Received-Date: Wed, 07 Dec 2011 14:05:16 -0000
    +> <4winter-jNDFPZUTrfQBEfOqpokbeYV0Y/DQsy6Ps0AfqQuZ5sE@public.gmane.org>
\r
* id:4FE369F2.5080804@gmail.com had the same:

    -X-List-Received-Date: Thu, 21 Jun 2012 18:37:54 -0000
    -> <R.M.Krug@gmail.com

    +X-List-Received-Date: Thu, 21 Jun 2012 18:38:07 -0000
    -> <mailto:R.M.Krug@gmail.com>> wrote:
\r
* id:5122353D.4060601@gmail.com had different received dates:

    -X-List-Received-Date: Mon, 18 Feb 2013 14:06:05 -0000
    +X-List-Received-Date: Mon, 18 Feb 2013 14:06:12 -0000

  but no significant differences.
\r
* id:CA+eQo_1hMsTD4+6ifqgEQXW0_qYXGOdfkO6tBuGQKV+W7OSaKA@mail.gmail.com
  had different MIME boundaries:

    -Content-Type: multipart/alternative; boundary=f46d043be11ac45a0904db[...]
    -X-List-Received-Date: Wed, 24 Apr 2013 18:09:46 -0000

    +Content-Type: multipart/alternative; boundary=e89a8f646ff3faa11d04db[...]
    +X-List-Received-Date: Wed, 24 Apr 2013 18:09:55 -0000

  but no significant differences.
\r
* id:527B9E8C.5000001@krugs.de had obfuscated addresses:

    -X-List-Received-Date: Thu, 07 Nov 2013 14:07:33 -0000
    -> Rainer M Krug <Rainer@krugs.de> writes:

    +X-List-Received-Date: Thu, 07 Nov 2013 14:07:32 -0000
    +> Rainer M Krug <Rainer-vfylz/Ys1k4@public.gmane.org> writes:
\r
* id:1399645162-8653-1-git-send-email-wael.nasreddine@gmail.com had
  additional content in the later submission:

    -Subject: [PATCH] Add Travis-CI config file.
    -Date: Fri, 9 May 2014 07:19:22 -0700
    -X-List-Received-Date: Fri, 09 May 2014 14:19:36 -0000
    - .travis.yml | 10 ++++++++++
    - 1 file changed, 10 insertions(+)

    +Subject: [PATCH v2] Enable Travis-CI as a backup continuous integration
    +Date: Fri, 9 May 2014 14:44:50 -0700
    +X-List-Received-Date: Fri, 09 May 2014 21:45:16 -0000

    +The v2 adds a notification section to send failure (or back to passing[...]
    +to the mailing list and to the IRC channel

    + .travis.yml | 13 +++++++++++++
    + 1 file changed, 13 insertions(+)
\r
* id:m2mw9xkyvg.fsf@krugs.de had an obfuscated address and different
  signatures:

    -X-List-Received-Date: Thu, 18 Sep 2014 10:27:31 -0000
    ->> guyzmo <guyzmo@m0g.net> writes:
     -----BEGIN PGP SIGNATURE-----
     Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
    -iQEcBAEBAgAGBQJUGrN3AAoJENvXNx4PUvmC4J0IAN9Wf+0ArvirJCoewItnEZoo
    -ySg4VRP7uWVqDxHVl5N9XFv4YE2bZ2E2eMGvbo6v7I82lhqeR5dauZhlgCMki+ZI

    +X-List-Received-Date: Thu, 18 Sep 2014 10:27:35 -0000
    +>> guyzmo <guyzmo-kMjww5mZloE@public.gmane.org> writes:
     -----BEGIN PGP SIGNATURE-----
     Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
    +iQEcBAEBAgAGBQJUGrN4AAoJENvXNx4PUvmC6LsIAIaFrd4MFnm8EixrAHPGfW6j
    +L3KNG7Dv+hQuNRUN6qn+emZHI8wX4O74HOZOpZWkE09CmjkPJBmf7IuJwtz2ONbM
\r
* id:cover.1411379395.git.jani@nikula.org came in three times, with
  three dates, but no significant differences:

    Date: Mon, 22 Sep 2014 11:54:20 +0200
    X-List-Received-Date: Mon, 22 Sep 2014 09:54:16 -0000

    Date: Mon, 22 Sep 2014 11:54:42 +0200
    X-List-Received-Date: Mon, 22 Sep 2014 09:54:37 -0000

    Date: Mon, 22 Sep 2014 11:54:51 +0200
    X-List-Received-Date: Mon, 22 Sep 2014 09:54:49 -0000
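The post doesn't say how the duplicates were found or compared. One way to do it with only the standard library is sketched below, as an assumption rather than the method actually used; `duplicate_ids` and `diff_messages` are hypothetical names.

```python
import collections
import difflib


def duplicate_ids(messages):
    """Return the Message-IDs that occur more than once in messages."""
    counts = collections.Counter(m['message-id'] for m in messages)
    return [mid for mid, n in counts.items() if n > 1]


def diff_messages(first, second):
    """Return unified-diff lines between the full text of two messages."""
    a = first.as_string().splitlines()
    b = second.as_string().splitlines()
    # n=0 drops context lines, leaving only the changed headers and body lines.
    return list(difflib.unified_diff(a, b, lineterm='', n=0))
```

Running duplicate_ids over the mbox and then diff_messages over each pair of copies gives -/+ listings in the same spirit as the excerpts above. as_string() may need extra care for messages with unusual encodings.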
\r
Anyhow, I've pushed the Git archive [4,5] if anyone wants to play
around with ssoma. I think this would be a nice backend for folks
building notmuch-based web archives, and pulling from Git is easier
than downloading a new mbox ;).

[1]: http://ssoma.public-inbox.org/README
[2]: http://public-inbox.org/meta/m/ec8f54cf6451eef6e9f59eff691cd9002f4fdf6[...]
[3]: http://git.tremily.us/?p=ssoma-mda.git;a=shortlog;h=refs/heads/p[...]
     I have an uncommitted patch to work around http://bugs.python.org/issu[...]
[4]: http://git.tremily.us/?p=notmuch-archives.git
[5]: git://tremily.us/notmuch-archives.git

This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
\r
\r