Date: Fri, 7 Nov 2014 11:03:21 -0800
From: "W. Trevor King"
To: notmuch@notmuchmail.org
Cc: Eric Wong
Subject: Mail archives in Git using ssoma
Message-ID: <20141107190321.GL23609@odin.tremily.us>

Hello everyone :),

I like Git, so when folks suggest storing things in Git, I'm usually
excited ;).  Eric Wong has been working on some tools to store email
in a Git repository, and his client-side code is ssoma [1].  I wanted
a bit more metadata than the stock ssoma-mda [2], and ended up just
writing a ssoma-mda in Python [3].  It needs Python ≥3.4 and pygit2.
I had pygit2 already installed for Python 3.3 (which gave me a local
libgit2), so I used pip to install it for 3.4:

  $ python3.4 -m ensurepip --user
  $ pip3.4 install --user pygit2

Then I grabbed the archives, and pulled them into Git:

  $ wget http://notmuchmail.org/archives/notmuch.mbox
  $ git init --bare notmuch-archives.git
  $ cd notmuch-archives.git
  $ python3.4
  >>> import email.utils
  >>> import mailbox
  >>> import ssoma_mda
  >>> mbox = mailbox.mbox('../notmuch.mbox', factory=None, create=False)
  >>> messages = sorted(mbox, key=lambda m: email.utils.mktime_tz(email.utils.parsedate_tz(m['date'])))
  >>> for message in messages:
  ...     if ((message['message-id'] == '<m2k4gmyjer.fsf@ecocode.net>' and
  ...          message['X-List-Received-Date'] == 'Sat, 26 Feb 2011 14:23:34 -0000') or
  ...         (message['message-id'] == '<4EDF728E.3050204@gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Wed, 07 Dec 2011 14:05:16 -0000') or
  ...         (message['message-id'] == '<4FE369F2.5080804@gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Thu, 21 Jun 2012 18:38:07 -0000') or
  ...         (message['message-id'] == '<5122353D.4060601@gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Mon, 18 Feb 2013 14:06:12 -0000') or
  ...         (message['message-id'] == '<CA+eQo_1hMsTD4+6ifqgEQXW0_qYXGOdfkO6tBuGQKV+W7OSaKA@mail.gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Wed, 24 Apr 2013 18:09:55 -0000') or
  ...         (message['message-id'] == '<527B9E8C.5000001@krugs.de>' and
  ...          message['X-List-Received-Date'] == 'Thu, 07 Nov 2013 14:07:32 -0000') or
  ...         (message['message-id'] == '<1399645162-8653-1-git-send-email-wael.nasreddine@gmail.com>' and
  ...          message['X-List-Received-Date'] == 'Fri, 09 May 2014 14:19:36 -0000') or
  ...         (message['message-id'] == '<m2mw9xkyvg.fsf@krugs.de>' and
  ...          message['X-List-Received-Date'] == 'Thu, 18 Sep 2014 10:27:35 -0000') or
  ...         (message['message-id'] == '<cover.1411379395.git.jani@nikula.org>' and
  ...          message['X-List-Received-Date'] != 'Mon, 22 Sep 2014 09:54:16 -0000')):
  ...         continue
  ...     ssoma_mda.deliver(message=message, once=True)
  >>> ^D

On my 1.1GHz Intel Celeron 847 Sandy Bridge netbook, that took about
half an hour.  The initial repository was large:

  $ du -hs .
  394M    .

But packing it up made it small:

  $ git gc --aggressive
  $ du -hs .
  51M     .

With a few fewer messages than the mbox:

  $ git log --oneline | wc -l
  19650

Compared with 19660 messages in the mbox at 107 MB (160 MB for the
associated Maildir).  The ten messages I dropped were duplicates that
shared a Message-ID with a message I kept:

* id:m2k4gmyjer.fsf@ecocode.net had different received dates:

    -X-List-Received-Date: Sat, 26 Feb 2011 14:12:20 -0000
    +X-List-Received-Date: Sat, 26 Feb 2011 14:23:34 -0000

  but no significant differences.

* id:4EDF728E.3050204@gmail.com had a real address in the
  first-to-arrive version:

    -X-List-Received-Date: Wed, 07 Dec 2011 14:10:13 -0000
    -> <4winter@informatik.uni-hamburg.de>

  and an obfuscated one in the second-to-arrive version:

    +X-List-Received-Date: Wed, 07 Dec 2011 14:05:16 -0000
    +> <4winter-jNDFPZUTrfQBEfOqpokbeYV0Y/DQsy6Ps0AfqQuZ5sE@public.gmane.org>

* id:4FE369F2.5080804@gmail.com had the same:

    -X-List-Received-Date: Thu, 21 Jun 2012 18:37:54 -0000
    -> > wrote:

* id:5122353D.4060601@gmail.com had different received dates:

    -X-List-Received-Date: Mon, 18 Feb 2013 14:06:05 -0000
    +X-List-Received-Date: Mon, 18 Feb 2013 14:06:12 -0000

  but no significant differences.

* id:CA+eQo_1hMsTD4+6ifqgEQXW0_qYXGOdfkO6tBuGQKV+W7OSaKA@mail.gmail.com
  had different MIME boundaries:

    -Content-Type: multipart/alternative; boundary=f46d043be11ac45a0904db1f3428
    -X-List-Received-Date: Wed, 24 Apr 2013 18:09:46 -0000
    +Content-Type: multipart/alternative; boundary=e89a8f646ff3faa11d04db1f3294
    +X-List-Received-Date: Wed, 24 Apr 2013 18:09:55 -0000

  but no significant differences.

* id:527B9E8C.5000001@krugs.de had obfuscated addresses:

    -X-List-Received-Date: Thu, 07 Nov 2013 14:07:33 -0000
    -> Rainer M Krug writes:
    +X-List-Received-Date: Thu, 07 Nov 2013 14:07:32 -0000
    +> Rainer M Krug writes:

* id:1399645162-8653-1-git-send-email-wael.nasreddine@gmail.com had
  additional content in the later submission:

    -Subject: [PATCH] Add Travis-CI config file.
    -Date: Fri, 9 May 2014 07:19:22 -0700
    -X-List-Received-Date: Fri, 09 May 2014 14:19:36 -0000
    - .travis.yml | 10 ++++++++++
    - 1 file changed, 10 insertions(+)
    +Subject: [PATCH v2] Enable Travis-CI as a backup continuous integration
    + service.
    +Date: Fri, 9 May 2014 14:44:50 -0700
    +X-List-Received-Date: Fri, 09 May 2014 21:45:16 -0000
    +
    +The v2 adds a notification section to send failure (or back to passing) notifications
    +to the mailing list and to the IRC channel
    +
    + .travis.yml | 13 +++++++++++++
    + 1 file changed, 13 insertions(+)

* id:m2mw9xkyvg.fsf@krugs.de had an obfuscated address and a different
  signature:

    -X-List-Received-Date: Thu, 18 Sep 2014 10:27:31 -0000
    ->> guyzmo writes:
     -----BEGIN PGP SIGNATURE-----
     Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
    -iQEcBAEBAgAGBQJUGrN3AAoJENvXNx4PUvmC4J0IAN9Wf+0ArvirJCoewItnEZoo
    -ySg4VRP7uWVqDxHVl5N9XFv4YE2bZ2E2eMGvbo6v7I82lhqeR5dauZhlgCMki+ZI
    +X-List-Received-Date: Thu, 18 Sep 2014 10:27:35 -0000
    +>> guyzmo writes:
     -----BEGIN PGP SIGNATURE-----
     Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
    +iQEcBAEBAgAGBQJUGrN4AAoJENvXNx4PUvmC6LsIAIaFrd4MFnm8EixrAHPGfW6j
    +L3KNG7Dv+hQuNRUN6qn+emZHI8wX4O74HOZOpZWkE09CmjkPJBmf7IuJwtz2ONbM

* id:cover.1411379395.git.jani@nikula.org came in three times, with
  three dates, but no significant differences:

    Date: Mon, 22 Sep 2014 11:54:20 +0200
    X-List-Received-Date: Mon, 22 Sep 2014 09:54:16 -0000

    Date: Mon, 22 Sep 2014 11:54:42 +0200
    X-List-Received-Date: Mon, 22 Sep 2014 09:54:37 -0000

    Date: Mon, 22 Sep 2014 11:54:51 +0200
    X-List-Received-Date: Mon, 22 Sep 2014 09:54:49 -0000

Anyhow, I've pushed the Git archive [4,5] if anyone wants to play
around with ssoma.  I think this would be a nice backend for folks
building notmuch-based web archives, and pulling from Git is easier
than downloading a new mbox ;).

Cheers,
Trevor

[1]: http://ssoma.public-inbox.org/README
[2]: http://public-inbox.org/meta/m/ec8f54cf6451eef6e9f59eff691cd9002f4fdf65.html
[3]: http://git.tremily.us/?p=ssoma-mda.git;a=shortlog;h=refs/heads/python
     I have an uncommitted patch to work around
     http://bugs.python.org/issue22684
[4]: http://git.tremily.us/?p=notmuch-archives.git
[5]: git://tremily.us/notmuch-archives.git
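
The duplicate Message-IDs listed above can be located with Python's
standard mailbox module.  What follows is a minimal sketch, not the
code that was used for this import: duplicate_message_ids is a
hypothetical helper name, and it assumes every message carries the
X-List-Received-Date header that appears in the diffs above.

```python
import collections
import mailbox


def duplicate_message_ids(mbox_path):
    """Map each repeated Message-ID to its list of X-List-Received-Date values.

    Hypothetical helper for illustration; messages that lack a
    Message-ID header are grouped together under the key None.
    """
    seen = collections.defaultdict(list)
    mbox = mailbox.mbox(mbox_path, factory=None, create=False)
    try:
        for message in mbox:
            # Header lookup is case-insensitive, so 'message-id' also
            # matches 'Message-ID' and 'Message-Id'.
            seen[message['message-id']].append(message['X-List-Received-Date'])
    finally:
        mbox.close()
    # Keep only the IDs that occur more than once.
    return {mid: dates for mid, dates in seen.items() if len(dates) > 1}
```

Calling duplicate_message_ids('../notmuch.mbox') from inside
notmuch-archives.git should return one entry per duplicated
Message-ID, with dates in mbox order, which is enough to write the
skip conditions for the import loop by hand.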