Return-Path: <david@tethera.net>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 17791431FBD
	for <notmuch@notmuchmail.org>; Sun, 20 Apr 2014 05:59:57 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=0 tagged_above=-999 required=5 tests=[none]
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id BaCWYoFZAnqA for <notmuch@notmuchmail.org>;
	Sun, 20 Apr 2014 05:59:49 -0700 (PDT)
Received: from mx.xen14.node3324.gplhost.com (gitolite.debian.net
	[87.98.215.224]) (using TLSv1 with cipher AES256-SHA (256/256 bits))
	(No client certificate requested)
	by olra.theworths.org (Postfix) with ESMTPS id EB126431FBC
	for <notmuch@notmuchmail.org>; Sun, 20 Apr 2014 05:59:48 -0700 (PDT)
Received: from remotemail by mx.xen14.node3324.gplhost.com with local (Exim
	4.72) (envelope-from <david@tethera.net>)
	id 1WbrLV-0005Wu-Od; Sun, 20 Apr 2014 12:59:45 +0000
Received: (nullmailer pid 17456 invoked by uid 1000); Sun, 20 Apr 2014
From: David Bremner <david@tethera.net>
To: Carl Worth <cworth@cworth.org>, Mark Walters <markwalters1009@gmail.com>,
	notmuch <notmuch@notmuchmail.org>
Subject: Re: [RFC PATCH] Re: excessive thread fusing
In-Reply-To: <87oazwjq1e.fsf@yoom.home.cworth.org>
References: <87ioq5mrbz.fsf@maritornes.cs.unb.ca> <87fvl8mpzj.fsf@qmul.ac.uk>
	<87oazwjq1e.fsf@yoom.home.cworth.org>
User-Agent: Notmuch/0.17+202~gb65f328 (http://notmuchmail.org) Emacs/24.3.1
	(x86_64-pc-linux-gnu)
Date: Sun, 20 Apr 2014 21:59:26 +0900
Message-ID: <87fvl8upg1.fsf@maritornes.cs.unb.ca>
Content-Type: text/plain
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
	<notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Sun, 20 Apr 2014 12:59:57 -0000

Carl Worth <cworth@cworth.org> writes:

> Another idea would be to trigger specifically on common forms. Judging
> from the samples in this particular thread, it seems like a workable
> heuristic would be:
>
>     If the In-Reply-To header begins with '<':
>         Parse that initial portion as a message ID
>     Else if it ends with '>':
>         Parse that final portion as a message ID
>     Else:
>         Ignore this garbage-valued header.

Using the hacky script below, I scanned my own mail collection of about
300k messages. I can make the following observations:

- I have some RFC compliant in-reply-to's with multiple ids
- I have a non-trivial number of "Message from $NAME <address> of $date <id>"
- I didn't see any cases where using the last angle bracketed thing
- I did see some cases where the header starts with '<' but the
  matching '>' was missing
- I also noticed some rfc2047 encoding of in-reply-to headers.
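
For concreteness, the quoted heuristic could be sketched in POSIX shell
roughly as follows. This is only an illustration, not notmuch code: the
function name parse_in_reply_to is made up, and it assumes the header value
has already been unfolded and stripped of leading and trailing whitespace.
It does not attempt rfc2047 decoding.

```shell
#!/bin/sh
# Hypothetical helper (not part of notmuch): print the message ID chosen by
# the quoted heuristic, or return 1 if the value should be ignored.
parse_in_reply_to() {
    value=$1
    case $value in
        '<'*'>'*)
            # Begins with '<' and has a closing '>': take the first ID.
            id=${value#'<'}
            id=${id%%'>'*}
            ;;
        *'<'*'>')
            # Ends with '>': take the last angle-bracketed portion.
            id=${value##*'<'}
            id=${id%'>'}
            ;;
        *)
            # Garbage, including "starts with '<' but no matching '>'".
            return 1
            ;;
    esac
    printf '%s\n' "$id"
}
```

With this sketch, a value holding several IDs yields the first one, a
"Message from ... <id>" value yields the trailing ID, and a value whose
opening '<' is never closed is rejected.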
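
######################################################################
# hacky script follows

# assumption: the mail directory to scan is passed as the first argument
dir=$1

tempdir=$(mktemp -d)
echo Writing to ${tempdir}

# assumption: the extracted headers are collected in ${tempdir}/ids,
# which is the file the sed command below reads
find $dir -exec sh -c "formail -c -xIn-reply-to < {}" \; \
     > ${tempdir}/ids

# collapse whitespace, then replace every id and comment with a placeholder
sed -e 's/\t/ /' -e 's/  */ /g' -e 's/<[^ ]*>/<id>/g' -e 's/(.*)/(comment)/' < ${tempdir}/ids | sort | uniq | tee ${tempdir}/report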
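
To show what the report ends up looking like, here is a small sketch that
runs the same kind of normalization over a few made-up header values. The
sample values and the function name normalize are invented for
illustration, the tab substitution is left out for portability, and
LC_ALL=C is set only to make the sort order predictable.

```shell
#!/bin/sh
# Hypothetical wrapper around the normalization step: collapse runs of
# spaces, replace every <...> with <id> and every (...) with (comment),
# then keep one copy of each resulting shape.
normalize() {
    sed -e 's/  */ /g' -e 's/<[^ ]*>/<id>/g' -e 's/(.*)/(comment)/' \
        | LC_ALL=C sort | uniq
}

# Made-up In-Reply-To values, one per line.
printf '%s\n' \
    '<111@example.org>' \
    '<222@example.org>  (reply from Alice)' \
    '<333@example.org>' \
    'Message from Bob <bob@example.org> of Mon <444@example.org>' \
    | normalize
```

The four sample values reduce to three shapes: "<id>", "<id> (comment)"
and "Message from Bob <id> of Mon <id>", which is the kind of summary the
observations above were read from.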