From: Aaron Ecay <aaronecay@gmail.com>
To: notmuch@notmuchmail.org
Subject: [RFC] [PATCH] lib/database.cc: change how the parent of a message is
Date: Mon, 25 Feb 2013 18:50:25 -0500
Message-Id: <1361836225-17279-1-git-send-email-aaronecay@gmail.com>
X-Mailer: git-send-email 1.8.1.4
\r
Presently, the code that finds the parent of a message as it is being
added to the database assumes that the first Message-ID-like substring
of the In-Reply-To header is the parent's Message-ID. Some mail
clients, however, put things other than the parent's Message-ID in the
In-Reply-To header, such as the email address of the parent's sender.
In that case notmuch can pick the wrong parent.

The updated algorithm prefers the last Message-ID in the References
header. The References header lists messages oldest-first, so the last
Message-ID is the parent (RFC2822, p. 24). The References header is
also less likely to use a non-standard syntax
(http://cr.yp.to/immhf/thread.html,
http://www.jwz.org/doc/threading.html). If there is no References
header, fall back to the old behavior.

I notice this problem especially on public mailing lists, where
certain people's messages always cause an "out-dent" of the threading
instead of being nested under the message they reply to.

Technically, putting non-Message-ID crud in the In-Reply-To field is a
violation of RFC2822, but in practice clients appear to follow the
standard more often for the References header than for the
In-Reply-To one.
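To make the intended rule concrete, here is a minimal standalone sketch of the selection logic as described above. It is not the notmuch implementation: extract_message_ids and choose_parent are hypothetical names, the <...> tokenizer is much cruder than _parse_message_id, and the control flow follows the prose description rather than the exact hunks in the patch.

```cpp
// Illustrative sketch only; not notmuch code. All names are hypothetical.
#include <optional>
#include <string>
#include <vector>

// Collect the contents of every <...> token in a header value, in order.
// Simplification: anything between angle brackets counts as an ID.
std::vector<std::string> extract_message_ids(const std::string &header)
{
    std::vector<std::string> ids;
    std::string::size_type pos = 0;
    while ((pos = header.find('<', pos)) != std::string::npos) {
        std::string::size_type end = header.find('>', pos + 1);
        if (end == std::string::npos)
            break; // unterminated token: stop scanning
        ids.push_back(header.substr(pos + 1, end - pos - 1));
        pos = end + 1;
    }
    return ids;
}

// Choose the parent of the message whose ID is self_id:
//  1. prefer the last ID in References (the list is oldest-first);
//  2. otherwise fall back to the first ID-like token in In-Reply-To;
//  3. never report the message as its own parent.
std::optional<std::string> choose_parent(const std::string &self_id,
                                         const std::string &references,
                                         const std::string &in_reply_to)
{
    std::optional<std::string> parent;

    const std::vector<std::string> refs = extract_message_ids(references);
    if (!refs.empty()) {
        parent = refs.back();
    } else {
        const std::vector<std::string> irt = extract_message_ids(in_reply_to);
        if (!irt.empty())
            parent = irt.front();
    }

    if (parent && *parent == self_id)
        parent.reset(); // drop a self-referential parent

    return parent;
}
```

With References set to "<a@x> <b@x>" and In-Reply-To set to "Jane <jane@example.org> <b@x>", this sketch picks b@x. With References empty, it falls back to the old behavior and picks jane@example.org, which is the failure described above.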
\r
 lib/database.cc | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 91d4329..cbf33ae 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next)
  * 'message_id' in the result (to avoid mass confusion when a single
  * message references itself cyclically---and yes, mail messages are
  * not infrequent in the wild that do this---don't ask me why).
+ * Return the last reference parsed.
 parse_references (void *ctx,
                   const char *message_id,
@@ -511,7 +513,7 @@ parse_references (void *ctx,
     if (refs == NULL || *refs == '\0')
         ref = _parse_message_id (ctx, refs, &refs);
@@ -519,6 +521,8 @@ parse_references (void *ctx,
         if (ref && strcmp (ref, message_id))
             g_hash_table_insert (hash, ref, NULL);
@@ -1365,7 +1369,7 @@ _notmuch_database_generate_doc_id (notmuch_database_t *notmuch)
     notmuch->last_doc_id++;
     if (notmuch->last_doc_id == 0)
-        INTERNAL_ERROR ("Xapian document IDs are exhausted.\n");
+        INTERNAL_ERROR ("Xapian document IDs are exhausted.\n");
     return notmuch->last_doc_id;
@@ -1509,7 +1513,7 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
                                            const char **thread_id)
     GHashTable *parents = NULL;
-    const char *refs, *in_reply_to, *in_reply_to_message_id;
+    const char *refs, *in_reply_to, *in_reply_to_message_id, *last_ref_message_id;
     GList *l, *keys = NULL;
     notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
@@ -1517,21 +1521,31 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
                                      _my_talloc_free_for_g_hash, NULL);
     refs = notmuch_message_file_get_header (message_file, "references");
-    parse_references (message, notmuch_message_get_message_id (message),
+    last_ref_message_id = parse_references (message,
+                                            notmuch_message_get_message_id (message),
     in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
     parse_references (message, notmuch_message_get_message_id (message),
                       parents, in_reply_to);
-    /* Carefully avoid adding any self-referential in-reply-to term. */
     in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL);
+    /* If the parent message IDs from the In-Reply-To and References
+     * headers differ, use the References one. This is because
+     * the In-Reply-To header is more likely to be in a non-standard
+    if (in_reply_to_message_id &&
+        last_ref_message_id &&
+        strcmp (last_ref_message_id, in_reply_to_message_id)) {
+        in_reply_to_message_id = last_ref_message_id;
+    /* Carefully avoid adding any self-referential in-reply-to term. */
     if (in_reply_to_message_id &&
         strcmp (in_reply_to_message_id,
                 notmuch_message_get_message_id (message)))
         _notmuch_message_add_term (message, "replyto",
-                                   _parse_message_id (message, in_reply_to, NULL));
+                                   in_reply_to_message_id);
     keys = g_hash_table_get_keys (parents);