1 Return-Path: <jani@nikula.org>
\r
2 X-Original-To: notmuch@notmuchmail.org
\r
3 Delivered-To: notmuch@notmuchmail.org
\r
4 Received: from localhost (localhost [127.0.0.1])
\r
5 by olra.theworths.org (Postfix) with ESMTP id D7467431FBC
\r
6 for <notmuch@notmuchmail.org>; Thu, 28 Feb 2013 11:41:21 -0800 (PST)
\r
7 X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
\r
11 X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5
\r
12 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
\r
13 Received: from olra.theworths.org ([127.0.0.1])
\r
14 by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
\r
15 with ESMTP id CVWGnYYK4NWU for <notmuch@notmuchmail.org>;
\r
16 Thu, 28 Feb 2013 11:41:19 -0800 (PST)
\r
17 Received: from mail-la0-f41.google.com (mail-la0-f41.google.com
\r
18 [209.85.215.41]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
\r
19 (No client certificate requested)
\r
20 by olra.theworths.org (Postfix) with ESMTPS id DF529431FAF
\r
21 for <notmuch@notmuchmail.org>; Thu, 28 Feb 2013 11:41:18 -0800 (PST)
\r
22 Received: by mail-la0-f41.google.com with SMTP id fo12so2188940lab.28
\r
23 for <notmuch@notmuchmail.org>; Thu, 28 Feb 2013 11:41:16 -0800 (PST)
\r
24 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
\r
25 d=google.com; s=20120113;
\r
26 h=x-received:from:to:subject:in-reply-to:references:user-agent:date
\r
27 :message-id:mime-version:content-type:x-gm-message-state;
\r
28 bh=k9EuZu6+hVJPIbkf59XdmyV9Y+qxB8aX3oOJkHI8feQ=;
\r
29 b=VWH/+NmnDNbbC/KNLHlxOW2KRyxqsbQyDuZFiQSu7AZFXF0/ItApTtMFNw/C3ZpoBd
\r
30 bkoutZiQB6y+a5FXJnfCUIHi6nzvZJBQaYZtvoOewEwtS20ssMgNA8pLYmoQr7JuVPOy
\r
31 j8mjrcOpBb3tauD19VTYAKM78ceaN2lfG4Gf7clmgO7RazGpyZD2kCdoOd2sTr+IwdPA
\r
32 Bfolh7SpgyGycz/heqCn8CEw2/RQCJT5vv+9Gy0FlwMCTLThxXWMJ1WRRbwOIt3iRGuV
\r
33 9b8fqUfJvnNwM+i4SZ3uj489jAS7W6WsAAeTUqZK/948l1x9VoPIVuZraK5dv+/ltAiI
\r
35 X-Received: by 10.112.13.200 with SMTP id j8mr4057894lbc.68.1362080475852;
\r
36 Thu, 28 Feb 2013 11:41:15 -0800 (PST)
\r
37 Received: from localhost (dsl-hkibrasgw4-50df51-27.dhcp.inet.fi.
\r
39 by mx.google.com with ESMTPS id fm8sm3283414lbb.17.2013.02.28.11.41.13
\r
40 (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
\r
41 Thu, 28 Feb 2013 11:41:14 -0800 (PST)
\r
42 From: Jani Nikula <jani@nikula.org>
\r
43 To: Aaron Ecay <aaronecay@gmail.com>, notmuch@notmuchmail.org
\r
44 Subject: Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message
\r
46 In-Reply-To: <1361836225-17279-1-git-send-email-aaronecay@gmail.com>
\r
47 References: <1361836225-17279-1-git-send-email-aaronecay@gmail.com>
\r
48 User-Agent: Notmuch/0.15.2+33~g98253a3 (http://notmuchmail.org) Emacs/24.2.1
\r
49 (x86_64-pc-linux-gnu)
\r
50 Date: Thu, 28 Feb 2013 21:41:16 +0200
\r
51 Message-ID: <87621cteeb.fsf@nikula.org>
\r
53 Content-Type: text/plain
\r
55 ALoCoQmRb+br0tqWNRdxc1RyXji58iziwxTGZotvTWLSVA5AvRzDUu7LRxiZyORYP7S9dIcpuQDM
\r
56 X-BeenThere: notmuch@notmuchmail.org
\r
57 X-Mailman-Version: 2.1.13
\r
59 List-Id: "Use and development of the notmuch mail system."
\r
60 <notmuch.notmuchmail.org>
\r
61 List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
\r
62 <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
\r
63 List-Archive: <http://notmuchmail.org/pipermail/notmuch>
\r
64 List-Post: <mailto:notmuch@notmuchmail.org>
\r
65 List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
\r
66 List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
\r
67 <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
\r
68 X-List-Received-Date: Thu, 28 Feb 2013 19:41:22 -0000
\r
73 On Tue, 26 Feb 2013, Aaron Ecay <aaronecay@gmail.com> wrote:
\r
74 > Presently, the code which finds the parent of a message as it is being
\r
75 > added to the database assumes that the first Message-ID-like substring
\r
76 > of the In-Reply-To header is the parent Message ID. Some mail clients,
\r
77 > however, put stuff other than the Message-ID of the parent in the
\r
78 > In-Reply-To header, such as the email address of the sender of the
\r
79 > parent. This can fool notmuch.
\r
81 I think the background is that RFC 822 defines In-Reply-To (and
\r
82 References too for that matter) as *(phrase / msg-id), while RFC 2822
\r
83 defines them as 1*msg-id. I'd like something about RFC 822 being
\r
84 mentioned in the commit message.
\r
86 The problem in the gmane message you link to in
\r
87 id:87liaa3luc.fsf@gmail.com is likely related to the FAQ item 05.26 "How
\r
88 do I fix a bogus In-Reply-To or missing References field?" in the MH FAQ
\r
89 http://www.newt.com/faq/mh.html.
\r
91 > The updated algorithm prefers the last Message ID in the References
\r
92 > header. The References header lists messages oldest-first, so the last
\r
93 > Message ID is the parent (RFC2822, p. 24). The References header is
\r
94 > also less likely to be in a non-standard
\r
95 > syntax (http://cr.yp.to/immhf/thread.html,
\r
96 > http://www.jwz.org/doc/threading.html). In case the References header
\r
97 > is not to be found, fall back to the old behavior.
\r
100 > I especially notice this problem on public mailing lists, where
\r
101 > certain people's messages always cause an "out-dent" of the threading,
\r
102 > instead of being nested under whichever message they are replies to.
\r
104 > Technically, putting non-Message-ID crud in the In-Reply-To field is a
\r
105 > violation of RFC2822, but it appears that in practice the References
\r
106 > header is respected more often than the In-Reply-To one.
\r
108 > lib/database.cc | 30 ++++++++++++++++++++++--------
\r
109 > 1 file changed, 22 insertions(+), 8 deletions(-)
\r
111 > diff --git a/lib/database.cc b/lib/database.cc
\r
112 > index 91d4329..cbf33ae 100644
\r
113 > --- a/lib/database.cc
\r
114 > +++ b/lib/database.cc
\r
115 > @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next)
\r
116 > * 'message_id' in the result (to avoid mass confusion when a single
\r
117 > * message references itself cyclically---and yes, mail messages are
\r
118 > * not infrequent in the wild that do this---don't ask me why).
\r
120 > + * Return the last reference parsed.
\r
124 > parse_references (void *ctx,
\r
125 > const char *message_id,
\r
126 > GHashTable *hash,
\r
127 > @@ -511,7 +513,7 @@ parse_references (void *ctx,
\r
130 > if (refs == NULL || *refs == '\0')
\r
135 > ref = _parse_message_id (ctx, refs, &refs);
\r
136 > @@ -519,6 +521,8 @@ parse_references (void *ctx,
\r
137 > if (ref && strcmp (ref, message_id))
\r
138 > g_hash_table_insert (hash, ref, NULL);
\r
143 As the comment for the function says, we explicitly avoid including
\r
144 self-references. I think I'd err on the safe side and return NULL if the
\r
145 last ref equals message-id.
\r
150 > @@ -1365,7 +1369,7 @@ _notmuch_database_generate_doc_id (notmuch_database_t *notmuch)
\r
151 > notmuch->last_doc_id++;
\r
153 > if (notmuch->last_doc_id == 0)
\r
154 > - INTERNAL_ERROR ("Xapian document IDs are exhausted.\n");
\r
155 > + INTERNAL_ERROR ("Xapian document IDs are exhausted.\n");
\r
157 I don't know how you got this non-change hunk here, but please remove
\r
161 > return notmuch->last_doc_id;
\r
163 > @@ -1509,7 +1513,7 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
\r
164 > const char **thread_id)
\r
166 > GHashTable *parents = NULL;
\r
167 > - const char *refs, *in_reply_to, *in_reply_to_message_id;
\r
168 > + const char *refs, *in_reply_to, *in_reply_to_message_id, *last_ref_message_id;
\r
169 > GList *l, *keys = NULL;
\r
170 > notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
\r
172 > @@ -1517,21 +1521,31 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
\r
173 > _my_talloc_free_for_g_hash, NULL);
\r
175 > refs = notmuch_message_file_get_header (message_file, "references");
\r
176 > - parse_references (message, notmuch_message_get_message_id (message),
\r
177 > - parents, refs);
\r
178 > + last_ref_message_id = parse_references (message,
\r
179 > + notmuch_message_get_message_id (message),
\r
180 > + parents, refs);
\r
182 > in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
\r
183 > parse_references (message, notmuch_message_get_message_id (message),
\r
184 > parents, in_reply_to);
\r
186 > - /* Carefully avoid adding any self-referential in-reply-to term. */
\r
187 > in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL);
\r
189 I wonder if you should reuse your parse_references() change here, so
\r
190 you'd set in_reply_to_message_id to the last message-id in
\r
191 In-Reply-To. This might tackle some of the problematic cases directly,
\r
192 but should still be all right per RFC 2822. I didn't verify how the
\r
193 parser handles an RFC 2822 violating free form header though.
\r
195 > + /* If the parent message ID from the Reply-To and References
\r
196 > + * headers are different, use the References one. This is because
\r
197 > + * the Reply-To header is more likely to be in an non-standard
\r
199 > + if (in_reply_to_message_id &&
\r
200 > + last_ref_message_id &&
\r
201 > + strcmp (last_ref_message_id, in_reply_to_message_id)) {
\r
202 > + in_reply_to_message_id = last_ref_message_id;
\r
205 I suggest adding an else if branch (or revamp the above if condition) to
\r
206 tackle the missing In-Reply-To header:
\r
208 else if (!in_reply_to_message_id && last_ref_message_id) {
\r
209 in_reply_to_message_id = last_ref_message_id;
\r
212 > + /* Carefully avoid adding any self-referential in-reply-to term. */
\r
213 > if (in_reply_to_message_id &&
\r
214 > strcmp (in_reply_to_message_id,
\r
215 > notmuch_message_get_message_id (message)))
\r
217 If you change parse_references() to be careful about never returning a
\r
218 self-reference, and set in_reply_to_message_id from there, I think you
\r
219 can drop the strcmp here. And move the comment to an appropriate place.
\r
221 Thanks for the patch, I think we should do this. But this is an area
\r
222 where I think we need to be careful, so another reviewer wouldn't
\r
223 harm. Some tests for this would be good too, obviously.
\r
230 > _notmuch_message_add_term (message, "replyto",
\r
231 > - _parse_message_id (message, in_reply_to, NULL));
\r
232 > + in_reply_to_message_id);
\r
235 > keys = g_hash_table_get_keys (parents);
\r
238 > _______________________________________________
\r
239 > notmuch mailing list
\r
240 > notmuch@notmuchmail.org
\r
241 > http://notmuchmail.org/mailman/listinfo/notmuch
\r