Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 80B49431FB6 for ; Sat, 4 May 2013 09:25:11 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id KN1iJ2ohgAsp for ; Sat, 4 May 2013 09:25:08 -0700 (PDT) Received: from mail-lb0-f174.google.com (mail-lb0-f174.google.com [209.85.217.174]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id AB4F7431FAF for ; Sat, 4 May 2013 09:25:07 -0700 (PDT) Received: by mail-lb0-f174.google.com with SMTP id r10so2318674lbi.5 for ; Sat, 04 May 2013 09:25:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:from:to:subject:in-reply-to:references:user-agent:date :message-id:mime-version:content-type:x-gm-message-state; bh=R672KHlbP2lzZVHtqc2ZQ57gzOm4SUtAvJ5jEasxLeA=; b=f4T19bNSCNgx3IfBCHaRlgtolA6/E+R3BFEy013Mi3VptxQLHMQjRKPy1zUGhxNkP+ biB90VW4WwSDIT052kCX08PtPN0vW5VTNfjHPsqhWw70rjJ+vZLgLsYDLbYOPeBC1zjm 4mhoPYVbIhD4ZQF2Da1T4x04dS7Jcs2LkHFJqWxODBzWu0OrL5067P9FyExqMX6226KM nxYjoMi9/Fmm9YRHp6RzNGqCDuUP4d9Tn2nO+lE6Uh5KUf9moI8QZnzNUbUPIsgy9Nl7 yuVFvXFpIfBRtI65j7zX/4vkXpstWr1S+w71Feubbf6V8KuAT2MOP00OzE16XvqqgHZn KI1w== X-Received: by 10.152.8.231 with SMTP id u7mr5758551laa.27.1367684704832; Sat, 04 May 2013 09:25:04 -0700 (PDT) Received: from localhost (dsl-hkibrasgw2-58c376-211.dhcp.inet.fi. [88.195.118.211]) by mx.google.com with ESMTPSA id l20sm5845965lbv.9.2013.05.04.09.25.03 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 04 May 2013 09:25:03 -0700 (PDT) From: Jani Nikula To: Aaron Ecay , notmuch@notmuchmail.org Subject: Re: [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated In-Reply-To: <1362540709-28765-2-git-send-email-aaronecay@gmail.com> References: <87ppzfzxuk.fsf@zancas.localnet> <1362540709-28765-1-git-send-email-aaronecay@gmail.com> <1362540709-28765-2-git-send-email-aaronecay@gmail.com> User-Agent: Notmuch/0.15.2+87~gc69f540 (http://notmuchmail.org) Emacs/24.3.1 (x86_64-pc-linux-gnu) Date: Sat, 04 May 2013 19:24:59 +0300 Message-ID: <87r4hm1zms.fsf@nikula.org> MIME-Version: 1.0 Content-Type: text/plain X-Gm-Message-State: ALoCoQn1HSv9xROPnMaZdwmP/cOSGDUHSV62N7YI2hWxXs95xfFv4cUbnbcUxRmsHiQBsR0oDDHu X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 May 2013 16:25:11 -0000 LGTM On Wed, 06 Mar 2013, Aaron Ecay wrote: > Presently, the code which finds the parent of a message as it is being > added to the database assumes that the first Message-ID-like substring > of the In-Reply-To header is the parent Message ID. Some mail clients, > however, put stuff other than the Message-ID of the parent in the > In-Reply-To header, such as the email address of the sender of the > parent. This can fool notmuch. > > The updated algorithm prefers the last Message ID in the References > header. The References header lists messages oldest-first, so the last > Message ID is the parent (RFC2822, p. 24). The References header is > also less likely to be in a non-standard > syntax (http://cr.yp.to/immhf/thread.html, > http://www.jwz.org/doc/threading.html). In case the References header > is not to be found, fall back to the old behavior. > > V2 of this patch, incorporating feedback from Jani and (indirectly) > Austin. > --- > lib/database.cc | 48 +++++++++++++++++++++++++++++++++--------------- > test/thread-replies | 4 ---- > 2 files changed, 33 insertions(+), 19 deletions(-) > > diff --git a/lib/database.cc b/lib/database.cc > index 91d4329..52ed618 100644 > --- a/lib/database.cc > +++ b/lib/database.cc > @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) > * 'message_id' in the result (to avoid mass confusion when a single > * message references itself cyclically---and yes, mail messages are > * not infrequent in the wild that do this---don't ask me why). > -*/ > -static void > + * > + * Return the last reference parsed, if it is not equal to message_id. > + */ > +static char * > parse_references (void *ctx, > const char *message_id, > GHashTable *hash, > @@ -511,7 +513,7 @@ parse_references (void *ctx, > char *ref; > > if (refs == NULL || *refs == '\0') > - return; > + return NULL; > > while (*refs) { > ref = _parse_message_id (ctx, refs, &refs); > @@ -519,6 +521,17 @@ parse_references (void *ctx, > if (ref && strcmp (ref, message_id)) > g_hash_table_insert (hash, ref, NULL); > } > + > + /* The return value of this function is used to add a parent > + * reference to the database. We should avoid making a message > + * its own parent, thus the following check. > + */ > + > + if (ref && strcmp(ref, message_id)) { > + return ref; > + } else { > + return NULL; > + } > } > > notmuch_status_t > @@ -1510,28 +1523,33 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, > { > GHashTable *parents = NULL; > const char *refs, *in_reply_to, *in_reply_to_message_id; > + const char *last_ref_message_id, *this_message_id; > GList *l, *keys = NULL; > notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; > > parents = g_hash_table_new_full (g_str_hash, g_str_equal, > _my_talloc_free_for_g_hash, NULL); > + this_message_id = notmuch_message_get_message_id (message); > > refs = notmuch_message_file_get_header (message_file, "references"); > - parse_references (message, notmuch_message_get_message_id (message), > - parents, refs); > + last_ref_message_id = parse_references (message, > + this_message_id, > + parents, refs); > > in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); > - parse_references (message, notmuch_message_get_message_id (message), > - parents, in_reply_to); > - > - /* Carefully avoid adding any self-referential in-reply-to term. */ > - in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); > - if (in_reply_to_message_id && > - strcmp (in_reply_to_message_id, > - notmuch_message_get_message_id (message))) > - { > + in_reply_to_message_id = parse_references (message, > + this_message_id, > + parents, in_reply_to); > + > + /* For the parent of this message, use the last message ID of the > + * References header, if available. If not, fall back to the > + * first message ID in the In-Reply-To header. */ > + if (last_ref_message_id) { > + _notmuch_message_add_term (message, "replyto", > + last_ref_message_id); > + } else if (in_reply_to_message_id) { > _notmuch_message_add_term (message, "replyto", > - _parse_message_id (message, in_reply_to, NULL)); > + in_reply_to_message_id); > } > > keys = g_hash_table_get_keys (parents); > diff --git a/test/thread-replies b/test/thread-replies > index a902691..28c2b1f 100755 > --- a/test/thread-replies > +++ b/test/thread-replies > @@ -11,7 +11,6 @@ constructed properly, even in the presence of non-RFC-compliant headers' > . ./test-lib.sh > > test_begin_subtest "Use References when In-Reply-To is broken" > -test_subtest_known_broken > add_message '[id]="foo@one.com"' \ > '[subject]=one' > add_message '[in-reply-to]="mumble"' \ > @@ -46,7 +45,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` > test_expect_equal_json "$output" "$expected" > > test_begin_subtest "Prefer References to In-Reply-To" > -test_subtest_known_broken > add_message '[id]="foo@two.com"' \ > '[subject]=two' > add_message '[in-reply-to]=""' \ > @@ -77,7 +75,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` > test_expect_equal_json "$output" "$expected" > > test_begin_subtest "Use In-Reply-To when no References" > -test_subtest_known_broken > add_message '[id]="foo@three.com"' \ > '[subject]="three"' > add_message '[in-reply-to]=""' \ > @@ -104,7 +101,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` > test_expect_equal_json "$output" "$expected" > > test_begin_subtest "Use last Reference" > -test_subtest_known_broken > add_message '[id]="foo@four.com"' \ > '[subject]="four"' > add_message '[id]="bar@four.com"' \ > -- > 1.8.1.5 > > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch