From: Jani Nikula Date: Sat, 4 May 2013 16:24:59 +0000 (+0300) Subject: Re: [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=bc791f161a4abcd66147244a9b7e3e104bd46a9d;p=notmuch-archives.git Re: [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated --- diff --git a/0a/b4774d1988ea23baabd033a9813822c0e097fe b/0a/b4774d1988ea23baabd033a9813822c0e097fe new file mode 100644 index 000000000..5ac2d181c --- /dev/null +++ b/0a/b4774d1988ea23baabd033a9813822c0e097fe @@ -0,0 +1,231 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 80B49431FB6 + for ; Sat, 4 May 2013 09:25:11 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -0.7 +X-Spam-Level: +X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id KN1iJ2ohgAsp for ; + Sat, 4 May 2013 09:25:08 -0700 (PDT) +Received: from mail-lb0-f174.google.com (mail-lb0-f174.google.com + [209.85.217.174]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id AB4F7431FAF + for ; Sat, 4 May 2013 09:25:07 -0700 (PDT) +Received: by mail-lb0-f174.google.com with SMTP id r10so2318674lbi.5 + for ; Sat, 04 May 2013 09:25:04 -0700 (PDT) +X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=google.com; s=20120113; + h=x-received:from:to:subject:in-reply-to:references:user-agent:date + :message-id:mime-version:content-type:x-gm-message-state; + bh=R672KHlbP2lzZVHtqc2ZQ57gzOm4SUtAvJ5jEasxLeA=; + b=f4T19bNSCNgx3IfBCHaRlgtolA6/E+R3BFEy013Mi3VptxQLHMQjRKPy1zUGhxNkP+ + biB90VW4WwSDIT052kCX08PtPN0vW5VTNfjHPsqhWw70rjJ+vZLgLsYDLbYOPeBC1zjm + 4mhoPYVbIhD4ZQF2Da1T4x04dS7Jcs2LkHFJqWxODBzWu0OrL5067P9FyExqMX6226KM + nxYjoMi9/Fmm9YRHp6RzNGqCDuUP4d9Tn2nO+lE6Uh5KUf9moI8QZnzNUbUPIsgy9Nl7 + yuVFvXFpIfBRtI65j7zX/4vkXpstWr1S+w71Feubbf6V8KuAT2MOP00OzE16XvqqgHZn + KI1w== +X-Received: by 10.152.8.231 with SMTP id u7mr5758551laa.27.1367684704832; + Sat, 04 May 2013 09:25:04 -0700 (PDT) +Received: from localhost (dsl-hkibrasgw2-58c376-211.dhcp.inet.fi. + [88.195.118.211]) + by mx.google.com with ESMTPSA id l20sm5845965lbv.9.2013.05.04.09.25.03 + for + (version=TLSv1.2 cipher=RC4-SHA bits=128/128); + Sat, 04 May 2013 09:25:03 -0700 (PDT) +From: Jani Nikula +To: Aaron Ecay , notmuch@notmuchmail.org +Subject: Re: [PATCH 2/2] lib/database.cc: change how the parent of a message + is calculated +In-Reply-To: <1362540709-28765-2-git-send-email-aaronecay@gmail.com> +References: <87ppzfzxuk.fsf@zancas.localnet> + <1362540709-28765-1-git-send-email-aaronecay@gmail.com> + <1362540709-28765-2-git-send-email-aaronecay@gmail.com> +User-Agent: Notmuch/0.15.2+87~gc69f540 (http://notmuchmail.org) Emacs/24.3.1 + (x86_64-pc-linux-gnu) +Date: Sat, 04 May 2013 19:24:59 +0300 +Message-ID: <87r4hm1zms.fsf@nikula.org> +MIME-Version: 1.0 +Content-Type: text/plain +X-Gm-Message-State: + ALoCoQn1HSv9xROPnMaZdwmP/cOSGDUHSV62N7YI2hWxXs95xfFv4cUbnbcUxRmsHiQBsR0oDDHu +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sat, 04 May 2013 16:25:11 -0000 + + +LGTM + +On Wed, 06 Mar 2013, Aaron Ecay wrote: +> Presently, the code which finds the parent of a message as it is being +> added to the database assumes that the first Message-ID-like substring +> of the In-Reply-To header is the parent Message ID. Some mail clients, +> however, put stuff other than the Message-ID of the parent in the +> In-Reply-To header, such as the email address of the sender of the +> parent. This can fool notmuch. +> +> The updated algorithm prefers the last Message ID in the References +> header. The References header lists messages oldest-first, so the last +> Message ID is the parent (RFC2822, p. 24). The References header is +> also less likely to be in a non-standard +> syntax (http://cr.yp.to/immhf/thread.html, +> http://www.jwz.org/doc/threading.html). In case the References header +> is not to be found, fall back to the old behavior. +> +> V2 of this patch, incorporating feedback from Jani and (indirectly) +> Austin. +> --- +> lib/database.cc | 48 +++++++++++++++++++++++++++++++++--------------- +> test/thread-replies | 4 ---- +> 2 files changed, 33 insertions(+), 19 deletions(-) +> +> diff --git a/lib/database.cc b/lib/database.cc +> index 91d4329..52ed618 100644 +> --- a/lib/database.cc +> +++ b/lib/database.cc +> @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) +> * 'message_id' in the result (to avoid mass confusion when a single +> * message references itself cyclically---and yes, mail messages are +> * not infrequent in the wild that do this---don't ask me why). +> -*/ +> -static void +> + * +> + * Return the last reference parsed, if it is not equal to message_id. +> + */ +> +static char * +> parse_references (void *ctx, +> const char *message_id, +> GHashTable *hash, +> @@ -511,7 +513,7 @@ parse_references (void *ctx, +> char *ref; +> +> if (refs == NULL || *refs == '\0') +> - return; +> + return NULL; +> +> while (*refs) { +> ref = _parse_message_id (ctx, refs, &refs); +> @@ -519,6 +521,17 @@ parse_references (void *ctx, +> if (ref && strcmp (ref, message_id)) +> g_hash_table_insert (hash, ref, NULL); +> } +> + +> + /* The return value of this function is used to add a parent +> + * reference to the database. We should avoid making a message +> + * its own parent, thus the following check. +> + */ +> + +> + if (ref && strcmp(ref, message_id)) { +> + return ref; +> + } else { +> + return NULL; +> + } +> } +> +> notmuch_status_t +> @@ -1510,28 +1523,33 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, +> { +> GHashTable *parents = NULL; +> const char *refs, *in_reply_to, *in_reply_to_message_id; +> + const char *last_ref_message_id, *this_message_id; +> GList *l, *keys = NULL; +> notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; +> +> parents = g_hash_table_new_full (g_str_hash, g_str_equal, +> _my_talloc_free_for_g_hash, NULL); +> + this_message_id = notmuch_message_get_message_id (message); +> +> refs = notmuch_message_file_get_header (message_file, "references"); +> - parse_references (message, notmuch_message_get_message_id (message), +> - parents, refs); +> + last_ref_message_id = parse_references (message, +> + this_message_id, +> + parents, refs); +> +> in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); +> - parse_references (message, notmuch_message_get_message_id (message), +> - parents, in_reply_to); +> - +> - /* Carefully avoid adding any self-referential in-reply-to term. */ +> - in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); +> - if (in_reply_to_message_id && +> - strcmp (in_reply_to_message_id, +> - notmuch_message_get_message_id (message))) +> - { +> + in_reply_to_message_id = parse_references (message, +> + this_message_id, +> + parents, in_reply_to); +> + +> + /* For the parent of this message, use the last message ID of the +> + * References header, if available. If not, fall back to the +> + * first message ID in the In-Reply-To header. */ +> + if (last_ref_message_id) { +> + _notmuch_message_add_term (message, "replyto", +> + last_ref_message_id); +> + } else if (in_reply_to_message_id) { +> _notmuch_message_add_term (message, "replyto", +> - _parse_message_id (message, in_reply_to, NULL)); +> + in_reply_to_message_id); +> } +> +> keys = g_hash_table_get_keys (parents); +> diff --git a/test/thread-replies b/test/thread-replies +> index a902691..28c2b1f 100755 +> --- a/test/thread-replies +> +++ b/test/thread-replies +> @@ -11,7 +11,6 @@ constructed properly, even in the presence of non-RFC-compliant headers' +> . ./test-lib.sh +> +> test_begin_subtest "Use References when In-Reply-To is broken" +> -test_subtest_known_broken +> add_message '[id]="foo@one.com"' \ +> '[subject]=one' +> add_message '[in-reply-to]="mumble"' \ +> @@ -46,7 +45,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` +> test_expect_equal_json "$output" "$expected" +> +> test_begin_subtest "Prefer References to In-Reply-To" +> -test_subtest_known_broken +> add_message '[id]="foo@two.com"' \ +> '[subject]=two' +> add_message '[in-reply-to]=""' \ +> @@ -77,7 +75,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` +> test_expect_equal_json "$output" "$expected" +> +> test_begin_subtest "Use In-Reply-To when no References" +> -test_subtest_known_broken +> add_message '[id]="foo@three.com"' \ +> '[subject]="three"' +> add_message '[in-reply-to]=""' \ +> @@ -104,7 +101,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` +> test_expect_equal_json "$output" "$expected" +> +> test_begin_subtest "Use last Reference" +> -test_subtest_known_broken +> add_message '[id]="foo@four.com"' \ +> '[subject]="four"' +> add_message '[id]="bar@four.com"' \ +> -- +> 1.8.1.5 +> +> _______________________________________________ +> notmuch mailing list +> notmuch@notmuchmail.org +> http://notmuchmail.org/mailman/listinfo/notmuch