--- /dev/null
+Return-Path: <jani@nikula.org>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id 80B49431FB6\r
+ for <notmuch@notmuchmail.org>; Sat, 4 May 2013 09:25:11 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+ tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id KN1iJ2ohgAsp for <notmuch@notmuchmail.org>;\r
+ Sat, 4 May 2013 09:25:08 -0700 (PDT)\r
+Received: from mail-lb0-f174.google.com (mail-lb0-f174.google.com\r
+ [209.85.217.174]) (using TLSv1 with cipher RC4-SHA (128/128 bits))\r
+ (No client certificate requested)\r
+ by olra.theworths.org (Postfix) with ESMTPS id AB4F7431FAF\r
+ for <notmuch@notmuchmail.org>; Sat, 4 May 2013 09:25:07 -0700 (PDT)\r
+Received: by mail-lb0-f174.google.com with SMTP id r10so2318674lbi.5\r
+ for <notmuch@notmuchmail.org>; Sat, 04 May 2013 09:25:04 -0700 (PDT)\r
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
+ d=google.com; s=20120113;\r
+ h=x-received:from:to:subject:in-reply-to:references:user-agent:date\r
+ :message-id:mime-version:content-type:x-gm-message-state;\r
+ bh=R672KHlbP2lzZVHtqc2ZQ57gzOm4SUtAvJ5jEasxLeA=;\r
+ b=f4T19bNSCNgx3IfBCHaRlgtolA6/E+R3BFEy013Mi3VptxQLHMQjRKPy1zUGhxNkP+\r
+ biB90VW4WwSDIT052kCX08PtPN0vW5VTNfjHPsqhWw70rjJ+vZLgLsYDLbYOPeBC1zjm\r
+ 4mhoPYVbIhD4ZQF2Da1T4x04dS7Jcs2LkHFJqWxODBzWu0OrL5067P9FyExqMX6226KM\r
+ nxYjoMi9/Fmm9YRHp6RzNGqCDuUP4d9Tn2nO+lE6Uh5KUf9moI8QZnzNUbUPIsgy9Nl7\r
+ yuVFvXFpIfBRtI65j7zX/4vkXpstWr1S+w71Feubbf6V8KuAT2MOP00OzE16XvqqgHZn\r
+ KI1w==\r
+X-Received: by 10.152.8.231 with SMTP id u7mr5758551laa.27.1367684704832;\r
+ Sat, 04 May 2013 09:25:04 -0700 (PDT)\r
+Received: from localhost (dsl-hkibrasgw2-58c376-211.dhcp.inet.fi.\r
+ [88.195.118.211])\r
+ by mx.google.com with ESMTPSA id l20sm5845965lbv.9.2013.05.04.09.25.03\r
+ for <multiple recipients>\r
+ (version=TLSv1.2 cipher=RC4-SHA bits=128/128);\r
+ Sat, 04 May 2013 09:25:03 -0700 (PDT)\r
+From: Jani Nikula <jani@nikula.org>\r
+To: Aaron Ecay <aaronecay@gmail.com>, notmuch@notmuchmail.org\r
+Subject: Re: [PATCH 2/2] lib/database.cc: change how the parent of a message\r
+ is calculated\r
+In-Reply-To: <1362540709-28765-2-git-send-email-aaronecay@gmail.com>\r
+References: <87ppzfzxuk.fsf@zancas.localnet>\r
+ <1362540709-28765-1-git-send-email-aaronecay@gmail.com>\r
+ <1362540709-28765-2-git-send-email-aaronecay@gmail.com>\r
+User-Agent: Notmuch/0.15.2+87~gc69f540 (http://notmuchmail.org) Emacs/24.3.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Sat, 04 May 2013 19:24:59 +0300\r
+Message-ID: <87r4hm1zms.fsf@nikula.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain\r
+X-Gm-Message-State:\r
+ ALoCoQn1HSv9xROPnMaZdwmP/cOSGDUHSV62N7YI2hWxXs95xfFv4cUbnbcUxRmsHiQBsR0oDDHu\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Sat, 04 May 2013 16:25:11 -0000\r
+\r
+\r
+LGTM\r
+\r
+On Wed, 06 Mar 2013, Aaron Ecay <aaronecay@gmail.com> wrote:\r
+> Presently, the code which finds the parent of a message as it is being\r
+> added to the database assumes that the first Message-ID-like substring\r
+> of the In-Reply-To header is the parent Message ID. Some mail clients,\r
+> however, put stuff other than the Message-ID of the parent in the\r
+> In-Reply-To header, such as the email address of the sender of the\r
+> parent. This can fool notmuch.\r
+>\r
+> The updated algorithm prefers the last Message ID in the References\r
+> header. The References header lists messages oldest-first, so the last\r
+> Message ID is the parent (RFC2822, p. 24). The References header is\r
+> also less likely to be in a non-standard\r
+> syntax (http://cr.yp.to/immhf/thread.html,\r
+> http://www.jwz.org/doc/threading.html). In case the References header\r
+> is not to be found, fall back to the old behavior.\r
+>\r
+> V2 of this patch, incorporating feedback from Jani and (indirectly)\r
+> Austin.\r
+> ---\r
+> lib/database.cc | 48 +++++++++++++++++++++++++++++++++---------------\r
+> test/thread-replies | 4 ----\r
+> 2 files changed, 33 insertions(+), 19 deletions(-)\r
+>\r
+> diff --git a/lib/database.cc b/lib/database.cc\r
+> index 91d4329..52ed618 100644\r
+> --- a/lib/database.cc\r
+> +++ b/lib/database.cc\r
+> @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next)\r
+> * 'message_id' in the result (to avoid mass confusion when a single\r
+> * message references itself cyclically---and yes, mail messages are\r
+> * not infrequent in the wild that do this---don't ask me why).\r
+> -*/\r
+> -static void\r
+> + *\r
+> + * Return the last reference parsed, if it is not equal to message_id.\r
+> + */\r
+> +static char *\r
+> parse_references (void *ctx,\r
+> const char *message_id,\r
+> GHashTable *hash,\r
+> @@ -511,7 +513,7 @@ parse_references (void *ctx,\r
+> char *ref;\r
+> \r
+> if (refs == NULL || *refs == '\0')\r
+> - return;\r
+> + return NULL;\r
+> \r
+> while (*refs) {\r
+> ref = _parse_message_id (ctx, refs, &refs);\r
+> @@ -519,6 +521,17 @@ parse_references (void *ctx,\r
+> if (ref && strcmp (ref, message_id))\r
+> g_hash_table_insert (hash, ref, NULL);\r
+> }\r
+> +\r
+> + /* The return value of this function is used to add a parent\r
+> + * reference to the database. We should avoid making a message\r
+> + * its own parent, thus the following check.\r
+> + */\r
+> +\r
+> + if (ref && strcmp(ref, message_id)) {\r
+> + return ref;\r
+> + } else {\r
+> + return NULL;\r
+> + }\r
+> }\r
+> \r
+> notmuch_status_t\r
+> @@ -1510,28 +1523,33 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,\r
+> {\r
+> GHashTable *parents = NULL;\r
+> const char *refs, *in_reply_to, *in_reply_to_message_id;\r
+> + const char *last_ref_message_id, *this_message_id;\r
+> GList *l, *keys = NULL;\r
+> notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;\r
+> \r
+> parents = g_hash_table_new_full (g_str_hash, g_str_equal,\r
+> _my_talloc_free_for_g_hash, NULL);\r
+> + this_message_id = notmuch_message_get_message_id (message);\r
+> \r
+> refs = notmuch_message_file_get_header (message_file, "references");\r
+> - parse_references (message, notmuch_message_get_message_id (message),\r
+> - parents, refs);\r
+> + last_ref_message_id = parse_references (message,\r
+> + this_message_id,\r
+> + parents, refs);\r
+> \r
+> in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");\r
+> - parse_references (message, notmuch_message_get_message_id (message),\r
+> - parents, in_reply_to);\r
+> -\r
+> - /* Carefully avoid adding any self-referential in-reply-to term. */\r
+> - in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL);\r
+> - if (in_reply_to_message_id &&\r
+> - strcmp (in_reply_to_message_id,\r
+> - notmuch_message_get_message_id (message)))\r
+> - {\r
+> + in_reply_to_message_id = parse_references (message,\r
+> + this_message_id,\r
+> + parents, in_reply_to);\r
+> +\r
+> + /* For the parent of this message, use the last message ID of the\r
+> + * References header, if available. If not, fall back to the\r
+> + * first message ID in the In-Reply-To header. */\r
+> + if (last_ref_message_id) {\r
+> + _notmuch_message_add_term (message, "replyto",\r
+> + last_ref_message_id);\r
+> + } else if (in_reply_to_message_id) {\r
+> _notmuch_message_add_term (message, "replyto",\r
+> - _parse_message_id (message, in_reply_to, NULL));\r
+> + in_reply_to_message_id);\r
+> }\r
+> \r
+> keys = g_hash_table_get_keys (parents);\r
+> diff --git a/test/thread-replies b/test/thread-replies\r
+> index a902691..28c2b1f 100755\r
+> --- a/test/thread-replies\r
+> +++ b/test/thread-replies\r
+> @@ -11,7 +11,6 @@ constructed properly, even in the presence of non-RFC-compliant headers'\r
+> . ./test-lib.sh\r
+> \r
+> test_begin_subtest "Use References when In-Reply-To is broken"\r
+> -test_subtest_known_broken\r
+> add_message '[id]="foo@one.com"' \\r
+> '[subject]=one'\r
+> add_message '[in-reply-to]="mumble"' \\r
+> @@ -46,7 +45,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize`\r
+> test_expect_equal_json "$output" "$expected"\r
+> \r
+> test_begin_subtest "Prefer References to In-Reply-To"\r
+> -test_subtest_known_broken\r
+> add_message '[id]="foo@two.com"' \\r
+> '[subject]=two'\r
+> add_message '[in-reply-to]="<bar@baz.com>"' \\r
+> @@ -77,7 +75,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize`\r
+> test_expect_equal_json "$output" "$expected"\r
+> \r
+> test_begin_subtest "Use In-Reply-To when no References"\r
+> -test_subtest_known_broken\r
+> add_message '[id]="foo@three.com"' \\r
+> '[subject]="three"'\r
+> add_message '[in-reply-to]="<foo@three.com>"' \\r
+> @@ -104,7 +101,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize`\r
+> test_expect_equal_json "$output" "$expected"\r
+> \r
+> test_begin_subtest "Use last Reference"\r
+> -test_subtest_known_broken\r
+> add_message '[id]="foo@four.com"' \\r
+> '[subject]="four"'\r
+> add_message '[id]="bar@four.com"' \\r
+> -- \r
+> 1.8.1.5\r
+>\r
+> _______________________________________________\r
+> notmuch mailing list\r
+> notmuch@notmuchmail.org\r
+> http://notmuchmail.org/mailman/listinfo/notmuch\r
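
The parent-selection rule described in the quoted commit message (prefer the last Message-ID in References, otherwise fall back to the first Message-ID in In-Reply-To, and never make a message its own parent) can be sketched in isolation. This is a minimal illustration, not the code in lib/database.cc: next_message_id and pick_parent are hypothetical names, the parser only looks for "<...>" tokens instead of following notmuch's _parse_message_id, and plain malloc/free stands in for talloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not notmuch's _parse_message_id): copy the next
 * "<...>" token at or after *cursor into a malloc'ed string, without the
 * angle brackets, and advance *cursor past it. Returns NULL when there is
 * no further token. */
static char *
next_message_id (const char **cursor)
{
    const char *open, *close;
    char *id;
    size_t len;

    if (*cursor == NULL)
	return NULL;
    open = strchr (*cursor, '<');
    if (open == NULL)
	return NULL;
    close = strchr (open, '>');
    if (close == NULL)
	return NULL;

    len = (size_t) (close - open - 1);
    id = malloc (len + 1);
    if (id == NULL)
	return NULL;
    memcpy (id, open + 1, len);
    id[len] = '\0';
    *cursor = close + 1;
    return id;
}

/* Hypothetical: pick the parent as the commit message describes. The
 * caller owns the returned string; NULL means no usable parent. */
static char *
pick_parent (const char *self_id,
	     const char *references,
	     const char *in_reply_to)
{
    const char *cursor = references;
    char *id, *last = NULL;

    assert (self_id != NULL);

    /* Walk References oldest-first and remember only the last entry. */
    while ((id = next_message_id (&cursor)) != NULL) {
	free (last);
	last = id;
    }
    if (last != NULL && strcmp (last, self_id) != 0)
	return last;
    free (last);

    /* Fallback: the first Message-ID-like token in In-Reply-To. */
    cursor = in_reply_to;
    id = next_message_id (&cursor);
    if (id != NULL && strcmp (id, self_id) != 0)
	return id;
    free (id);
    return NULL;
}
```

Under these assumptions, pick_parent ("c@x", "<a@x> <b@x>", "<a@x>") returns "b@x", while a message with no References and an In-Reply-To of "Joe <a@x>" falls back to "a@x". The fallback still takes whatever bracketed token comes first, which is the weakness the commit message attributes to the old behavior.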