From 2e9266f88add84b1614a204e6a2c77d48bc3706d Mon Sep 17 00:00:00 2001 From: Mark Walters Date: Sun, 20 Apr 2014 08:14:56 +0100 Subject: [PATCH] [RFC PATCH] Re: excessive thread fusing --- d8/f4406e1c1d2299f414ea0f9c73e7bd114a4a74 | 165 ++++++++++++++++++++++ 1 file changed, 165 insertions(+) create mode 100644 d8/f4406e1c1d2299f414ea0f9c73e7bd114a4a74 diff --git a/d8/f4406e1c1d2299f414ea0f9c73e7bd114a4a74 b/d8/f4406e1c1d2299f414ea0f9c73e7bd114a4a74 new file mode 100644 index 000000000..28b074203 --- /dev/null +++ b/d8/f4406e1c1d2299f414ea0f9c73e7bd114a4a74 @@ -0,0 +1,165 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id B0809431FBD + for ; Sun, 20 Apr 2014 00:15:48 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: 0.502 +X-Spam-Level: +X-Spam-Status: No, score=0.502 tagged_above=-999 required=5 + tests=[DKIM_ADSP_CUSTOM_MED=0.001, FREEMAIL_FROM=0.001, + NML_ADSP_CUSTOM_MED=1.2, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id xht2Z1q9FFSB for ; + Sun, 20 Apr 2014 00:15:24 -0700 (PDT) +Received: from mail2.qmul.ac.uk (mail2.qmul.ac.uk [138.37.6.6]) + (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id C1DB4431FB6 + for ; Sun, 20 Apr 2014 00:15:23 -0700 (PDT) +Received: from smtp.qmul.ac.uk ([138.37.6.40]) + by mail2.qmul.ac.uk with esmtp (Exim 4.71) + (envelope-from ) + id 1Wblxu-0008NW-Nu; Sun, 20 Apr 2014 08:15:14 +0100 +Received: from 94.196.250.77.threembb.co.uk ([94.196.250.77] helo=localhost) + by smtp.qmul.ac.uk with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.71) + (envelope-from ) + id 1Wblxt-0006wl-ST; Sun, 20 Apr 2014 08:15:02 +0100 +From: Mark 
Walters +To: David Bremner , notmuch +Subject: [RFC PATCH] Re: excessive thread fusing +In-Reply-To: <87ioq5mrbz.fsf@maritornes.cs.unb.ca> +References: <87ioq5mrbz.fsf@maritornes.cs.unb.ca> +User-Agent: Notmuch/0.15.2+615~g78e3a93 (http://notmuchmail.org) Emacs/23.4.1 + (x86_64-pc-linux-gnu) +Date: Sun, 20 Apr 2014 08:14:56 +0100 +Message-ID: <87fvl8mpzj.fsf@qmul.ac.uk> +MIME-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +X-Sender-Host-Address: 94.196.250.77 +X-QM-Geographic: According to ripencc, + this message was delivered by a machine in Britain (UK) (GB). +X-QM-SPAM-Info: Sender has good ham record. :) +X-QM-Body-MD5: 71a1118bc27b9b5f9855b6c1b2fb6dba (of first 20000 bytes) +X-SpamAssassin-Score: 0.1 +X-SpamAssassin-SpamBar: / +X-SpamAssassin-Report: The QM spam filters have analysed this message to + determine if it is + spam. We require at least 5.0 points to mark a message as spam. + This message scored 0.1 points. Summary of the scoring: + * 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail + provider * (markwalters1009[at]gmail.com) + * 0.1 AWL AWL: From: address is in the auto white-list +X-QM-Scan-Virus: ClamAV says the message is clean +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sun, 20 Apr 2014 07:15:48 -0000 + + +On Sat, 19 Apr 2014, David Bremner wrote: +> Gregor Zattler mentioned some problems with threading at +> +> id:20120126004024.GA13704@shi.workgroup +> +> After some off list discussions, I believe I have a smaller test case. +> +> The attached maildir contains 24 messages from the org mode list. +> +> According to notmuch, these form one thread, but I can't figure out +> exactly why. It seems like the chronologically first two messages should +> be a seperate thread. 
There are several of the infamous malformed ME-E +> In-reply-to headers, but each of these messages also has a References +> header; this seems to indicate a case missed by commit cf8aaafbad68. + +Hi + +I have done some debugging of this. There is a patch below which fixes +this test case but who knows what it breaks! Please DO NOT apply unless +someone who knows this code says it's OK. + +First, the bug is quite sensitive. The attached 24 messages are numbered +and I will use the last two digits to refer to them (i.e. the 2 digits are +the ?? in 1397885606.0002??.mbox:2,). The numbers range from 17-52; 17 +and 18 should be one thread and the rest a different thread. + +1) If you add all messages you get one thread. +2) If you add all apart from 52 you get 2 threads. However, then adding +52 still gives two threads. +3) If you add 18 and then 52 you get 1 thread. +4) If you add 17 and 18 then 52 you get 2 threads. + +I think notmuch will use inode sort and since the tar file contains +these three files in the order 18 52 17 we get cases 1 and 2 above. + +I put some debug stuff in _notmuch_database_link_message_to_parents and +I think that the problem comes from the call to parse_references on line +1767 which adds the malformed in-reply-to header to the hash table, so +this malformed line gets added as a potential parent. + +As a clear example that I don't understand this code I don't know why +this no longer causes a problem if message 17 gets added too. 
+ +Best wishes + +Mark + +--- + lib/database.cc | 21 ++++++++++++--------- + 1 file changed, 12 insertions(+), 9 deletions(-) + +diff --git a/lib/database.cc b/lib/database.cc +index 1efb14d..373a255 100644 +--- a/lib/database.cc ++++ b/lib/database.cc +@@ -1763,20 +1763,23 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, + this_message_id, + parents, refs); + +- in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); +- in_reply_to_message_id = parse_references (message, +- this_message_id, +- parents, in_reply_to); +- + /* For the parent of this message, use the last message ID of the + * References header, if available. If not, fall back to the +- * first message ID in the In-Reply-To header. */ ++ * first message ID in the In-Reply-To header. We only parse the ++ * In-Reply-To header if we need to as otherwise we might ++ * contaminate the hash table if it is malformed. */ + if (last_ref_message_id) { + _notmuch_message_add_term (message, "replyto", + last_ref_message_id); +- } else if (in_reply_to_message_id) { +- _notmuch_message_add_term (message, "replyto", +- in_reply_to_message_id); ++ } else { ++ in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); ++ in_reply_to_message_id = parse_references (message, ++ this_message_id, ++ parents, in_reply_to); ++ if (in_reply_to_message_id) { ++ _notmuch_message_add_term (message, "replyto", ++ in_reply_to_message_id); ++ } + } + + keys = g_hash_table_get_keys (parents); +-- +1.7.10.4 + + + + -- 2.26.2