From: Carl Worth
To: James Westby, notmuch@notmuchmail.org
Subject: Re: [notmuch] Missing messages breaking threads
Date: Fri, 18 Dec 2009 11:41:18 -0800
Message-ID: <874onoysrl.fsf@yoom.home.cworth.org>
In-Reply-To: <87oclwrtqa.fsf@jameswestby.net>

On Fri, 18 Dec 2009 19:02:21 +0000, James Westby wrote:
> I like the architecture of notmuch, and have just switched
> to using it as my primary client, so thanks.

You're quite welcome, James. Welcome to notmuch!

> Therefore I'd like to fix this. The obvious way is to
> introduce documents in to the db for each id we see, and
> threading should then naturally work better.

That sounds like a fine idea.

> The only issue I see with doing this is with mail delays.
> Once we do this we will sometimes receive a message that
> already has a dummy document. What happens currently with
> message-id collisions?

The current message-ID collision logic is pretty brain-dead. It just
says "Oh, I've seen a file with this message ID before, so I'll skip
this additional file".

But I'm just putting the finishing touches on a patch that instead does:

    Oh, and here's an additional filename for that message ID. Add
    that too, please.

Beyond that, all we would need to do is to also index the new content.
I don't want to do useless re-indexing when files just get renamed. So
maybe all we need to do is to save the filesize of the last-indexed file
for a document, and then when we encounter a file with the same message
ID and a larger file size, index it as well?

That would even take care of providing the opportunity to index
additional mailing-list-added content for messages also sent directly
via CC.

The file-size heuristic wouldn't be perfect for these other cases. I
guess we could save a list of sha-1 sums for indexed files or so,
(assuming that's cheaper than just re-indexing. Before the Xapian
Defect 250 fix I'm sure it is, but after I'm not sure. Maybe we should
just always re-index, but I think I have seen the TermGenerator appear
in profiles of indexing runs.)

> * When we get a message-id conflict check for dummy:True
> and replace the document if it is there.
>
> How does this sound?

That sounds fine. It's the same as what I propose above with
"filesize:0" instead of "dummy:true".

> There could be an issue with synthesising too many threads
> and then ending up having to try and put a message in two
> threads? I see there is code for merging threads, would that
> handle this?

It should, yes. The current logic is that a message can only appear in a
single thread. So if a message has children or parents with distinct
thread IDs then those threads are merged.
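
The scheme discussed above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not notmuch's actual code: the names `Document`, `Index`, `add_file`, `indexed_size` and `index_runs` are hypothetical, a plain dictionary stands in for the Xapian database, and "indexing" is reduced to bumping a counter. A placeholder is simply a document whose recorded size is 0, a file is re-indexed only when it is larger than the last one indexed for that message ID, and threads that turn out to be connected are merged.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Document:
    message_id: str
    thread_id: int
    filenames: list = field(default_factory=list)
    indexed_size: int = 0  # 0 marks a placeholder for a message not yet seen
    index_runs: int = 0    # how many times content was (re)indexed


class Index:
    """Hypothetical in-memory stand-in for the message database."""

    def __init__(self):
        self.docs = {}
        self._next_thread = count(1)

    def _get_or_create(self, message_id):
        # Unknown IDs get a placeholder document in a fresh thread.
        doc = self.docs.get(message_id)
        if doc is None:
            doc = Document(message_id, next(self._next_thread))
            self.docs[message_id] = doc
        return doc

    def _merge_threads(self, keep, drop):
        # A message may appear in only one thread, so relabel the other one.
        if keep == drop:
            return
        for doc in self.docs.values():
            if doc.thread_id == drop:
                doc.thread_id = keep

    def add_file(self, message_id, filename, size, references=()):
        doc = self._get_or_create(message_id)
        # Always remember an additional filename for a known message ID.
        if filename not in doc.filenames:
            doc.filenames.append(filename)
        # File-size heuristic: index only if this copy is larger than the
        # last one indexed (a placeholder has size 0, so it always loses).
        if size > doc.indexed_size:
            doc.indexed_size = size
            doc.index_runs += 1
        # Link to every referenced message, creating placeholders as needed.
        for ref in references:
            parent = self._get_or_create(ref)
            self._merge_threads(doc.thread_id, parent.thread_id)
        return doc
```

With this sketch, adding a reply before its parent creates a size-0 placeholder for the parent in the same thread; when the parent's file arrives later it fills in that placeholder instead of being skipped, and a renamed copy of the same size adds a filename without triggering a re-index.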

I can imagine some strange cross-posting scenario where one could argue
that the merging shouldn't happen, but I'm not sure we want to try to
respect that.

-Carl