From: Carl Worth
To: David Bremner, notmuch@notmuchmail.org
Subject: Re: [notmuch] Duplicate In-reply-to line 326 lib/message.cc
Date: Sat, 28 Nov 2009 10:15:57 -0800
Message-ID: <87tyweplb6.fsf@yoom.home.cworth.org>
In-Reply-To: <87k4xbkmwy.wl%bremner@pivot.cs.unb.ca>

On Sat, 28 Nov 2009 05:40:13 -0400, David Bremner wrote:
> Now it seems that any search that is non-empty (i.e. matches
> something) crashes with a duplicate In-Reply-To ID. This is in git
> revision 92c4dcc (although it was the same yesterday). The oddest
> thing is that the second message-id is a common English word.
...
> Internal error: Message 877htzhn9e.wl%jemarch@gnu.org has duplicate In-Reply-To IDs: 1e5bcefd0911081424p12eb6fa9te57ff4cfeb83fcdd@mail.gmail.com and data
> (lib/message.cc:326).

Thanks David, I replicated this without any difficulty.
And the fix was just to correct a stupid mistake on my part. The only reason I hadn't noticed this myself earlier is that I've been doing debug builds with:

    make CFLAGS="-g -DDEBUG"

instead of:

    make CFLAGS="-g -DDEBUG" CXXFLAGS="-g -DDEBUG"

Since lib/message.cc is C++, setting only CFLAGS never defined DEBUG there. If we can, I'd like to make the former work too, so it doesn't hide things like this in the future.

> At the moment I don't have any real good ideas for how to debug this
> (or any real familiarity with notmuch internals). I put a test corpus
> of messages (all from public mailing lists) at

Before I realized how easy the bug was to replicate and fix, I was going to give a couple of debugging ideas here. I'll briefly mention them anyway.

The core of what we store in the database for each message is a single list of "terms" (each a string of text). We use different terms for different purposes by prefixing some of them with particular sub-strings. See the large comment at the top of lib/database.cc for details.

So if there *were* an actual case of a duplicate In-Reply-To term here, the first thing to do would be to inspect the actual terms stored in the database for the document of the message of interest.

Up until now, what I've been using for this is a little utility I wrote called xapian-dump. It exists deep in the code history of notmuch, so one could use git log to find the commit that removed it and then check out the commit before that to get the utility.

But xapian-dump is pretty dumb: all it does is dump all terms from all documents in the database (it also dumps all the data and values from those documents, but those parts aren't relevant here). So that's a *lot* of output.

More interesting would be a tool that dumps just the terms of the message you want to debug. That's why I want to introduce something like "notmuch search --for=terms" as a much more useful debugging tool.

Anyway, I hope that was informative. Thanks for reporting the bug!
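To make the prefixed-term idea concrete, here is a minimal C++ sketch of the filtering a per-message term dump would need. It does not use the real Xapian API: one document's term list is modeled as a sorted std::vector<std::string>, on the assumption that the real term list is also walked in sorted order. The prefix "XREPLYTO" and the function name terms_with_prefix are hypothetical; the real prefixes are the ones documented in lib/database.cc.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for one document's term list: a sorted vector of
// strings. The prefix used by callers (e.g. "XREPLYTO") is illustrative only.

// Return every term that starts with `prefix`, with the prefix stripped.
std::vector<std::string>
terms_with_prefix (const std::vector<std::string> &sorted_terms,
                   const std::string &prefix)
{
    std::vector<std::string> values;

    // In a sorted list, all terms sharing a prefix are contiguous and begin
    // at the first term that is not less than the prefix itself.
    auto it = std::lower_bound (sorted_terms.begin (), sorted_terms.end (),
                                prefix);
    for (; it != sorted_terms.end (); ++it) {
        // compare() clamps the length, so shorter terms simply mismatch.
        if (it->compare (0, prefix.size (), prefix) != 0)
            break;  // first term past the prefixed run
        values.push_back (it->substr (prefix.size ()));
    }
    return values;
}
```

For the terms {"XREPLYTOa@example.com", "XREPLYTOb@example.com", "data"}, calling terms_with_prefix (terms, "XREPLYTO") yields the two addresses without the prefix. Relying on sort order lets the scan stop at the first non-matching term instead of visiting every term in the document.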
-Carl

commit 64c8d6227a90ea6c37ea112ee20b14f16b9b46e7
Author: Carl Worth
Date:   Sat Nov 28 10:01:22 2009 -0800

    Avoid bogus internal error reporting duplicate In-Reply-To IDs.

    This error was triggered with a debugging build via:

        make CXXFLAGS="-DDEBUG"

    and reported by David Bremner. The actual error is that I'm an idiot
    that doesn't know how to use strcmp's return value. Of course, the
    strcmp interface scores a negative 7 on Rusty Russell's ranking of bad
    interfaces:

        http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

diff --git a/lib/message.cc b/lib/message.cc
index 03b8c81..49519f1 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -318,7 +318,7 @@ _notmuch_message_get_in_reply_to (notmuch_message_t *message)
     in_reply_to = *i;
 
     if (i != message->doc.termlist_end () &&
-        strncmp ((*i).c_str (), prefix, prefix_len))
+        strncmp ((*i).c_str (), prefix, prefix_len) == 0)
     {
         INTERNAL_ERROR ("Message %s has duplicate In-Reply-To IDs: %s and %s\n",
                         notmuch_message_get_message_id (message),
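The fix hinges on strncmp returning 0 when the compared bytes *match*, so a bare `if (strncmp (...))` is true on a mismatch. Below is a minimal C++ sketch of the debug check under stated assumptions: a sorted std::vector<std::string> stands in for the Xapian term list, the check is assumed to inspect the term that sorts right after the first In-Reply-To term (the diff does not show that surrounding code), and reports_duplicate, use_buggy_test, and the "XREPLYTO" prefix are hypothetical. With the old condition, the check fires whenever that next term does *not* carry the prefix, which is the normal case. That would be consistent with the report's oddity that the second "ID" was a common English word.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical model of the debug-only check: `sorted_terms` stands in for a
// document's term list and `prefix` for the In-Reply-To term prefix.
// Returns true when the check would report "duplicate In-Reply-To IDs".
bool
reports_duplicate (const std::vector<std::string> &sorted_terms,
                   const char *prefix, bool use_buggy_test)
{
    const std::size_t prefix_len = std::strlen (prefix);

    // Position on the first prefixed term, roughly like a skip_to (prefix).
    auto i = std::lower_bound (sorted_terms.begin (), sorted_terms.end (),
                               std::string (prefix));
    if (i == sorted_terms.end () ||
        std::strncmp (i->c_str (), prefix, prefix_len) != 0)
        return false;  // no In-Reply-To term at all

    ++i;  // look at the term that sorts immediately after it
    if (i == sorted_terms.end ())
        return false;

    // strncmp returns 0 when the first prefix_len bytes match.
    int cmp = std::strncmp (i->c_str (), prefix, prefix_len);
    if (use_buggy_test)
        return cmp != 0;  // old test: true when the next term does NOT match
    return cmp == 0;      // fixed test: true only if it also has the prefix
}
```

With the terms {"XREPLYTOa@example.com", "data", "mail"}, the old test reports "data" as a second ID while the fixed test stays quiet; only a list containing two "XREPLYTO..." terms makes the fixed test fire.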