Return-Path:
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
    by olra.theworths.org (Postfix) with ESMTP id 73808431FD0
    for ; Tue, 25 Jan 2011 16:51:17 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: -0.798
X-Spam-Level:
X-Spam-Status: No, score=-0.798 tagged_above=-999 required=5
    tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
    FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7]
    autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
    by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id v3Qty8U8dcG4 for ; Tue, 25 Jan 2011 16:51:15 -0800 (PST)
Received: from mail-qw0-f53.google.com (mail-qw0-f53.google.com [209.85.216.53])
    (using TLSv1 with cipher RC4-MD5 (128/128 bits))
    (No client certificate requested)
    by olra.theworths.org (Postfix) with ESMTPS id 4E89B431FB6
    for ; Tue, 25 Jan 2011 16:51:15 -0800 (PST)
Received: by qwe5 with SMTP id 5so423307qwe.26
    for ; Tue, 25 Jan 2011 16:51:14 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
    h=domainkey-signature:mime-version:in-reply-to:references:date
    :message-id:subject:from:to:cc:content-type;
    bh=d1zd6reLY6Vjk9dOr5lQfvbOPZ+CdNc/EQzdXL/2Q8g=;
    b=jORn0Sa47OHgQ9insOIAttNWS4PCV5E+fyPJBjdAy9qk8zAmrsoxfyfhjZMtgJlKv5
    J38JUjY1ck2dHanyXVAJ3tFI9rzTXEJxfX1+4+ug4FvGhk/pogoxQOIaPkd0lh34nM4M
    91MYN0LLOqgfXsc4NiV/mofrkeMgFH1t9HLMU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
    h=mime-version:in-reply-to:references:date:message-id:subject:from:to
    :cc:content-type;
    b=rVIfL0Xrp7JzU3ytpB8kX20RtKZhYKonfIKPFugo/Skbe7V8+7nvlwBa7miTUkhQhz
    /O4k4HZHLbrbv7zIBtWQUwyQft0H2vNsecu3m5656EbY/KrFXLoMmkBL1F+Z3+WncYgv
    6P8MpWrW+GEmFP4+4mmYkoW16PXcFHzocWrn0=
MIME-Version: 1.0
Received: by 10.229.79.135 with SMTP id p7mr5334151qck.154.1296003074663;
    Tue, 25 Jan 2011 16:51:14 -0800 (PST)
Received: by 10.229.97.143 with HTTP; Tue, 25 Jan 2011 16:51:14 -0800 (PST)
In-Reply-To: <3wd4o8wa7fx.fsf@testarossa.amd.com>
References: <3wd4o8wa7fx.fsf@testarossa.amd.com>
Date: Tue, 25 Jan 2011 19:51:14 -0500
Message-ID:
Subject: Re: Strange match to my query
From: Austin Clements
To: Mark Anderson
Content-Type: multipart/alternative; boundary=00163616494fe4a459049ab53a1c
Cc: notmuch@notmuchmail.org
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 26 Jan 2011 00:51:17 -0000

--00163616494fe4a459049ab53a1c
Content-Type: text/plain; charset=ISO-8859-1

Well-constructed test message.  Xapian's query parser is actually doing the
right thing [1] and this is a bug in the way notmuch indexes address list
headers.  For each address, _notmuch_message_gen_terms resets the term
generator's term position, so your To header indexes with positions as

  c:1 hello:2 com:3 K:1 R:2 world:3 com:4

Thus, the phrase query "hello world" matches hello in position 2 and world
in position 3.  Probably the right thing for notmuch to do is to jump up the
term generator position between each address so phrase queries don't cross
them or span them.

[1] Your to:\'$WORD1@$WORD2\' query didn't work because Xapian doesn't
accept a single quote after a prefix.

On Tue, Jan 25, 2011 at 6:29 PM, Mark Anderson wrote:

> Hi guys, What's up? ("Notmuch")
>
> Apparently matching on email addresses doesn't work the way I hoped.
>
> While debugging why my to:x@y.com search was matching far too many
> entries, I whittled it down to this:
>
> WORD1=hello
> WORD2=goodbye
> MSGID=junk$(date +%s)
> TESTDIR=$(notmuch config get database.path)/.tmp/new
> TESTMAIL=$TESTDIR/$MSGID:2,
>
> mkdir -p $TESTDIR
>
> echo Testcase for $WORD1@$WORD2, msgid: $MSGID@junk.com
>
> echo "From: nobody@nobody.com
> To: c@${WORD1}.com, K-R@${WORD2}.com
> Date: Mon, 24 Jan 2011 23:41:34 -0600
> Subject: Error
> Message-ID: <$MSGID@junk.com>
>
> Not empty body.=
>
> " > $TESTMAIL
>
> notmuch new
> notmuch search --output=files to:$WORD1@$WORD2
> notmuch search --output=files to:\"$WORD1@$WORD2\"
>
> Why does that match, but this doesn't?
>
> notmuch search --output=files to:\'$WORD1@$WORD2\'
>
> Apparently single quotes are the only quote for Xapian's parser?
>
> I guess this is a strong vote for the quick integration of the custom
> parser with optimization passes that turn emails into phrases that can't
> match across multiple emails.
>
> This was just an egregious example of notmuch giving me notmuch of what
> I wanted, or actually, far too much of what I didn't want.
>
> Thanks,
> -Mark
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
>

--00163616494fe4a459049ab53a1c
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Well-constructed test message.  Xapian's query parser is actually doing the right thing [1] and this is a bug in the way notmuch indexes address list headers.  For each address, _notmuch_message_gen_terms resets the term generator's term position, so your To header indexes with positions as
  c:1 hello:2 com:3 K:1 R:2 world:3 com:4
Thus, the phrase query "hello world" matches hello in position 2 and world in position 3.  Probably the right thing for notmuch to do is to jump up the term generator position between each address so phrase queries don't cross them or span them.
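
To make the position arithmetic concrete, here is a toy shell model of the behavior described above. It is only a sketch under stated assumptions: index_terms and phrase_matches are hypothetical helpers, not notmuch or Xapian functions; the tokenizer simply splits on non-alphanumeric characters and does not change case; and the gap of 100 positions is an arbitrary choice for illustration.

```shell
# Toy model of positional indexing (hypothetical helpers, not notmuch code).

# index_terms MODE ADDRESS...
# Prints one "term:position" pair per line.
#   MODE=reset  restart the position counter for every address
#               (the behavior described in this message)
#   MODE=gap    skip ahead by 100 positions before every address
#               (an assumed gap size, for illustration only)
index_terms() {
    local mode="$1"; shift
    local pos=0 addr term
    for addr in "$@"; do
        if [ "$mode" = reset ]; then
            pos=0
        else
            pos=$((pos + 100))
        fi
        # Split the address on every non-alphanumeric character.
        for term in $(printf '%s' "$addr" | tr -cs '[:alnum:]' ' '); do
            pos=$((pos + 1))
            printf '%s:%d\n' "$term" "$pos"
        done
    done
}

# phrase_matches INDEX FIRST SECOND
# Succeeds if SECOND is indexed at the position right after some FIRST.
phrase_matches() {
    local index="$1" first="$2" second="$3" entry
    for entry in $index; do
        if [ "${entry%%:*}" = "$first" ]; then
            if printf '%s\n' "$index" | grep -qxF "$second:$(( ${entry##*:} + 1 ))"; then
                return 0
            fi
        fi
    done
    return 1
}

buggy=$(index_terms reset c@hello.com K-R@world.com)
fixed=$(index_terms gap c@hello.com K-R@world.com)

phrase_matches "$buggy" hello world && echo "reset: phrase spans two addresses"
phrase_matches "$fixed" hello world || echo "gap: phrase no longer matches"
```

In reset mode the model reproduces the positions listed above, so hello (2) and world (3) look adjacent even though they come from different addresses. With a gap between addresses the two terms end up far apart and the phrase check fails.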

[1] Your to:\'$WORD1@$WORD2\' query didn't work because Xapian doesn't accept a single quote after a prefix.
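
For reference, the backslash-escaped quotes in those commands survive the shell and reach notmuch as literal characters. The sketch below does not call notmuch at all; it only uses printf to show the argument each command line would pass, assuming the WORD1 and WORD2 values from the test script.

```shell
WORD1=hello
WORD2=goodbye

# Print the single argument that each search command hands to notmuch.
printf '%s\n' to:$WORD1@$WORD2        # to:hello@goodbye
printf '%s\n' to:\"$WORD1@$WORD2\"    # to:"hello@goodbye"
printf '%s\n' to:\'$WORD1@$WORD2\'    # to:'hello@goodbye'
```

So the second form hands the query parser a double-quoted phrase after the prefix, while the third hands it a single quote after the prefix, which is the case described in [1].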
On Tue, Jan 25, 2011 at 6:29 PM, Mark Anderson <MarkR.Anderson@amd.com> wrote:
Hi guys, What's up? ("Notmuch")

Apparently matching on email addresses doesn't work the way I hoped.
While debugging why my to:x@y.com search was matching far too many
entries, I whittled it down to this:

WORD1=hello
WORD2=goodbye
MSGID=junk$(date +%s)
TESTDIR=$(notmuch config get database.path)/.tmp/new
TESTMAIL=$TESTDIR/$MSGID:2,

mkdir -p $TESTDIR

echo Testcase for $WORD1@$WORD2, msgid: $MSGID@junk.com

echo "From: nobody@nobody.com
To: c@${WORD1}.com, K-R@${WORD2}.com
Date: Mon, 24 Jan 2011 23:41:34 -0600
Subject: Error
Message-ID: <$MSGID@junk.com>

Not empty body.=

" > $TESTMAIL

notmuch new
notmuch search --output=files to:$WORD1@$WORD2
notmuch search --output=files to:\"$WORD1@$WORD2\"

Why does that match, but this doesn't?

notmuch search --output=files to:\'$WORD1@$WORD2\'

Apparently single quotes are the only quote for Xapian's parser?

I guess this is a strong vote for the quick integration of the custom
parser with optimization passes that turn emails into phrases that can't
match across multiple emails.

This was just an egregious example of notmuch giving me notmuch of what
I wanted, or actually, far too much of what I didn't want.

Thanks,
-Mark

_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch

--00163616494fe4a459049ab53a1c--