From b96d42387e6a033e146e654cece898d8525e0bf8 Mon Sep 17 00:00:00 2001 From: Suvayu Ali Date: Sun, 19 Jul 2015 00:24:31 +0200 Subject: [PATCH] Re: Searching for phrases in the body of an email --- 21/47f3de0981d41ce34e5ae8d99839b4f9d901e6 | 157 ++++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 21/47f3de0981d41ce34e5ae8d99839b4f9d901e6 diff --git a/21/47f3de0981d41ce34e5ae8d99839b4f9d901e6 b/21/47f3de0981d41ce34e5ae8d99839b4f9d901e6 new file mode 100644 index 000000000..871404d0a --- /dev/null +++ b/21/47f3de0981d41ce34e5ae8d99839b4f9d901e6 @@ -0,0 +1,157 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by arlo.cworth.org (Postfix) with ESMTP id 901606DE13A3 + for ; Sat, 18 Jul 2015 15:24:39 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at cworth.org +X-Spam-Flag: NO +X-Spam-Score: -0.719 +X-Spam-Level: +X-Spam-Status: No, score=-0.719 tagged_above=-999 required=5 tests=[AWL=0.101, + DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, + RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, + SPF_PASS=-0.001] autolearn=disabled +Received: from arlo.cworth.org ([127.0.0.1]) + by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id rMzlmcTQEI5h for ; + Sat, 18 Jul 2015 15:24:36 -0700 (PDT) +Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com + [209.85.212.178]) + by arlo.cworth.org (Postfix) with ESMTPS id 9CF8A6DE137E + for ; Sat, 18 Jul 2015 15:24:36 -0700 (PDT) +Received: by widic2 with SMTP id ic2so62195507wid.0 + for ; Sat, 18 Jul 2015 15:24:33 -0700 (PDT) +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; + h=sender:date:from:to:subject:message-id:mail-followup-to:references + :mime-version:content-type:content-disposition:in-reply-to + :user-agent; bh=4lcJOn0GJZWn7R+kXP9TQp335A0/qpueB2XvKwb2Zys=; + 
b=HQWQQefWg12gkFA8VOXaJItH/z1VfmhuaDR/6+R0KKE+iqPAqzCtxlNvndh5yapS7i + wROXPxzB+embQJuHHV9iIe1unbdE3HXl6drjnmhuJ2XdtSWL9E5PBe+weNgm1tSw+NVr + bdzPQYWzvtL7Qk7yLRPzKE3DO4GZEWtS6AdryaWx/IWzMPpppO7Esq47CpAhv5gWl1x3 + tYB5W8Peb4H5dxPCOaVnGcoC3ciJtuAB28/BgmOI8gSUUeipRiOOxqMb/PBf7vrDQ3/t + kkuka9VwlBHXqb29RFyhpqCopWuUeD15nlOI5P0MsZ9tqCFOFvgextX5PojQOoKbtEoj + aj7w== +X-Received: by 10.194.205.101 with SMTP id lf5mr45267223wjc.37.1437258273244; + Sat, 18 Jul 2015 15:24:33 -0700 (PDT) +Received: from chitra.no-ip.org (ip82-139-115-46.lijbrandt.net. + [82.139.115.46]) + by smtp.gmail.com with ESMTPSA id ez4sm4412184wid.14.2015.07.18.15.24.32 + for + (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); + Sat, 18 Jul 2015 15:24:32 -0700 (PDT) +Sender: Suvayu Ali +Date: Sun, 19 Jul 2015 00:24:31 +0200 +From: Suvayu Ali +To: notmuch@notmuchmail.org +Subject: Re: Searching for phrases in the body of an email +Message-ID: <20150718222431.GC4527@chitra.no-ip.org> +Mail-Followup-To: notmuch@notmuchmail.org +References: <20150717121111.GF25651@chitra.no-ip.org> + <55A923E9.5070509@imca-cat.org> + <20150718091139.GB8311@chitra.no-ip.org> + + <20150718153239.GB4527@chitra.no-ip.org> + +MIME-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +Content-Disposition: inline +In-Reply-To: + +User-Agent: Mutt/1.5.23.1 (2014-03-12) +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.18 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sat, 18 Jul 2015 22:24:39 -0000 + +Hi Jani, + +On Sat, Jul 18, 2015 at 06:53:53PM +0300, Jani Nikula wrote: +> On Jul 18, 2015 6:32 PM, "Suvayu Ali" wrote: +> > On Sat, Jul 18, 2015 at 10:54:30AM -0400, Xu Wang wrote: +> > > +> > > First note that I believe notmuch search is case insensitive by +> > > default, so your grep should be case insensitive as well. 
+> > +> > Good point, I tried that, didn't change the numbers much. The number of +> > matches from grep went up to 24, whereas notmuch count says 463. +> > +> > > More importantly, I'm not sure how 'no NEAR "plain text" ' syntax is +> > > parsed. Maybe it is parsed as {no NEAR plain} or {text}. +> > > +> > +> > Exactly, that's what I do not understand. +> > +> +> export NOTMUCH_DEBUG_QUERY=1 +> +> might help. + +That helped a lot! This is what I get: + + $ notmuch count -- no NEAR \"plain\ text\" + Query string is: + no NEAR "plain text" + Exclude query is: + Xapian::Query() + Final query is: + Xapian::Query((Tmail AND Zno:(pos=1) AND near:(pos=2) AND Zplain:(pos=3) AND text:(pos=4))) + 465 + $ notmuch count -- \"plain\ text\" + Query string is: + "plain text" + Exclude query is: + Xapian::Query() + Final query is: + Xapian::Query((Tmail AND (plain:(pos=1) PHRASE 2 text:(pos=2)))) + 870 + +I wanted the "plain text" to be treated as a phrase, as in the second +case. I have tried nesting the quotes. The closest I got was this: + + $ notmuch count -- no NEAR 'plain\ text' + Query string is: + no NEAR plain\ text + Exclude query is: + Xapian::Query() + Final query is: + Xapian::Query((Tmail AND (no:(pos=1) NEAR 11 plain:(pos=2)) AND Ztext:(pos=3))) + 151 + +I then tried this: + + $ notmuch count -- no NEAR \(plain ADJ/1 text\) + Query string is: + no NEAR (plain ADJ/1 text) + Exclude query is: + Xapian::Query() + Final query is: + Xapian::Query((Tmail AND Zno:(pos=1) AND near:(pos=2) AND Zplain:(pos=3) AND (adj:(pos=4) PHRASE 2 1:(pos=5)) AND Ztext:(pos=6))) + 0 + +Again, this is not what I was expecting. With the last one, I was +expecting to group "plain" and "text" within a distance of 1, in the +given order, and then requiring "no" to be near (within 10 words, the +default) the "plain ADJ/1 text" combination. + +Is my understanding of the query language completely wrong?
Apart from +`man notmuch-search-terms', I looked here: +http://xapian.org/docs/queryparser.html + +Thanks for any help. + +Cheers, + +-- +Suvayu + +Open source is the future. It sets us free. -- 2.26.2