Re: Partial words on notmuch search?
author Austin Clements <amdragon@MIT.EDU>
Tue, 17 Jan 2012 19:47:15 +0000 (14:47 -0500)
committer W. Trevor King <wking@tremily.us>
Fri, 7 Nov 2014 17:42:29 +0000 (09:42 -0800)
7a/84b043ed102de24fc144404b213b2d3e2fea0e [new file with mode: 0644]

diff --git a/7a/84b043ed102de24fc144404b213b2d3e2fea0e b/7a/84b043ed102de24fc144404b213b2d3e2fea0e
new file mode 100644 (file)
index 0000000..16908c1
--- /dev/null
@@ -0,0 +1,112 @@
+Return-Path: <amdragon@mit.edu>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+       by olra.theworths.org (Postfix) with ESMTP id F00D6421197\r
+       for <notmuch@notmuchmail.org>; Tue, 17 Jan 2012 11:47:40 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+       tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+       by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+       with ESMTP id grh8KH253H4k for <notmuch@notmuchmail.org>;\r
+       Tue, 17 Jan 2012 11:47:40 -0800 (PST)\r
+Received: from dmz-mailsec-scanner-5.mit.edu (DMZ-MAILSEC-SCANNER-5.MIT.EDU\r
+       [18.7.68.34])\r
+       by olra.theworths.org (Postfix) with ESMTP id 5B65A421192\r
+       for <notmuch@notmuchmail.org>; Tue, 17 Jan 2012 11:47:40 -0800 (PST)\r
+X-AuditID: 12074422-b7fd66d0000008f9-d8-4f15d05b5f1e\r
+Received: from mailhub-auth-1.mit.edu ( [18.9.21.35])\r
+       by dmz-mailsec-scanner-5.mit.edu (Symantec Messaging Gateway) with SMTP\r
+       id E9.55.02297.B50D51F4; Tue, 17 Jan 2012 14:47:39 -0500 (EST)\r
+Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103])\r
+       by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id q0HJlXHX018149; \r
+       Tue, 17 Jan 2012 14:47:33 -0500\r
+Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])\r
+       (authenticated bits=0)\r
+       (User authenticated as amdragon@ATHENA.MIT.EDU)\r
+       by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id q0HJlRPg029944\r
+       (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT);\r
+       Tue, 17 Jan 2012 14:47:32 -0500 (EST)\r
+Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.77)\r
+       (envelope-from <amdragon@MIT.EDU>)\r
+       id 1RnEzz-00082h-Kv; Tue, 17 Jan 2012 14:47:15 -0500\r
+Date: Tue, 17 Jan 2012 14:47:15 -0500\r
+From: Austin Clements <amdragon@MIT.EDU>\r
+To: Jani Nikula <jani@nikula.org>\r
+Subject: Re: Partial words on notmuch search?\r
+Message-ID: <20120117194715.GO16740@mit.edu>\r
+References: <20120115220600.GO7037@think.nuvreauspam>\r
+       <877h0sa207.fsf@fester.com>\r
+       <20120116202103.GA14329@think.nuvreauspam>\r
+       <20120117023431.GF16740@mit.edu> <87aa5mkyw5.fsf@nikula.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=us-ascii\r
+Content-Disposition: inline\r
+In-Reply-To: <87aa5mkyw5.fsf@nikula.org>\r
+User-Agent: Mutt/1.5.21 (2010-09-15)\r
+X-Brightmail-Tracker:\r
+ H4sIAAAAAAAAA+NgFuplleLIzCtJLcpLzFFi42IR4hRV1o2+IOpv8HG+kMWqCdIWTdOdLa7f\r
+       nMnswOyxc9Zddo9b91+zezxbdYs5gDmKyyYlNSezLLVI3y6BK+N8xx/2gq/cFc+eLWJrYDzP\r
+       2cXIySEhYCLR/H09K4QtJnHh3nq2LkYuDiGBfYwSxx+dYYdwNjBKrD53kBHCOckkcfbSYiYI\r
+       ZwmjxI3fU9lA+lkEVCXad99nBLHZBDQktu1fDmaLCChKbD65H8xmFnCTWLy5F2yfsICuxKkd\r
+       B9hBbF4BHYlna14xQww9yCix7+ADVoiEoMTJmU9YIJq1JG78ewm0mQPIlpZY/o8DJMwJtKtn\r
+       4gMmEFtUQEViysltbBMYhWYh6Z6FpHsWQvcCRuZVjLIpuVW6uYmZOcWpybrFyYl5ealFuqZ6\r
+       uZkleqkppZsYQaHO7qK0g/HnQaVDjAIcjEo8vAWbRP2FWBPLiitzDzFKcjApifI+PA8U4kvK\r
+       T6nMSCzOiC8qzUktPsQowcGsJMKbmwaU401JrKxKLcqHSUlzsCiJ86prvfMTEkhPLEnNTk0t\r
+       SC2CycpwcChJ8C4FGSpYlJqeWpGWmVOCkGbi4AQZzgM0fCFIDW9xQWJucWY6RP4Uo6KUOG8T\r
+       SEIAJJFRmgfXC0tFrxjFgV4R5l0OUsUDTGNw3a+ABjMBDc5pFQIZXJKIkJJqYJzXFGlYcCl5\r
+       y/efwaIn9FY7VC72XfDOXiCWeYrLNs/Jtd/XTAhiCdXtK7apf/Fao9uhKO4N7xqBFccPuOfz\r
+       331ziD1X7kvybLtzbJVvrvhad854t/nR5rb4mMAf1Uz5n10ZzLpmz/i/mqP1bM+Lv6qrXONf\r
+       /jVldVvRlGKp6sXvWfVLccuVN0osxRmJhlrMRcWJAFIqgc0gAwAA\r
+Cc: notmuch@notmuchmail.org, Andrei Popescu <andreimpopescu@gmail.com>\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+       <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Tue, 17 Jan 2012 19:47:41 -0000\r
+\r
+Quoth Jani Nikula on Jan 17 at  7:43 pm:\r
+> On Mon, 16 Jan 2012 21:34:31 -0500, Austin Clements <amdragon@MIT.EDU> wrote:\r
+> > Quoth Andrei Popescu on Jan 16 at 10:21 pm:\r
+> > > This is also interesting:\r
+> > > $ notmuch count 'debian'\r
+> > > 65888\r
+> > > $ notmuch count 'dEbian'\r
+> > > 65888\r
+> > > $ notmuch count 'Debian'\r
+> > > 65887\r
+> > \r
+> > The first two will match stemmed versions of "debian" such as\r
+> > "debian's" and "debianed".  However, starting a term with a capital\r
+> > letter suppresses stemming (because it suggests that it's a name,\r
+> > which you wouldn't want to modify), so your last query matches only\r
+> > the term "debian".  This is probably documented somewhere, though I\r
+> > don't know where.\r
+> \r
+> Interesting. Is this done when adding the terms to the database, or when\r
+> searching? I presume the latter. How much control does notmuch have over\r
+> this?\r
+\r
+This is getting a bit out of my depth, but I believe indexing is done\r
+with both stemmed and unstemmed versions of all terms (if stemming is\r
+enabled) so that search can use either.\r
+\r
+For indexing, Notmuch can set the stemmer (or no stemmer).  Xapian\r
+provides stemmers for a variety of languages:\r
+  http://xapian.org/docs/apidoc/html/classXapian_1_1Stem.html#6c46cedf2047b159a7e4c9d4468242b1\r
+\r
+For query parsing, Notmuch can set both the stemmer and a "stemming\r
+strategy" that controls when it stems or doesn't stem terms:\r
+  http://xapian.org/docs/apidoc/html/classXapian_1_1QueryParser.html#c7dc3b55b6083bd3ff98fc8b2726c8fd\r
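
To make the behaviour described in the message concrete, here is a toy model in plain Python. It is only a sketch and does not call Xapian or Notmuch: toy_stem is a deliberately crude, hypothetical suffix stripper, the "Z" prefix for stemmed terms is assumed to mirror Xapian's convention, and query_term imitates a "stem unless capitalised" strategy as an assumption about how the query parser is configured. All function names and the sample documents are made up for illustration.

```python
def toy_stem(word):
    """Crude stand-in for a real stemmer: strip a few English suffixes."""
    for suffix in ("'s", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Index every word twice: unstemmed (lowercased) and stemmed with a 'Z' prefix."""
    terms = set()
    for word in text.split():
        lowered = word.lower()
        terms.add(lowered)                  # unstemmed form
        terms.add("Z" + toy_stem(lowered))  # stemmed form (assumed 'Z' convention)
    return terms


def query_term(word):
    """Imitate a 'stem some' strategy: a leading capital suppresses stemming."""
    if word[:1].isupper():
        return word.lower()                 # search the unstemmed term only
    return "Z" + toy_stem(word.lower())     # search the stemmed term


def count(query_word, docs):
    """Count documents whose indexed terms contain the parsed query term."""
    term = query_term(query_word)
    return sum(1 for doc in docs if term in index_terms(doc))


# Hypothetical three-message corpus.
DOCS = ["debian rocks", "debian's release", "debianed box"]

if __name__ == "__main__":
    for q in ("debian", "dEbian", "Debian"):
        print(q, count(q, DOCS))
```

In this toy corpus 'debian' and 'dEbian' match all three messages through the stemmed term, while 'Debian' matches only the message containing the exact word, which is the same pattern as the notmuch count results quoted in the thread.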