From c955bce5b318e30312068789e57991c07d99562f Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 12 Nov 2013 20:47:27 +0100 Subject: [PATCH] Re: [PATCH] notmuch: Add "maildir:" search option --- 50/dcd63ad0074904bcf00ba8bc98822b78299276 | 198 ++++++++++++++++++++++ 1 file changed, 198 insertions(+) create mode 100644 50/dcd63ad0074904bcf00ba8bc98822b78299276 diff --git a/50/dcd63ad0074904bcf00ba8bc98822b78299276 b/50/dcd63ad0074904bcf00ba8bc98822b78299276 new file mode 100644 index 000000000..c995bcf40 --- /dev/null +++ b/50/dcd63ad0074904bcf00ba8bc98822b78299276 @@ -0,0 +1,198 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 3467A431FDA + for ; Tue, 12 Nov 2013 11:47:39 -0800 (PST) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -2.3 +X-Spam-Level: +X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id zK3Owp4lsx65 for ; + Tue, 12 Nov 2013 11:47:31 -0800 (PST) +Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) + (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id 2E633431FD2 + for ; Tue, 12 Nov 2013 11:47:31 -0800 (PST) +Received: from dhcp-077-248-225-117.chello.nl ([77.248.225.117] helo=laptop) + by merlin.infradead.org with esmtpsa (Exim 4.80.1 #2 (Red Hat Linux)) + id 1VgJvt-0000kX-RB; Tue, 12 Nov 2013 19:47:30 +0000 +Received: by laptop (Postfix, from userid 1000) + id 96DD8103BA2B9; Tue, 12 Nov 2013 20:47:27 +0100 (CET) +Date: Tue, 12 Nov 2013 20:47:27 +0100 +From: Peter Zijlstra +To: Austin Clements +Subject: Re: [PATCH] notmuch: Add "maildir:" search option +Message-ID: <20131112194727.GE16796@laptop.programming.kicks-ass.net> +References: <20131112155637.GA16796@laptop.programming.kicks-ass.net> + <87mwl94dte.fsf@awakening.csail.mit.edu> +MIME-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +Content-Disposition: inline +In-Reply-To: <87mwl94dte.fsf@awakening.csail.mit.edu> +User-Agent: Mutt/1.5.21 (2012-12-30) +X-Mailman-Approved-At: Tue, 12 Nov 2013 12:40:21 -0800 +Cc: notmuch@notmuchmail.org +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Tue, 12 Nov 2013 19:47:39 -0000 + +On Tue, Nov 12, 2013 at 02:31:25PM -0500, Austin Clements wrote: +> > XXX: now I need to go figure out how to do searches like: +> > +> > subject:PATCH/0 +> > +> > which would mandate that PATCH is the first word occurring in the +> > subject. I think the position index holds enough information but I +> > need to look into that and obviously the query parser needs work for +> > this. +> +> This is a frequently requested feature. Unfortunately, there are two +> technical hurdles. One is that the position information actually +> *isn't* enough as it is both because the subject will start at some +> arbitrary term offset that depends on the from and to (and maybe other +> things) and because the Xapian Query structure doesn't expose a way to +> search for a specific term offset (only relative offsets). The other is +> that we currently use Xapian's query parser, which isn't extensible, +> though I took a stab at a custom query parser long ago and swear that +> one of these days I'll return to it. + +Bah, I knew that would end up being more complex :-) + +> > lib/database.cc | 7 ++++--- +> > lib/message.cc | 40 ++++++++++++++++++++++++++++++++++++++++ +> > 2 files changed, 44 insertions(+), 3 deletions(-) +> > +> > diff --git a/lib/database.cc b/lib/database.cc +> > index a021bf17253c..53aeaa68954d 100644 +> > --- a/lib/database.cc +> > +++ b/lib/database.cc +> > @@ -208,15 +208,16 @@ static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = { +> > { "thread", "G" }, +> > { "tag", "K" }, +> > { "is", "K" }, +> > - { "id", "Q" } +> > + { "id", "Q" }, +> > + { "maildir", "XMAILDIR:" }, +> +> No colon. That is, the term prefix should just be "XMAILDIR". + +No, that colon is crucial; see http://xapian.org/docs/omega/termprefixes.html + +"X starts a multi-capital letter user-defined prefix. If you want a +prefix for something without a standard prefix, you create your own +starting with an X (e.g. XSHOESIZE). The prefix ends with the first +non-capital. If the term you're prefixing starts with a capital, add a +":" between prefix and term to resolve ambiguity about where the prefix +ends and the term begins." + +Since maildir folder names typically start with a capital we need that +':' in between the prefix and value. I tried, my initial versions didn't +have the ':' and reliably didn't work. + +> > }; +> > +> > static prefix_t PROBABILISTIC_PREFIX[]= { +> > { "from", "XFROM" }, +> > { "to", "XTO" }, +> > { "attachment", "XATTACHMENT" }, +> > - { "subject", "XSUBJECT"}, +> > - { "folder", "XFOLDER"} +> > + { "subject", "XSUBJECT" }, +> > + { "folder", "XFOLDER" }, +> +> Unintentional whitespace change? + +Oops yes. + +> > + /* Convert the directory into a maildir path */ +> > + maildir = talloc_strdup(local, directory); +> +> Add a space before the "(". (Same thing throughout the rest of the +> patch.) + +Urgh, weird coding style but yes you're right, I should've kept it +consistent with the rest of the file. Will fix. + +> > + +> > + /* Strip the maildir "cur", "new" directory entries. */ +> > + i = strlen(maildir); +> > + if (strncmp(maildir + i - 3, "cur", 3) == 0 || +> > + strncmp(maildir + i - 3, "new", 3) == 0) { +> +> This is unsafe if directory is less than three characters, which I +> believe could happen if the message is in the root mail directory (which +> shouldn't happen with a well-formed maildir, but notmuch doesn't require +> maildir, and, regardless, we should be defensive). +> +> Also, we have a STRNCMP_LITERAL macro that we often use for comparisons +> with string literals, but I'm good with this, too. + +Quite so, I haven't actually seen that, but you're quite right. + +> > + maildir[i - 3] = '\0'; +> > + i -= 3; +> > + } +> > + +> > + /* Strip trailing '/' */ +> > + while (maildir[i-1] == '/') { +> +> This is also unsafe if maildir is the empty string. +> +> Also, add spaces around the "-" (likewise on the next line). + +Will fix. + +> > + maildir[i-1] = '\0'; +> > + i--; +> > + } +> > + +> > + /* Strip leading '/' */ +> > + while (maildir[0] == '/') +> > + maildir++; +> > + +> > + /* Strip leading '.' */ +> > + while (maildir[0] == '.') +> +> Why strip multiple dots? (I'm not saying you shouldn't, I'm just +> curious; and it may be worth explaining in the comment.) + +No reason, copy paste damage from above mostly. + +> > + maildir++; +> > + +> > + /* Replace all remaining '.' with '/' */ +> +> I think this should only happen if there was a leading '.', indicating a +> Maildir++ folder hierarchy. A lot of people use the "file system" +> Maildir layout, which just consists of nested directories where the +> leaves are Maildirs (e.g., Dovecot's LAYOUT=fs, +> http://wiki2.dovecot.org/MailboxFormat/Maildir#Directory_Structure). In +> this case, it's perfectly legitimate to have '.'s in folder names and it +> would be confusing and unexpected to translate them to '/'s. This would +> also make this compatible with MH folders (which notmuch supports, +> though admittedly it's unclear if anyone actually uses them). + +Ah, I wasn't aware Dovecot actually supported this Maildir variant -- +although I've recently come across this variant someplace and thought it +was odd. + +OK, I can make it conditional on the initial leading dot. + +I'll respin the patch and try to come up with manpage and test additions +to cover this new functionality. -- 2.26.2