From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 12 Nov 2013 19:47:27 +0000 (+0100)
Subject: Re: [PATCH] notmuch: Add "maildir:" search option
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=c955bce5b318e30312068789e57991c07d99562f;p=notmuch-archives.git

Re: [PATCH] notmuch: Add "maildir:" search option
---

diff --git a/50/dcd63ad0074904bcf00ba8bc98822b78299276 b/50/dcd63ad0074904bcf00ba8bc98822b78299276
new file mode 100644
index 000000000..c995bcf40
--- /dev/null
+++ b/50/dcd63ad0074904bcf00ba8bc98822b78299276
@@ -0,0 +1,198 @@
+Return-Path: <peterz@infradead.org>
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+	by olra.theworths.org (Postfix) with ESMTP id 3467A431FDA
+	for <notmuch@notmuchmail.org>; Tue, 12 Nov 2013 11:47:39 -0800 (PST)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: -2.3
+X-Spam-Level: 
+X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5
+	tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+	with ESMTP id zK3Owp4lsx65 for <notmuch@notmuchmail.org>;
+	Tue, 12 Nov 2013 11:47:31 -0800 (PST)
+Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134])
+	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
+	(No client certificate requested)
+	by olra.theworths.org (Postfix) with ESMTPS id 2E633431FD2
+	for <notmuch@notmuchmail.org>; Tue, 12 Nov 2013 11:47:31 -0800 (PST)
+Received: from dhcp-077-248-225-117.chello.nl ([77.248.225.117] helo=laptop)
+	by merlin.infradead.org with esmtpsa (Exim 4.80.1 #2 (Red Hat Linux))
+	id 1VgJvt-0000kX-RB; Tue, 12 Nov 2013 19:47:30 +0000
+Received: by laptop (Postfix, from userid 1000)
+	id 96DD8103BA2B9; Tue, 12 Nov 2013 20:47:27 +0100 (CET)
+Date: Tue, 12 Nov 2013 20:47:27 +0100
+From: Peter Zijlstra <peterz@infradead.org>
+To: Austin Clements <aclements@csail.mit.edu>
+Subject: Re: [PATCH] notmuch: Add "maildir:" search option
+Message-ID: <20131112194727.GE16796@laptop.programming.kicks-ass.net>
+References: <20131112155637.GA16796@laptop.programming.kicks-ass.net>
+	<87mwl94dte.fsf@awakening.csail.mit.edu>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Content-Disposition: inline
+In-Reply-To: <87mwl94dte.fsf@awakening.csail.mit.edu>
+User-Agent: Mutt/1.5.21 (2012-12-30)
+X-Mailman-Approved-At: Tue, 12 Nov 2013 12:40:21 -0800
+Cc: notmuch@notmuchmail.org
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+	<notmuch.notmuchmail.org>
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
+	<mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>
+List-Post: <mailto:notmuch@notmuchmail.org>
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
+	<mailto:notmuch-request@notmuchmail.org?subject=subscribe>
+X-List-Received-Date: Tue, 12 Nov 2013 19:47:39 -0000
+
+On Tue, Nov 12, 2013 at 02:31:25PM -0500, Austin Clements wrote:
+> > XXX: now I need to go figure out how to do searches like:
+> >
+> >   subject:PATCH/0
+> >
+> > which would mandate that PATCH is the first word occurring in the
+> > subject. I think the position index holds enough information but I
+> > need to look into that and obviously the query parser needs work for
+> > this.
+> 
+> This is a frequently requested feature.  Unfortunately, there are two
+> technical hurdles.  One is that the position information actually
+> *isn't* enough as it is both because the subject will start at some
+> arbitrary term offset that depends on the from and to (and maybe other
+> things) and because the Xapian Query structure doesn't expose a way to
+> search for a specific term offset (only relative offsets).  The other is
+> that we currently use Xapian's query parser, which isn't extensible,
+> though I took a stab at a custom query parser long ago and swear that
+> one of these days I'll return to it.
+
+Bah, I knew that would end up being more complex :-)
+
+> >  lib/database.cc |  7 ++++---
+> >  lib/message.cc  | 40 ++++++++++++++++++++++++++++++++++++++++
+> >  2 files changed, 44 insertions(+), 3 deletions(-)
+> >
+> > diff --git a/lib/database.cc b/lib/database.cc
+> > index a021bf17253c..53aeaa68954d 100644
+> > --- a/lib/database.cc
+> > +++ b/lib/database.cc
+> > @@ -208,15 +208,16 @@ static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = {
+> >      { "thread",			"G" },
+> >      { "tag",			"K" },
+> >      { "is",			"K" },
+> > -    { "id",			"Q" }
+> > +    { "id",			"Q" },
+> > +    { "maildir",		"XMAILDIR:" },
+> 
+> No colon.  That is, the term prefix should just be "XMAILDIR".
+
+No, that colon is crucial; see http://xapian.org/docs/omega/termprefixes.html
+
+"X starts a multi-capital letter user-defined prefix. If you want a
+prefix for something without a standard prefix, you create your own
+starting with an X (e.g. XSHOESIZE). The prefix ends with the first
+non-capital. If the term you're prefixing starts with a capital, add a
+":" between prefix and term to resolve ambiguity about where the prefix
+ends and the term begins."
+
+Since maildir folder names typically start with a capital we need that
+':' in between the prefix and value. I tried, my initial versions didn't
+have the ':' and reliably didn't work.
+
+> >  };
+> >  
+> >  static prefix_t PROBABILISTIC_PREFIX[]= {
+> >      { "from",			"XFROM" },
+> >      { "to",			"XTO" },
+> >      { "attachment",		"XATTACHMENT" },
+> > -    { "subject",		"XSUBJECT"},
+> > -    { "folder",			"XFOLDER"}
+> > +    { "subject",		"XSUBJECT" },
+> > +    { "folder",			"XFOLDER" },
+> 
+> Unintentional whitespace change?
+
+Oops yes.
+
+> > +    /* Convert the directory into a maildir path */
+> > +    maildir = talloc_strdup(local, directory);
+> 
+> Add a space before the "(".  (Same thing throughout the rest of the
+> patch.)
+
+Urgh, weird coding style but yes you're right, I should've kept it
+consistent with the rest of the file. Will fix.
+
+> > +
+> > +    /* Strip the maildir "cur", "new" directory entries. */
+> > +    i = strlen(maildir);
+> > +    if (strncmp(maildir + i - 3, "cur", 3) == 0 ||
+> > +	strncmp(maildir + i - 3, "new", 3) == 0) {
+> 
+> This is unsafe if directory is less than three characters, which I
+> believe could happen if the message is in the root mail directory (which
+> shouldn't happen with a well-formed maildir, but notmuch doesn't require
+> maildir, and, regardless, we should be defensive).
+> 
+> Also, we have a STRNCMP_LITERAL macro that we often use for comparisons
+> with string literals, but I'm good with this, too.
+
+Quite so, I haven't actually seen that, but you're quite right.
+
+> > +	    maildir[i - 3] = '\0';
+> > +	    i -= 3;
+> > +    }
+> > +
+> > +    /* Strip trailing '/' */
+> > +    while (maildir[i-1] == '/') {
+> 
+> This is also unsafe if maildir is the empty string.
+> 
+> Also, add spaces around the "-" (likewise on the next line).
+
+Will fix.
+
+> > +	    maildir[i-1] = '\0';
+> > +	    i--;
+> > +    }
+> > +
+> > +    /* Strip leading '/' */
+> > +    while (maildir[0] == '/')
+> > +	    maildir++;
+> > +
+> > +    /* Strip leading '.' */
+> > +    while (maildir[0] == '.')
+> 
+> Why strip multiple dots?  (I'm not saying you shouldn't, I'm just
+> curious; and it may be worth explaining in the comment.)
+
+No reason, copy paste damage from above mostly.
+
+> > +	    maildir++;
+> > +
+> > +    /* Replace all remaining '.' with '/' */
+> 
+> I think this should only happen if there was a leading '.', indicating a
+> Maildir++ folder hierarchy.  A lot of people use the "file system"
+> Maildir layout, which just consists of nested directories where the
+> leaves are Maildirs (e.g., Dovecot's LAYOUT=fs,
+> http://wiki2.dovecot.org/MailboxFormat/Maildir#Directory_Structure).  In
+> this case, it's perfectly legitimate to have '.'s in folder names and it
+> would be confusing and unexpected to translate them to '/'s.  This would
+> also make this compatible with MH folders (which notmuch supports,
+> though admittedly it's unclear if anyone actually uses them).
+
+Ah, I wasn't aware Dovecot actually supported this Maildir variant --
+although I've recently come across this variant someplace and thought it
+was odd.
+
+OK, I can make it conditional on the initial leading dot.
+
+I'll respin the patch and try to come up with manpage and test additions
+to cover this new functionality.