Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 11380431FD7 for ; Tue, 12 Nov 2013 11:31:42 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jWYG7YjnaAk7 for ; Tue, 12 Nov 2013 11:31:32 -0800 (PST) Received: from dmz-mailsec-scanner-1.mit.edu (dmz-mailsec-scanner-1.mit.edu [18.9.25.12]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id BFAFE431FD2 for ; Tue, 12 Nov 2013 11:31:31 -0800 (PST) X-AuditID: 1209190c-b7f7f6d000000bbd-d5-5282821241d3 Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) (using TLS with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by dmz-mailsec-scanner-1.mit.edu (Symantec Messaging Gateway) with SMTP id 6A.27.03005.21282825; Tue, 12 Nov 2013 14:31:30 -0500 (EST) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id rACJVS2o009792; Tue, 12 Nov 2013 14:31:29 -0500 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id rACJVP2a000312 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Tue, 12 Nov 2013 14:31:27 -0500 Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80) (envelope-from ) id 1VgJgL-0005wV-Ap; Tue, 12 Nov 2013 14:31:25 -0500 From: Austin Clements To: Peter Zijlstra , notmuch@notmuchmail.org Subject: Re: [PATCH] notmuch: Add "maildir:" search option In-Reply-To: <20131112155637.GA16796@laptop.programming.kicks-ass.net> References: <20131112155637.GA16796@laptop.programming.kicks-ass.net> User-Agent: Notmuch/0.16+154~g96c0ce2 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu) Date: Tue, 12 Nov 2013 14:31:25 -0500 Message-ID: <87mwl94dte.fsf@awakening.csail.mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrPIsWRmVeSWpSXmKPExsUixG6nrivU1BRksPWmhcX1mzOZLY73HmBy YPLYvELL49mqW8wBTFFcNimpOZllqUX6dglcGXv7P7MXbLCo2DZnAVMD43vtLkZODgkBE4m2 l1dZIWwxiQv31rN1MXJxCAnMZpJoWdfPBOFsZJT48eovI4RzmkmiZ9kMJpAWIYEljBItF3xB bDYBfYkVayeBjRIRcJK4+2kqkM3BISxgJbF/VxJImFPATeLLsZPsEK2uEtuX3mcBsUUF4iWm LdzJDGKzCKhKHH3RCzaeF+i6vSdPs0HYghInZz4Bq2cW0JK48e8l0wRGgVlIUrOQpBYwMq1i lE3JrdLNTczMKU5N1i1OTszLSy3SNdTLzSzRS00p3cQICkdOSZ4djG8OKh1iFOBgVOLhfRDZ FCTEmlhWXJl7iFGSg0lJlFe8ACjEl5SfUpmRWJwRX1Sak1p8iFGCg1lJhNctBijHm5JYWZVa lA+TkuZgURLnvclhHyQkkJ5YkpqdmlqQWgSTleHgUJLgPVkE1ChYlJqeWpGWmVOCkGbi4AQZ zgM0/BlIDW9xQWJucWY6RP4Uo6KUOO8lkIQASCKjNA+uF5YuXjGKA70izHsApIoHmGrgul8B DWYCGrzkbiPI4JJEhJRUA2N3Rt2Df5/uKX8PdZYQ5pLzWn22aLJyTX7oD47p7x5vn+12OfRP bYFs17Ho3rVdzze/tBJ7zbpkXYBdKPePnsjvQV7mR4ruzN22XOiq3g6hd82d8ns3mUsU7PC7 ZZruGL1He8dUjS8OqT9/zDoh+bi6xPfSAl+pwp+a1uVmcu4ienE3+ZIXhiixFGckGmoxFxUn AgB957TZ8gIAAA== X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Nov 2013 19:31:42 -0000 I think this is a great idea. Personally I think this is how folder: should work. I find the semantics of folder: to be useless except where they happen to coincide with the boolean semantics used here. Unfortunately, changing folder: would require versioning the database, which we have only primordial support for right now. Various comments below, though nothing major. Of course, we'd also need some tests and man page updates for this. On Tue, 12 Nov 2013, Peter Zijlstra wrote: > Subject: notmuch: Add "maildir:" search option > > The current "folder:" search terms are near useless when you have > recursive folders, introduce a boolean "maildir:" search term to > exactly match the maildir folder. > > Given a Maildir++ layout like: > > ~/Maildir/ > ~/Maildir/cur > ~/Maildir/new > ~/Maildir/tmp > ~/Maildir/.Sent > ~/Maildir/.Sent/cur > ~/Maildir/.Sent/new > ~/Maildir/.Sent/tmp > ~/Maildir/.INBOX.LKML > ~/Maildir/.INBOX.LKML/cur > ~/Maildir/.INBOX.LKML/new > ~/Maildir/.INBOX.LKML/tmp > ~/Maildir/.INBOX.LKML.netdev > ~/Maildir/.INBOX.LKML.netdev/cur > ~/Maildir/.INBOX.LKML.netdev/new > ~/Maildir/.INBOX.LKML.netdev/tmp > ~/Maildir/.INBOX.LKML.arch > ~/Maildir/.INBOX.LKML.arch/cur > ~/Maildir/.INBOX.LKML.arch/new > ~/Maildir/.INBOX.LKML.arch/tmp > > This patch generates the following search index: > > $ delve -a Maildir/.notmuch/xapian/ | ~/s XMAILDIR > XMAILDIR:INBOX > XMAILDIR:INBOX/LKML > XMAILDIR:INBOX/LKML/arch > XMAILDIR:INBOX/LKML/netdev > XMAILDIR:Sent > > Which allows one (me!!1) to pose queries like: > > maildir:INBOX and not tag:list > > to more easily find offlist mail (from people like my family who don't > actually send their stuff over LKML :-). > > Signed-off-by: Peter Zijlstra > --- > > XXX: now I need to go figure out how to do searches like: > > subject:PATCH/0 > > which would mandate that PATCH is the first word occurring in the > subject. I think the position index holds enough information but I > need to look into that and obviously the query parser needs work for > this. This is a frequently requested feature. Unfortunately, there are two technical hurdles. One is that the position information actually *isn't* enough as it is both because the subject will start at some arbitrary term offset that depends on the from and to (and maybe other things) and because the Xapian Query structure doesn't expose a way to search for a specific term offset (only relative offsets). The other is that we currently use Xapian's query parser, which isn't extensible, though I took a stab at a custom query parser long ago and swear that one of these days I'll return to it. > > > lib/database.cc | 7 ++++--- > lib/message.cc | 40 ++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 44 insertions(+), 3 deletions(-) > > diff --git a/lib/database.cc b/lib/database.cc > index a021bf17253c..53aeaa68954d 100644 > --- a/lib/database.cc > +++ b/lib/database.cc > @@ -208,15 +208,16 @@ static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = { > { "thread", "G" }, > { "tag", "K" }, > { "is", "K" }, > - { "id", "Q" } > + { "id", "Q" }, > + { "maildir", "XMAILDIR:" }, No colon. That is, the term prefix should just be "XMAILDIR". > }; > > static prefix_t PROBABILISTIC_PREFIX[]= { > { "from", "XFROM" }, > { "to", "XTO" }, > { "attachment", "XATTACHMENT" }, > - { "subject", "XSUBJECT"}, > - { "folder", "XFOLDER"} > + { "subject", "XSUBJECT" }, > + { "folder", "XFOLDER" }, Unintentional whitespace change? > }; > > const char * > diff --git a/lib/message.cc b/lib/message.cc > index 1b4637950f8e..45a727a6208f 100644 > --- a/lib/message.cc > +++ b/lib/message.cc > @@ -22,6 +22,7 @@ > #include "database-private.h" > > #include > +#include > > #include > > @@ -485,6 +486,8 @@ _notmuch_message_add_filename (notmuch_message_t *message, > notmuch_status_t status; > void *local = talloc_new (message); > char *direntry; > + char *maildir; > + int i; > > if (filename == NULL) > INTERNAL_ERROR ("Message filename cannot be NULL."); > @@ -507,6 +510,43 @@ _notmuch_message_add_filename (notmuch_message_t *message, > /* New terms allow user to search with folder: specification. */ > _notmuch_message_gen_terms (message, "folder", directory); > > + /* Convert the directory into a maildir path */ > + maildir = talloc_strdup(local, directory); Add a space before the "(". (Same thing throughout the rest of the patch.) > + > + /* Strip the maildir "cur", "new" directory entries. */ > + i = strlen(maildir); > + if (strncmp(maildir + i - 3, "cur", 3) == 0 || > + strncmp(maildir + i - 3, "new", 3) == 0) { This is unsafe if directory is less than three characters, which I believe could happen if the message is in the root mail directory (which shouldn't happen with a well-formed maildir, but notmuch doesn't require maildir, and, regardless, we should be defensive). Also, we have a STRNCMP_LITERAL macro that we often use for comparisons with string literals, but I'm good with this, too. > + maildir[i - 3] = '\0'; > + i -= 3; > + } > + > + /* Strip trailing '/' */ > + while (maildir[i-1] == '/') { This is also unsafe if maildir is the empty string. Also, add spaces around the "-" (likewise on the next line). > + maildir[i-1] = '\0'; > + i--; > + } > + > + /* Strip leading '/' */ > + while (maildir[0] == '/') > + maildir++; > + > + /* Strip leading '.' */ > + while (maildir[0] == '.') Why strip multiple dots? (I'm not saying you shouldn't, I'm just curious; and it may be worth explaining in the comment.) > + maildir++; > + > + /* Replace all remaining '.' with '/' */ I think this should only happen if there was a leading '.', indicating a Maildir++ folder hierarchy. A lot of people use the "file system" Maildir layout, which just consists of nested directories where the leaves are Maildirs (e.g., Dovecot's LAYOUT=fs, http://wiki2.dovecot.org/MailboxFormat/Maildir#Directory_Structure). In this case, it's perfectly legitimate to have '.'s in folder names and it would be confusing and unexpected to translate them to '/'s. This would also make this compatible with MH folders (which notmuch supports, though admittedly it's unclear if anyone actually uses them). > + for (i = 0; maildir[i]; i++) { > + if (maildir[i] == '.') > + maildir[i] = '/'; > + } > + > + /* If there's no string left, we're the "INBOX" */ > + if (maildir[0] == '\0') > + maildir = talloc_strdup(local, "INBOX"); > + > + _notmuch_message_add_term (message, "maildir", maildir); > + > talloc_free (local); > > return NOTMUCH_STATUS_SUCCESS; > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch