From: Austin Clements Date: Tue, 12 Nov 2013 19:31:25 +0000 (+1900) Subject: Re: [PATCH] notmuch: Add "maildir:" search option X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=56ca2ce0f892c110f0014e3f0732ba7182c721c8;p=notmuch-archives.git Re: [PATCH] notmuch: Add "maildir:" search option --- diff --git a/f1/ede4bafcaf7b27284cfcf8bb3f5e8de6aed2cd b/f1/ede4bafcaf7b27284cfcf8bb3f5e8de6aed2cd new file mode 100644 index 000000000..ba06c9861 --- /dev/null +++ b/f1/ede4bafcaf7b27284cfcf8bb3f5e8de6aed2cd @@ -0,0 +1,294 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 11380431FD7 + for ; Tue, 12 Nov 2013 11:31:42 -0800 (PST) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -0.7 +X-Spam-Level: +X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id jWYG7YjnaAk7 for ; + Tue, 12 Nov 2013 11:31:32 -0800 (PST) +Received: from dmz-mailsec-scanner-1.mit.edu (dmz-mailsec-scanner-1.mit.edu + [18.9.25.12]) + (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id BFAFE431FD2 + for ; Tue, 12 Nov 2013 11:31:31 -0800 (PST) +X-AuditID: 1209190c-b7f7f6d000000bbd-d5-5282821241d3 +Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) + (using TLS with cipher AES256-SHA (256/256 bits)) + (Client did not present a certificate) + by dmz-mailsec-scanner-1.mit.edu (Symantec Messaging Gateway) with SMTP + id 6A.27.03005.21282825; Tue, 12 Nov 2013 14:31:30 -0500 (EST) +Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) + by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id rACJVS2o009792; + Tue, 12 Nov 2013 14:31:29 -0500 +Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) + (authenticated bits=0) + (User authenticated as amdragon@ATHENA.MIT.EDU) + by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id rACJVP2a000312 + (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); + Tue, 12 Nov 2013 14:31:27 -0500 +Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80) + (envelope-from ) + id 1VgJgL-0005wV-Ap; Tue, 12 Nov 2013 14:31:25 -0500 +From: Austin Clements +To: Peter Zijlstra , notmuch@notmuchmail.org +Subject: Re: [PATCH] notmuch: Add "maildir:" search option +In-Reply-To: <20131112155637.GA16796@laptop.programming.kicks-ass.net> +References: <20131112155637.GA16796@laptop.programming.kicks-ass.net> +User-Agent: Notmuch/0.16+154~g96c0ce2 (http://notmuchmail.org) Emacs/23.4.1 + (i486-pc-linux-gnu) +Date: Tue, 12 Nov 2013 14:31:25 -0500 +Message-ID: <87mwl94dte.fsf@awakening.csail.mit.edu> +MIME-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +X-Brightmail-Tracker: + H4sIAAAAAAAAA+NgFrrPIsWRmVeSWpSXmKPExsUixG6nrivU1BRksPWmhcX1mzOZLY73HmBy + YPLYvELL49mqW8wBTFFcNimpOZllqUX6dglcGXv7P7MXbLCo2DZnAVMD43vtLkZODgkBE4m2 + l1dZIWwxiQv31rN1MXJxCAnMZpJoWdfPBOFsZJT48eovI4RzmkmiZ9kMJpAWIYEljBItF3xB + bDYBfYkVayeBjRIRcJK4+2kqkM3BISxgJbF/VxJImFPATeLLsZPsEK2uEtuX3mcBsUUF4iWm + LdzJDGKzCKhKHH3RCzaeF+i6vSdPs0HYghInZz4Bq2cW0JK48e8l0wRGgVlIUrOQpBYwMq1i + lE3JrdLNTczMKU5N1i1OTszLSy3SNdTLzSzRS00p3cQICkdOSZ4djG8OKh1iFOBgVOLhfRDZ + FCTEmlhWXJl7iFGSg0lJlFe8ACjEl5SfUpmRWJwRX1Sak1p8iFGCg1lJhNctBijHm5JYWZVa + lA+TkuZgURLnvclhHyQkkJ5YkpqdmlqQWgSTleHgUJLgPVkE1ChYlJqeWpGWmVOCkGbi4AQZ + zgM0/BlIDW9xQWJucWY6RP4Uo6KUOO8lkIQASCKjNA+uF5YuXjGKA70izHsApIoHmGrgul8B + DWYCGrzkbiPI4JJEhJRUA2N3Rt2Df5/uKX8PdZYQ5pLzWn22aLJyTX7oD47p7x5vn+12OfRP + bYFs17Ho3rVdzze/tBJ7zbpkXYBdKPePnsjvQV7mR4ruzN22XOiq3g6hd82d8ns3mUsU7PC7 + ZZruGL1He8dUjS8OqT9/zDoh+bi6xPfSAl+pwp+a1uVmcu4ienE3+ZIXhiixFGckGmoxFxUn + AgB957TZ8gIAAA== +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Tue, 12 Nov 2013 19:31:42 -0000 + +I think this is a great idea. Personally I think this is how folder: +should work. I find the semantics of folder: to be useless except where +they happen to coincide with the boolean semantics used here. +Unfortunately, changing folder: would require versioning the database, +which we have only primordial support for right now. + +Various comments below, though nothing major. Of course, we'd also need +some tests and man page updates for this. + +On Tue, 12 Nov 2013, Peter Zijlstra wrote: +> Subject: notmuch: Add "maildir:" search option +> +> The current "folder:" search terms are near useless when you have +> recursive folders, introduce a boolean "maildir:" search term to +> exactly match the maildir folder. +> +> Given a Maildir++ layout like: +> +> ~/Maildir/ +> ~/Maildir/cur +> ~/Maildir/new +> ~/Maildir/tmp +> ~/Maildir/.Sent +> ~/Maildir/.Sent/cur +> ~/Maildir/.Sent/new +> ~/Maildir/.Sent/tmp +> ~/Maildir/.INBOX.LKML +> ~/Maildir/.INBOX.LKML/cur +> ~/Maildir/.INBOX.LKML/new +> ~/Maildir/.INBOX.LKML/tmp +> ~/Maildir/.INBOX.LKML.netdev +> ~/Maildir/.INBOX.LKML.netdev/cur +> ~/Maildir/.INBOX.LKML.netdev/new +> ~/Maildir/.INBOX.LKML.netdev/tmp +> ~/Maildir/.INBOX.LKML.arch +> ~/Maildir/.INBOX.LKML.arch/cur +> ~/Maildir/.INBOX.LKML.arch/new +> ~/Maildir/.INBOX.LKML.arch/tmp +> +> This patch generates the following search index: +> +> $ delve -a Maildir/.notmuch/xapian/ | ~/s XMAILDIR +> XMAILDIR:INBOX +> XMAILDIR:INBOX/LKML +> XMAILDIR:INBOX/LKML/arch +> XMAILDIR:INBOX/LKML/netdev +> XMAILDIR:Sent +> +> Which allows one (me!!1) to pose queries like: +> +> maildir:INBOX and not tag:list +> +> to more easily find offlist mail (from people like my family who don't +> actually send their stuff over LKML :-). +> +> Signed-off-by: Peter Zijlstra +> --- +> +> XXX: now I need to go figure out how to do searches like: +> +> subject:PATCH/0 +> +> which would mandate that PATCH is the first word occurring in the +> subject. I think the position index holds enough information but I +> need to look into that and obviously the query parser needs work for +> this. + +This is a frequently requested feature. Unfortunately, there are two +technical hurdles. One is that the position information actually +*isn't* enough as it is both because the subject will start at some +arbitrary term offset that depends on the from and to (and maybe other +things) and because the Xapian Query structure doesn't expose a way to +search for a specific term offset (only relative offsets). The other is +that we currently use Xapian's query parser, which isn't extensible, +though I took a stab at a custom query parser long ago and swear that +one of these days I'll return to it. + +> +> +> lib/database.cc | 7 ++++--- +> lib/message.cc | 40 ++++++++++++++++++++++++++++++++++++++++ +> 2 files changed, 44 insertions(+), 3 deletions(-) +> +> diff --git a/lib/database.cc b/lib/database.cc +> index a021bf17253c..53aeaa68954d 100644 +> --- a/lib/database.cc +> +++ b/lib/database.cc +> @@ -208,15 +208,16 @@ static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = { +> { "thread", "G" }, +> { "tag", "K" }, +> { "is", "K" }, +> - { "id", "Q" } +> + { "id", "Q" }, +> + { "maildir", "XMAILDIR:" }, + +No colon. That is, the term prefix should just be "XMAILDIR". + +> }; +> +> static prefix_t PROBABILISTIC_PREFIX[]= { +> { "from", "XFROM" }, +> { "to", "XTO" }, +> { "attachment", "XATTACHMENT" }, +> - { "subject", "XSUBJECT"}, +> - { "folder", "XFOLDER"} +> + { "subject", "XSUBJECT" }, +> + { "folder", "XFOLDER" }, + +Unintentional whitespace change? + +> }; +> +> const char * +> diff --git a/lib/message.cc b/lib/message.cc +> index 1b4637950f8e..45a727a6208f 100644 +> --- a/lib/message.cc +> +++ b/lib/message.cc +> @@ -22,6 +22,7 @@ +> #include "database-private.h" +> +> #include +> +#include +> +> #include +> +> @@ -485,6 +486,8 @@ _notmuch_message_add_filename (notmuch_message_t *message, +> notmuch_status_t status; +> void *local = talloc_new (message); +> char *direntry; +> + char *maildir; +> + int i; +> +> if (filename == NULL) +> INTERNAL_ERROR ("Message filename cannot be NULL."); +> @@ -507,6 +510,43 @@ _notmuch_message_add_filename (notmuch_message_t *message, +> /* New terms allow user to search with folder: specification. */ +> _notmuch_message_gen_terms (message, "folder", directory); +> +> + /* Convert the directory into a maildir path */ +> + maildir = talloc_strdup(local, directory); + +Add a space before the "(". (Same thing throughout the rest of the +patch.) + +> + +> + /* Strip the maildir "cur", "new" directory entries. */ +> + i = strlen(maildir); +> + if (strncmp(maildir + i - 3, "cur", 3) == 0 || +> + strncmp(maildir + i - 3, "new", 3) == 0) { + +This is unsafe if directory is less than three characters, which I +believe could happen if the message is in the root mail directory (which +shouldn't happen with a well-formed maildir, but notmuch doesn't require +maildir, and, regardless, we should be defensive). + +Also, we have a STRNCMP_LITERAL macro that we often use for comparisons +with string literals, but I'm good with this, too. + +> + maildir[i - 3] = '\0'; +> + i -= 3; +> + } +> + +> + /* Strip trailing '/' */ +> + while (maildir[i-1] == '/') { + +This is also unsafe if maildir is the empty string. + +Also, add spaces around the "-" (likewise on the next line). + +> + maildir[i-1] = '\0'; +> + i--; +> + } +> + +> + /* Strip leading '/' */ +> + while (maildir[0] == '/') +> + maildir++; +> + +> + /* Strip leading '.' */ +> + while (maildir[0] == '.') + +Why strip multiple dots? (I'm not saying you shouldn't, I'm just +curious; and it may be worth explaining in the comment.) + +> + maildir++; +> + +> + /* Replace all remaining '.' with '/' */ + +I think this should only happen if there was a leading '.', indicating a +Maildir++ folder hierarchy. A lot of people use the "file system" +Maildir layout, which just consists of nested directories where the +leaves are Maildirs (e.g., Dovecot's LAYOUT=fs, +http://wiki2.dovecot.org/MailboxFormat/Maildir#Directory_Structure). In +this case, it's perfectly legitimate to have '.'s in folder names and it +would be confusing and unexpected to translate them to '/'s. This would +also make this compatible with MH folders (which notmuch supports, +though admittedly it's unclear if anyone actually uses them). + +> + for (i = 0; maildir[i]; i++) { +> + if (maildir[i] == '.') +> + maildir[i] = '/'; +> + } +> + +> + /* If there's no string left, we're the "INBOX" */ +> + if (maildir[0] == '\0') +> + maildir = talloc_strdup(local, "INBOX"); +> + +> + _notmuch_message_add_term (message, "maildir", maildir); +> + +> talloc_free (local); +> +> return NOTMUCH_STATUS_SUCCESS; +> _______________________________________________ +> notmuch mailing list +> notmuch@notmuchmail.org +> http://notmuchmail.org/mailman/listinfo/notmuch