From a1c0e6f9ecd00536ea95222eba2ec59ef2dafc18 Mon Sep 17 00:00:00 2001 From: Austin Clements Date: Fri, 4 Feb 2011 01:14:29 +1900 Subject: [PATCH] Folder search semantics (was Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more) --- 3d/4701949c608decda984d69bdcab57c6d4f7029 | 161 ++++++++++++++++++++++ 1 file changed, 161 insertions(+) create mode 100644 3d/4701949c608decda984d69bdcab57c6d4f7029 diff --git a/3d/4701949c608decda984d69bdcab57c6d4f7029 b/3d/4701949c608decda984d69bdcab57c6d4f7029 new file mode 100644 index 000000000..13c0e6e0d --- /dev/null +++ b/3d/4701949c608decda984d69bdcab57c6d4f7029 @@ -0,0 +1,161 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id C951B431FD0 + for ; Wed, 2 Feb 2011 22:14:33 -0800 (PST) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: 0 +X-Spam-Level: +X-Spam-Status: No, score=0 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_NONE=-0.0001] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id ahCKLzqokZ5M for ; + Wed, 2 Feb 2011 22:14:32 -0800 (PST) +Received: from dmz-mailsec-scanner-5.mit.edu (DMZ-MAILSEC-SCANNER-5.MIT.EDU + [18.7.68.34]) + by olra.theworths.org (Postfix) with ESMTP id BE93B431FB5 + for ; Wed, 2 Feb 2011 22:14:32 -0800 (PST) +X-AuditID: 12074422-b7c3eae000000a70-41-4d4a47c8c503 +Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) + by dmz-mailsec-scanner-5.mit.edu (Symantec Brightmail Gateway) with + SMTP id F7.53.02672.8C74A4D4; Thu, 3 Feb 2011 01:14:32 -0500 (EST) +Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) + by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id p136EV2g030981; + Thu, 3 Feb 2011 01:14:31 -0500 +Received: from awakening.csail.mit.edu 
(awakening.csail.mit.edu [18.26.4.91]) + (authenticated bits=0) + (User authenticated as amdragon@ATHENA.MIT.EDU) + by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id p136ET2e001049 + (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT); + Thu, 3 Feb 2011 01:14:29 -0500 (EST) +Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.72) + (envelope-from ) + id 1PksSb-0006Vo-3J; Thu, 03 Feb 2011 01:14:29 -0500 +Date: Thu, 3 Feb 2011 01:14:29 -0500 +From: Austin Clements +To: Carl Worth +Subject: Folder search semantics (was Re: [RFC PATCH v2 0/8] Custom query + parser, date search, folder search, and more) +Message-ID: <20110203061429.GD28537@mit.edu> +References: <1295165458-9573-1-git-send-email-amdragon@mit.edu> + <20110202050336.GB28537@mit.edu> + <87sjw6hx2l.fsf@yoom.home.cworth.org> +MIME-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +Content-Disposition: inline +In-Reply-To: <87sjw6hx2l.fsf@yoom.home.cworth.org> +User-Agent: Mutt/1.5.20 (2009-06-14) +X-Brightmail-Tracker: AAAAAA== +Cc: notmuch@notmuchmail.org +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Thu, 03 Feb 2011 06:14:34 -0000 + +Quoth Carl Worth on Feb 02 at 2:48 pm: +> Restricting my reply to one tiny bit of your mail: +> +> You wrote: +> > non-recursive is the only thing that makes sense for Maildir++ folders +> +> Either I'm not understanding Maildir++ folders, or I don't agree with +> you. +> +> I might have an email archive that looks like this: +> +> Maildir +> .work +> .project1 +> .project2 +> .etc... +> .family +> .dad +> .mom +> .brother +> .etc... +> +> With the above setup, what would be unreasonable about wanting to search +> for all work-related messages (across all projects, say) with a string +> like "folder:work" ? 
+> +> Now, a person might definitely want to search for messages in the +> ".work" folder directly, (not including the sub-folders), so we should +> provide support for users to get at that behavior as well, (such as a +> proposed "folder:work$" or so). +> +> To me, both cases are perfectly legitimate, and I don't understand an +> argument that claims that only one makes sense. (Or again, I may be +> misunderstanding something.) + +(Somebody with more first-hand Maildir++ experience should jump in here. +I stopped using Maildir++ a long time ago, so I may have no idea what +I'm talking about.) + +Both cases are perfectly legitimate. + +However, the issue with Maildir++ is that the inbox is stored in the +top-level directory: + + Maildir + cur + new + tmp + .work + .work.project1 + +As a consequence, all folders are subfolders of the inbox. With +recursive search, a search for your inbox folder returns *all* of your +messages. I wasn't trying to say that we shouldn't support recursive +search (I'm all for flexibility), but it's a confusing default for +Maildir++ because of this. + +Maildir++ has the added twist that the inbox folder has no name. As a +result, currently notmuch can't search for a Maildir++ inbox folder, +which needs to be addressed somehow. The least surprising approach +would be compatibility with the Maildir++ convention of calling the +top-level folder INBOX, the subfolder INBOX.work, etc. + + +Maildir++ issues aside, I submit that rooted, non-recursive folder +searches are a more natural default with a more conventional syntactic +extension to non-rooted/recursive searches. In +id:87aaiy3u65.fsf@yoom.home.cworth.org, you mentioned that you +implemented non-rooted folder search to mimic subject search. But file +system paths are not natural language like subject lines. File system +paths are hierarchical and rooted. 
+ +Of course, special query operators like ^ and $ can mitigate this, but +these queries *aren't* regexps and, furthermore, people don't usually +apply regexps to file names. They apply globs. Glob syntax has the +added benefit of congruity with Xapian wildcard syntax. This naturally +leads to a rooted, non-recursive syntax by default (like globs), where a +* at the end means recursive and a * at the beginning means non-rooted. +In fact, we could easily generalize this to arbitrary shell globs. + + +Here's a proposal that, I think, addresses Maildir++ inboxes and +subfolders; rooted, non-rooted, recursive, and non-recursive queries; +and then some. Plus, it wouldn't require many code changes; you've +already done the hard work. + +Switch XFOLDER from a probabilistic prefix with word-splitting to a +boolean prefix without word-splitting. When indexing, strip off the cur +or new and examine the resulting directory name. If it's the mail root, +this is a Maildir++ inbox, so add the term XFOLDERINBOX. If it starts +with a dot, it's a Maildir++ subfolder, so add the term +XFOLDERINBOX<.dirname>. Otherwise, add the term XFOLDER. +Then, using a custom query transform for the "folder:" prefix, enumerate +XFOLDER terms and form a synonym query out of those that fnmatch the +user's folder query. -- 2.26.2
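
The indexing and matching steps proposed above can be sketched roughly as
follows. This is only an illustration in Python, not notmuch code: the
function names folder_term and matching_terms are hypothetical, the
assumption that an ordinary (non-Maildir++) folder gets the term XFOLDER
followed by its path relative to the mail root is mine, and a plain list
of strings stands in for enumerating the XFOLDER terms in the database.

```python
import fnmatch
import os

PREFIX = "XFOLDER"  # term prefix named in the proposal


def folder_term(mail_root, message_path):
    """Return the folder term for one message file (hypothetical helper)."""
    directory = os.path.dirname(os.path.normpath(message_path))
    # Strip off a trailing cur or new, as the proposal describes.
    if os.path.basename(directory) in ("cur", "new"):
        directory = os.path.dirname(directory)
    rel = os.path.relpath(directory, os.path.normpath(mail_root))
    if rel == ".." or rel.startswith(".." + os.sep):
        raise ValueError("message is outside the mail root")
    if rel == ".":
        # The mail root itself: a Maildir++ inbox.
        return PREFIX + "INBOX"
    if rel.startswith("."):
        # A dot-directory: a Maildir++ subfolder, named INBOX.<dirname>.
        return PREFIX + "INBOX" + rel
    # Assumption: any other directory is indexed by its relative path.
    return PREFIX + rel


def matching_terms(all_terms, pattern):
    """Select the folder terms whose folder name matches a shell glob."""
    return [
        term
        for term in all_terms
        if term.startswith(PREFIX)
        and fnmatch.fnmatchcase(term[len(PREFIX):], pattern)
    ]
```

With terms built this way, a query of INBOX selects only the inbox,
INBOX.work* also picks up the subfolders of work, and *project1 is a
non-rooted match. The selected terms would then be combined into a single
synonym query. Note that Python's fnmatch lets * match across "/" and ".",
which is what makes a trailing * behave recursively in this sketch.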