Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id A7C81431FBC for ; Wed, 23 Dec 2009 16:27:31 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id C2UDU+sN-uGv for ; Wed, 23 Dec 2009 16:27:28 -0800 (PST) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by olra.theworths.org (Postfix) with ESMTP id 825EE431FAE for ; Wed, 23 Dec 2009 16:27:28 -0800 (PST) Received: from list by lo.gmane.org with local (Exim 4.50) id 1NNbXx-0000D6-Tr for notmuch@notmuchmail.org; Thu, 24 Dec 2009 01:27:18 +0100 Received: from ip-118-90-130-94.xdsl.xnet.co.nz ([118.90.130.94]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 24 Dec 2009 01:27:17 +0100 Received: from olly by ip-118-90-130-94.xdsl.xnet.co.nz with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 24 Dec 2009 01:27:17 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: notmuch@notmuchmail.org From: Olly Betts Date: Thu, 24 Dec 2009 00:27:00 +0000 (UTC) Lines: 49 Message-ID: References: <3wdskb8oh77.fsf@testarossa.amd.com> <87hbroyyf6.fsf@yoom.home.cworth.org> <3wd637xo8oq.fsf@testarossa.amd.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: sea.gmane.org User-Agent: Loom/3.14 (http://gmane.org/) X-Loom-IP: 118.90.130.94 (Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic) Firefox/3.5.6) Sender: news Subject: Re: [notmuch] Rather simple optimization for notmuch tag X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Dec 2009 00:27:31 -0000 Mark Anderson writes: > On Wed, 23 Dec 2009 03:45:14 +0000, Olly Betts wrote: > > Handling a combination of removals and additions is trickier, but probably > > possible, although the more tags you are dealing with, the less profitable > > the filtering is likely to be (as the filter is likely to cull fewer > > documents yet be more expensive to evaluate). > > But the transform is pretty simple, I think that any combination of > additions and removals could be transformed according to the following > formula. > > notmuch tag +a1 +a2 +a3 -d1 -d2 -d3 > > would transform to something like: > > and ( not(a1) or not(a2) or not(a3) or d1 or d2 or d3) Note that Xapian doesn't really have a "not" operator (because of how it works - by storing the documents indexing each term - rather than because nobody's implemented it), so it isn't quite as simple as the above. There is a posting list for "all documents" (which is very efficient if the document ids form a contiguous range; if they don't, it's as efficient as a term which matches all those documents for the chert backend, but not so great for the default flint backend in 1.0.x), and you can combine this with the "AND_NOT" operator to give the equivalent of a "NOT" operator. So I think the example above is probably best expressed as: ( AND ( ( ALL AND_NOT (a1 AND a2 AND a3) ) OR d1 OR d2 OR d3 ) But my point wasn't that I doubted it could be handled, but that it becomes less worthwhile as the number of tags increases (and at some point will become slower). > There are certainly may be much more optimal ways to do it depending on > the specific corpus of the database, considering if the tags a1 and a2 > and a3 are usually added as one tag, or if the addition is done > individually, because if I know that a3 implies a1 and a2, the first 3 > terms could be combined to not(a1 and a2 and a3), or I could just > exclude a3 tagged messages for nearly the same effect, with expected > performance improvements. I think you always can combine them like that. The documents that don't need looking at are precisely those which already have all three tags (i.e. a1 AND a2 AND a3), so those that do are "NOT" that expression. Cheers, Olly