From: David Bremner
To: Todd, notmuch@notmuchmail.org
Subject: Re: [PATCH v3 3/5] Add indexing for the mimetype term
In-Reply-To: <1421368229-4360-3-git-send-email-todd@electricoding.com>
References: <1421368229-4360-1-git-send-email-todd@electricoding.com> <1421368229-4360-3-git-send-email-todd@electricoding.com>
Date: Sat, 17 Jan 2015 16:21:50 +0100
Message-ID: <877fwlbfg1.fsf@maritornes.cs.unb.ca>
List-Id: "Use and development of the notmuch mail system."

Todd writes:

> Adds the indexing and removes the broken test flag
> ---
>  lib/database.cc        |  1 +
>  lib/index.cc           | 10 ++++++++++
>  test/T190-multipart.sh |  4 ----
>  3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index 0d2c417..3974e2e 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -254,6 +254,7 @@ static prefix_t PROBABILISTIC_PREFIX[]= {
>      { "from", "XFROM" },
>      { "to", "XTO" },
>      { "attachment", "XATTACHMENT" },
> +    { "mimetype", "XMIMETYPE"},
>      { "subject", "XSUBJECT"},
>  };

I think the commit message should articulate why we are indexing this
as a probabilistic prefix, rather than as a boolean prefix. In
particular, this gives people a last chance to complain. The reference
I know is

    http://xapian.org/docs/queryparser.html

If I understand correctly (it would be great if you could test this,
Todd), with a probabilistic prefix, mimetype:pdf will match

    application/pdf
    image/pdf
    application/x-pdf
    application/x-ext-pdf

but not

    application/x-bzpdf
    application/x-gzpdf
    application/x-xzpdf

On the whole, this is probably more beneficial than harmful. The
downside of probabilistic prefixes/fields is that they are not
"anchored", so there is no easy way to distinguish application/pdf
from pdf application/x-pdf

I guess in a perfect world this would also be explained in
notmuch-search-terms(7), but that's pretty much orthogonal to this
series.

d
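
For anyone who wants to sanity-check the matching behaviour described
above without building an index, here is a rough sketch. It assumes
that a probabilistic field is split into words at every
non-alphanumeric character and that a single-word query matches only
whole words. That is a simplification, not Xapian's actual term
generator, and split_words and matches_word are hypothetical helper
names, not notmuch or Xapian functions.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split text into lowercase alphanumeric runs. This only approximates
// how a word-based term generator would break up a MIME type; it is an
// assumption for illustration, not Xapian's code.
std::vector<std::string> split_words(const std::string &text)
{
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}

// True if `word` is equal to one of the whole words in `mimetype`.
// There is no anchoring: the position of the word is ignored.
bool matches_word(const std::string &mimetype, const std::string &word)
{
    for (const std::string &w : split_words(mimetype))
        if (w == word)
            return true;
    return false;
}
```

Under these assumptions, matches_word("application/x-ext-pdf", "pdf")
is true because "pdf" is a word of its own, while
matches_word("application/x-bzpdf", "pdf") is false because the only
candidate word is "bzpdf". A real test against a notmuch database is
still the better check.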