From: David Bremner
Date: Sat, 17 Jan 2015 15:21:50 +0000 (+0100)
Subject: Re: [PATCH v3 3/5] Add indexing for the mimetype term
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=0bd26c2c425c48ea957e0fef4e35003d41a0ad1d;p=notmuch-archives.git

Re: [PATCH v3 3/5] Add indexing for the mimetype term
---

diff --git a/86/32211145b86c96fe70fc5ef42182c33f9f13b9 b/86/32211145b86c96fe70fc5ef42182c33f9f13b9
new file mode 100644
index 000000000..510208b32
--- /dev/null
+++ b/86/32211145b86c96fe70fc5ef42182c33f9f13b9
@@ -0,0 +1,113 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id D8373431FC2
+ for ; Sat, 17 Jan 2015 07:21:59 -0800 (PST)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: 2.438
+X-Spam-Level: **
+X-Spam-Status: No, score=2.438 tagged_above=-999 required=5
+ tests=[DNS_FROM_AHBL_RHSBL=2.438] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id rDDm1QhRTpCf for ;
+ Sat, 17 Jan 2015 07:21:56 -0800 (PST)
+Received: from yantan.tethera.net (yantan.tethera.net [199.188.72.155])
+ (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits))
+ (No client certificate requested)
+ by olra.theworths.org (Postfix) with ESMTPS id A5560431FAF
+ for ; Sat, 17 Jan 2015 07:21:56 -0800 (PST)
+Received: from remotemail by yantan.tethera.net with local (Exim 4.80)
+ (envelope-from )
+ id 1YCVCF-0007kL-AJ; Sat, 17 Jan 2015 11:21:55 -0400
+Received: (nullmailer pid 12523 invoked by uid 1000); Sat, 17 Jan 2015
+ 15:21:50 -0000
+From: David Bremner
+To: Todd , notmuch@notmuchmail.org
+Subject: Re: [PATCH v3 3/5] Add indexing for the mimetype term
+In-Reply-To: <1421368229-4360-3-git-send-email-todd@electricoding.com>
+References: <1421368229-4360-1-git-send-email-todd@electricoding.com>
+
<1421368229-4360-3-git-send-email-todd@electricoding.com>
+User-Agent: Notmuch/0.19+27~g29ffde4 (http://notmuchmail.org) Emacs/24.4.1
+ (x86_64-pc-linux-gnu)
+Date: Sat, 17 Jan 2015 16:21:50 +0100
+Message-ID: <877fwlbfg1.fsf@maritornes.cs.unb.ca>
+MIME-Version: 1.0
+Content-Type: text/plain
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Sat, 17 Jan 2015 15:22:00 -0000
+
+Todd writes:
+
+> Adds the indexing and removes the broken test flag
+> ---
+> lib/database.cc | 1 +
+> lib/index.cc | 10 ++++++++++
+> test/T190-multipart.sh | 4 ----
+> 3 files changed, 11 insertions(+), 4 deletions(-)
+>
+> diff --git a/lib/database.cc b/lib/database.cc
+> index 0d2c417..3974e2e 100644
+> --- a/lib/database.cc
+> +++ b/lib/database.cc
+> @@ -254,6 +254,7 @@ static prefix_t PROBABILISTIC_PREFIX[]= {
+>      { "from", "XFROM" },
+>      { "to", "XTO" },
+>      { "attachment", "XATTACHMENT" },
+> +    { "mimetype", "XMIMETYPE"},
+>      { "subject", "XSUBJECT"},
+>  };
+
+I think the commit message should articulate why we are indexing this as
+a probabilistic prefix, rather than as a boolean prefix. In particular,
+this gives people a last chance to complain.
+
+The reference I know is http://xapian.org/docs/queryparser.html
+
+If I understand correctly (it would be great if you could test this
+Todd), with a probabilistic prefix,
+
+ mimetype:pdf
+
+will match
+
+application/pdf
+image/pdf
+application/x-pdf
+application/x-ext-pdf
+
+but not
+
+application/x-bzpdf
+application/x-gzpdf
+application/x-xzpdf
+
+On the whole, this is probably more beneficial than bad. 
The downside
+of probabilistic prefixes/fields is that they are not "anchored", so
+there is no easy way to distinguish
+
+ application/pdf
+
+from
+
+ pdf
+ application/x-pdf
+
+I guess in a perfect world this would also be explained in
+notmuch-search-terms(7), but that's pretty much orthogonal to this
+series.
+
+d
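
The matching behaviour described in the message can be illustrated with a rough sketch. It assumes that text indexed under a probabilistic prefix is split into word terms at characters such as "/" and "-", and that a one-word query matches when it equals one of those terms. The helpers `terms` and `matches` are hypothetical names for illustration only; this is not Xapian's or notmuch's actual tokenizer, which has more rules than a plain split on non-alphanumerics.

```python
import re


def terms(text):
    """Split text into lowercase word terms on any run of non-alphanumerics.

    Simplifying assumption: the real term generator has extra rules, but
    "/" and "-" are treated here as plain word separators.
    """
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def matches(query_word, mime_type):
    """Return True if the single query word equals one term of mime_type."""
    return query_word.lower() in terms(mime_type)


if __name__ == "__main__":
    candidates = [
        "application/pdf",
        "image/pdf",
        "application/x-pdf",
        "application/x-ext-pdf",
        "application/x-bzpdf",
        "application/x-gzpdf",
        "application/x-xzpdf",
    ]
    for mime_type in candidates:
        verdict = "match" if matches("pdf", mime_type) else "no match"
        print(f"{mime_type}: {terms(mime_type)} -> {verdict}")
```

Under the same assumption the sketch also shows the "not anchored" downside: once a type is reduced to a bag of word terms, a query for "pdf" cannot tell application/pdf apart from application/x-pdf.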