From: Todd
Date: Sat, 17 Jan 2015 16:41:10 +0000 (+1800)
Subject: Re: [PATCH v3 3/5] Add indexing for the mimetype term
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=c880f03c6e5c39ca207167a32899c4658b7bc8ff;p=notmuch-archives.git

Re: [PATCH v3 3/5] Add indexing for the mimetype term
---

diff --git a/34/e3f9eba8ed841f241dcdfebf19d7c6c657aa6a b/34/e3f9eba8ed841f241dcdfebf19d7c6c657aa6a
new file mode 100644
index 000000000..5e939746b
--- /dev/null
+++ b/34/e3f9eba8ed841f241dcdfebf19d7c6c657aa6a
@@ -0,0 +1,186 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id A977C431FC2
+ for ; Sat, 17 Jan 2015 08:41:47 -0800 (PST)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: 2.438
+X-Spam-Level: **
+X-Spam-Status: No, score=2.438 tagged_above=-999 required=5
+ tests=[DNS_FROM_AHBL_RHSBL=2.438] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id LuMEjxPDhkyZ for ;
+ Sat, 17 Jan 2015 08:41:44 -0800 (PST)
+Received: from s75.web-hosting.com (s75.web-hosting.com [198.187.31.9])
+ (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
+ (No client certificate requested)
+ by olra.theworths.org (Postfix) with ESMTPS id 6847F431FAF
+ for ; Sat, 17 Jan 2015 08:41:44 -0800 (PST)
+Received: from user-69-73-37-128.knology.net ([69.73.37.128]:46736
+ helo=tz-lab) by server75.web-hosting.com with esmtpsa
+ (UNKNOWN:DHE-RSA-AES128-SHA:128) (Exim 4.82) (envelope-from
+ ) id 1YCWRS-001OHm-HT; Sat, 17 Jan 2015 11:41:42
+ -0500
+From: Todd
+To: David Bremner , notmuch@notmuchmail.org
+Subject: Re: [PATCH v3 3/5] Add indexing for the mimetype term
+In-Reply-To: <877fwlbfg1.fsf@maritornes.cs.unb.ca>
+References: <1421368229-4360-1-git-send-email-todd@electricoding.com>
+ <1421368229-4360-3-git-send-email-todd@electricoding.com>
+ <877fwlbfg1.fsf@maritornes.cs.unb.ca>
+User-Agent: Notmuch/0.19+17~gd8b219d (http://notmuchmail.org) Emacs/24.4.1
+ (x86_64-unknown-linux-gnu)
+Date: Sat, 17 Jan 2015 10:41:10 -0600
+Message-ID: <871tmt5pi1.fsf@electricoding.com>
+MIME-Version: 1.0
+Content-Type: multipart/signed; boundary="=-=-=";
+ micalg=pgp-sha1; protocol="application/pgp-signature"
+X-AntiAbuse: This header was added to track abuse,
+ please include it with any abuse report
+X-AntiAbuse: Primary Hostname - server75.web-hosting.com
+X-AntiAbuse: Original Domain - notmuchmail.org
+X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
+X-AntiAbuse: Sender Address Domain - electricoding.com
+X-Get-Message-Sender-Via: server75.web-hosting.com: authenticated_id:
+ todd@electricoding.com
+X-Source:
+X-Source-Args:
+X-Source-Dir:
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Sat, 17 Jan 2015 16:41:47 -0000
+
+--=-=-=
+Content-Type: text/plain
+Content-Transfer-Encoding: quoted-printable
+
+
+>>>>> "DB" =3D=3D David Bremner writes:
+
+ DB> Todd writes:
+ >> Adds the indexing and removes the broken test flag
+ >> ---
+ >> lib/database.cc | 1 +
+ >> lib/index.cc | 10 ++++++++++
+ >> test/T190-multipart.sh | 4 ----
+ >> 3 files changed, 11 insertions(+), 4 deletions(-)
+ >>
+ >> diff --git a/lib/database.cc b/lib/database.cc
+ >> index 0d2c417..3974e2e 100644
+ >> --- a/lib/database.cc
+ >> +++ b/lib/database.cc
+ >> @@ -254,6 +254,7 @@ static prefix_t PROBABILISTIC_PREFIX[]=3D {
+ >> { "from", "XFROM" },
+ >> { "to", "XTO" },
+ >> { "attachment", "XATTACHMENT" },
+ >> + { "mimetype", "XMIMETYPE"},
+ >> { "subject", "XSUBJECT"},
+ >> };
+
+ DB> I think the commit message should articulate why we are indexing th=
+is as
+ DB> a probabilistic prefix, rather than as a boolean prefix. In particu=
+lar,
+ DB> this gives people a last chance to complain.
+
+ DB> The reference I know is http://xapian.org/docs/queryparser.html
+
+ DB> If I understand correctly (it would be great if you could test this
+ DB> Todd) , with a probabilistic prefix,
+
+ DB> mimetype:pdf
+
+ DB> will match
+
+ DB> application/pdf
+ DB> image/pdf
+ DB> application/x-pdf
+ DB> application/x-ext-pdf
+
+ DB> but not
+
+ DB> application/x-bzpdf
+ DB> application/x-gzpdf
+ DB> application/x-xzpdf
+
+ I just tested, and it does work this way with your examples. I
+ *believe* from reading the docs, that xapian is treating the full
+ MIME-type queries as phrase searches anyway due to the embedded
+ slashes.
+
+ From http://xapian.org/docs/queryparser.html:
+
+ A phrase surrounded with double quotes ("") matches documents
+ containing that exact phrase. Hyphenated words are also treated
+ as phrases, as are cases such as filenames and email addresses
+ (e.g. /etc/passwd or president@whitehouse.gov).
+
+ I think that we'll get good behavior from the types of queries that
+ will typically be performed due to this automatic phrasing.
+
+
+
+ DB> On the whole, this is probably more beneficial than bad. The downs=
+ide
+ DB> of probabilistic prefixes/fields is that they are not "anchored", so
+ DB> there is no easy way to distinguish
+
+ DB> application/pdf
+
+ DB> from
+
+ DB> pdf
+ DB> application/x-pdf
+
+ DB> I guess in a perfect world this would also be explained in
+ DB> notmuch-search-terms(7), but that's pretty much orthogonal to this
+ DB> series.
+
+ If separate messages with application/pdf and application/x-pdf are
+ indexed, then:
+=20=20=20=20
+ mimetype:application/x-pdf finds only the application/x-pdf
+ mimetype:application/pdf finds only the application/pdf
+ mimetype:pdf finds both of the messages
+
+ I am fairly sure that this behaviour is a result of the automatic
+ phrasing mentioned above.
+
+ - Todd
+=20=20=20=20
+ DB> d
+
+--=-=-=
+Content-Type: application/pgp-signature; name="signature.asc"
+
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAEBAgAGBQJUupCnAAoJEEc0ULlfRYDu0f8QAJVtVpA9kQKjBgpTkrieYQnE
+ADCWWrIwiI7rU8MyaWD5GqVBPVUdHvYaKCGoQhiirnqvNEk0CrsF4rrDB7UNcSVH
+LKV5SDNIBGxw0EsMtukPXz0zgoJfKIWfqWieC97j832fI/2NZHetrs9VEWPHVLzJ
+1VnPQpsAFt3dLXw8ff9WjkEZVcj/fbVBvHNZNX+YqY9RdzTRomJP4pqn0S1YKY9o
+SohqbLpS7HVh7JFOdPMVyALOqs5dh44n0PJYe7FDazqNwb2w0PqEa2dQnHjGF/0e
+8SRUSKCTpvYC9buRfcFmZj5KWGx/vgi9T17etXJYU2Vd/CQNPAZmliZS9gaYKlWt
+8YasMJyDDRq79XmiFbJwao47HUig6IFBdgGCMVxzmUZPTlINO8lQyuP/O9DlHVo5
+2PK2vf/d07k5VnH6tjukEY6fEMQqQFkXG5JIWw0VLKMbVBG8esFwfpeEx0KdW6Qi
+oJfHxjmHMfAug9L/lukHotW7fH3mHZ2RQLWClaqhVBGgeGRfyMJEjnbLVCiZlk/0
+0p4TDt5LTVAtopquwCMHpwJG7BA9CMOwGdOJB7hv/OTqVuj3ZSq1JP93jsrV4tO7
+azEYOYW/VnrsOoGmsW/K3Hggl2OYej9aYmugTw3fodU9RV+xmfSrvvU/qkKWMSel
+oTv39uIcY/R+dmhU8EO1
+=5flc
+-----END PGP SIGNATURE-----
+--=-=-=--
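
The matching behaviour reported in the message above can be imitated with a toy model. The sketch below is not Xapian or notmuch code. It assumes that indexing splits a MIME type into word terms at "/" and "-", and that a query containing those characters is treated as a phrase, so its terms must appear consecutively and in order. The function names `terms` and `matches` are made up for illustration, and real Xapian tokenisation has more rules than this.

```python
import re


def terms(mime_type):
    """Split a MIME type into lowercase word terms (assumed tokenisation).

    Any run of characters other than a-z and 0-9 acts as a separator,
    so "application/x-pdf" becomes ["application", "x", "pdf"].
    """
    return re.findall(r"[a-z0-9]+", mime_type.lower())


def matches(query, mime_type):
    """Return True if the query terms occur as a contiguous run in mime_type.

    A one-word query such as "pdf" behaves like a plain term lookup;
    a query such as "application/pdf" behaves like a phrase search.
    """
    q = terms(query)
    d = terms(mime_type)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


if __name__ == "__main__":
    indexed = [
        "application/pdf", "image/pdf", "application/x-pdf",
        "application/x-ext-pdf", "application/x-bzpdf",
    ]
    for query in ("pdf", "application/pdf", "application/x-pdf"):
        hits = [m for m in indexed if matches(query, m)]
        print(f"mimetype:{query} -> {hits}")
```

In this model "pdf" is never matched inside "bzpdf" because only whole terms are compared, and "application/pdf" does not match "application/x-pdf" because the intervening "x" term breaks the phrase. That agrees with the results reported in the message, but it says nothing about how Xapian reaches them internally.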