Return-Path: <aclements@csail.mit.edu>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
    by arlo.cworth.org (Postfix) with ESMTP id 6EB466DE01F7
    for <notmuch@notmuchmail.org>; Mon, 6 Jun 2016 13:09:26 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at cworth.org
X-Spam-Status: No, score=-0.823 tagged_above=-999 required=5
    tests=[AWL=-0.813, HTML_MESSAGE=0.001, SPF_PASS=-0.001,
    T_RP_MATCHES_RCVD=-0.01] autolearn=disabled
Received: from arlo.cworth.org ([127.0.0.1])
    by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id Ti-kiAZDuIZc for <notmuch@notmuchmail.org>;
    Mon, 6 Jun 2016 13:09:17 -0700 (PDT)
X-Greylist: delayed 2913 seconds by postgrey-1.35 at arlo;
    Mon, 06 Jun 2016 13:08:57 PDT
Received: from outgoing-tmp.csail.mit.edu (outgoing-tmp.csail.mit.edu
    by arlo.cworth.org (Postfix) with ESMTP id 063316DE0217
    for <notmuch@notmuchmail.org>; Mon, 6 Jun 2016 13:08:56 -0700 (PDT)
Received: from mail-yw0-f173.google.com ([209.85.161.173])
    by outgoing-tmp.csail.mit.edu with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128)
    (Exim 4.82) (envelope-from <aclements@csail.mit.edu>)
    for notmuch@notmuchmail.org; Mon, 06 Jun 2016 15:20:21 -0400
Received: by mail-yw0-f173.google.com with SMTP id c127so149455948ywb.1
    for <notmuch@notmuchmail.org>; Mon, 06 Jun 2016 12:20:21 -0700 (PDT)
X-Gm-Message-State: ALyK8tJzsZjhatT/stPriuVDauiazZsgnRt3IN/TeSV3HzCqPCyvNkrLFljgQTSX0n/Vnw05iAGS63zbY6Cfqg==
X-Received: by 10.129.45.196 with SMTP id t187mr13435296ywt.153.1465240820424;
    Mon, 06 Jun 2016 12:20:20 -0700 (PDT)
Received: by 10.37.200.7 with HTTP; Mon, 6 Jun 2016 12:20:19 -0700 (PDT)
In-Reply-To: <878tyins3j.fsf@tesseract.cs.unb.ca>
References: <1465196150-astroid-3-33kf2otxir-16915@strange>
    <87lh2ijxor.fsf@tesseract.cs.unb.ca>
    <1465217156-astroid-4-8l08w9cils-2318@strange>
    <877fe2tiy8.fsf@uwaterloo.ca> <878tyins3j.fsf@tesseract.cs.unb.ca>
From: Austin Clements <aclements@csail.mit.edu>
Date: Mon, 6 Jun 2016 15:20:19 -0400
X-Gmail-Original-Message-ID:
    <CAH-f9WtC6CeVecfg8wFZUVc8K2rUfzsP72xo97sJX2y_mLW6-g@mail.gmail.com>
Message-ID:
    <CAH-f9WtC6CeVecfg8wFZUVc8K2rUfzsP72xo97sJX2y_mLW6-g@mail.gmail.com>
Subject: Re: searching: '*analysis' vs 'reanalysis'
To: David Bremner <david@tethera.net>
Cc: sfischme@uwaterloo.ca, Gaute Hope <eg@gaute.vetsj.com>,
    notmuch <notmuch@notmuchmail.org>
Content-Type: multipart/alternative; boundary=001a1141df549cccb00534a0f6b3
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.20
List-Id: "Use and development of the notmuch mail system."
    <notmuch.notmuchmail.org>
List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch/>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Mon, 06 Jun 2016 20:09:26 -0000

--001a1141df549cccb00534a0f6b3
Content-Type: text/plain; charset=UTF-8

On Mon, Jun 6, 2016 at 1:29 PM, David Bremner <david@tethera.net> wrote:

> Sebastian Fischmeister <sfischme@uwaterloo.ca> writes:
>
> > I ran into this problem before as well. Storage is cheap. Notmuch could
> > index all emails with reversed text to get around some of this
> > problem. It doesn't solve the problem of *analysis*, but it's still an
> > improvement.
>
> It would probably be more useful to have brute force regexp searches on
> headers. Austin did some experiments that sounded promising, where you
> basically postprocess the result of a xapian query with a regexp. OTOH,
> I don't know what kept him from proposing this for mainline. If it was
> just parser issues, those are probably more or less solved now, at least
> for people using xapian 1.3+

The experiment was specifically for regexp matching on the subject, but it
should work for any header we store a literal copy of in the database. The
code is here, though in its current form it builds on my custom query parser:
https://github.com/aclements/notmuch/commit/ce41b29aba4d9b84e2f1eb6ed8df67065196c960.
Based on my understanding of Xapian 1.3+ field processors, these days it
should be quite easy to hook the PostingSource in that commit into the
Xapian QueryParser.
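
As a rough illustration of the "postprocess the result of a query with a
regexp" idea, here is a minimal Python sketch. It is not the code in the
commit above (which uses a Xapian PostingSource) and it does not call the
notmuch or Xapian APIs. The function name regexp_postfilter and the
plain-dict message records are assumptions made up for this example; the
candidate list stands in for whatever a cheaper indexed query returned.

```python
import re
from typing import Iterable, Iterator, Mapping


def regexp_postfilter(
    candidates: Iterable[Mapping[str, str]],
    header: str,
    pattern: str,
) -> Iterator[Mapping[str, str]]:
    """Yield only the candidates whose stored header matches the regexp.

    `candidates` is a hypothetical stand-in for the results of an
    indexed query; each record maps header names to literal values.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    for msg in candidates:
        # A missing header is treated as an empty string, so it never matches.
        if rx.search(msg.get(header, "")):
            yield msg


# Hypothetical candidate set, as if returned by a broad indexed search.
candidates = [
    {"id": "msg-1", "subject": "reanalysis of the benchmark data"},
    {"id": "msg-2", "subject": "analysis tools"},
    {"id": "msg-3", "subject": "meeting notes"},
]

# Emulate a '*analysis' search: at least one word character before "analysis".
matches = [m["id"] for m in regexp_postfilter(candidates, "subject", r"\w+analysis\b")]
print(matches)
```

The brute-force pass only touches messages the index already selected, so
its cost scales with the size of the candidate set rather than with the
whole mail store.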
\r
--001a1141df549cccb00534a0f6b3--