1 Return-Path: <m.walters@qmul.ac.uk>
\r
2 X-Original-To: notmuch@notmuchmail.org
\r
3 Delivered-To: notmuch@notmuchmail.org
\r
4 Received: from localhost (localhost [127.0.0.1])
\r
5 by olra.theworths.org (Postfix) with ESMTP id 53133431FB6
\r
6 for <notmuch@notmuchmail.org>; Mon, 31 Dec 2012 04:02:00 -0800 (PST)
\r
7 X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
\r
11 X-Spam-Status: No, score=-1.098 tagged_above=-999 required=5
\r
12 tests=[DKIM_ADSP_CUSTOM_MED=0.001, FREEMAIL_FROM=0.001,
\r
13 NML_ADSP_CUSTOM_MED=1.2, RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled
\r
14 Received: from olra.theworths.org ([127.0.0.1])
\r
15 by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
\r
16 with ESMTP id P1wJhFx4PT9u for <notmuch@notmuchmail.org>;
\r
17 Mon, 31 Dec 2012 04:01:56 -0800 (PST)
\r
18 Received: from mail2.qmul.ac.uk (mail2.qmul.ac.uk [138.37.6.6])
\r
19 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
\r
20 (No client certificate requested)
\r
21 by olra.theworths.org (Postfix) with ESMTPS id 5BA84431FAF
\r
22 for <notmuch@notmuchmail.org>; Mon, 31 Dec 2012 04:01:56 -0800 (PST)
\r
23 Received: from smtp.qmul.ac.uk ([138.37.6.40])
\r
24 by mail2.qmul.ac.uk with esmtp (Exim 4.71)
\r
25 (envelope-from <m.walters@qmul.ac.uk>)
\r
26 id 1Tpe3t-0005zv-BZ; Mon, 31 Dec 2012 12:01:45 +0000
\r
27 Received: from 188.31.19.240.threembb.co.uk ([188.31.19.240] helo=localhost)
\r
28 by smtp.qmul.ac.uk with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.69)
\r
29 (envelope-from <m.walters@qmul.ac.uk>)
\r
30 id 1Tpe3s-0007gY-Df; Mon, 31 Dec 2012 12:01:45 +0000
\r
31 From: Mark Walters <markwalters1009@gmail.com>
\r
32 To: Austin Clements <amdragon@MIT.EDU>, notmuch@notmuchmail.org
\r
33 Subject: Re: [PATCH v4 2/5] util: Function to parse boolean term queries
\r
34 In-Reply-To: <1356936162-2589-3-git-send-email-amdragon@mit.edu>
\r
35 References: <1356936162-2589-1-git-send-email-amdragon@mit.edu>
\r
36 <1356936162-2589-3-git-send-email-amdragon@mit.edu>
\r
37 User-Agent: Notmuch/0.14+236~g1d0044f (http://notmuchmail.org) Emacs/23.4.1
\r
38 (x86_64-pc-linux-gnu)
\r
39 Date: Mon, 31 Dec 2012 12:01:47 +0000
\r
40 Message-ID: <8738ymv39w.fsf@qmul.ac.uk>
\r
42 Content-Type: text/plain; charset=us-ascii
\r
43 X-Sender-Host-Address: 188.31.19.240
\r
44 X-QM-SPAM-Info: Sender has good ham record. :)
\r
45 X-QM-Body-MD5: 5f63b2a618e032e83994c2ba766fe734 (of first 20000 bytes)
\r
46 X-SpamAssassin-Score: -2.3
\r
47 X-SpamAssassin-SpamBar: --
\r
48 X-SpamAssassin-Report: The QM spam filters have analysed this message to
\r
50 spam. We require at least 5.0 points to mark a message as spam.
\r
51 This message scored -2.3 points.
\r
52 Summary of the scoring:
\r
53 * -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/,
\r
55 * [138.37.6.40 listed in list.dnswl.org]
\r
56 * 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail
\r
57 provider * (markwalters1009[at]gmail.com)
\r
58 * -0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
\r
60 X-QM-Scan-Virus: ClamAV says the message is clean
\r
61 Cc: tomi.ollila@iki.fi
\r
62 X-BeenThere: notmuch@notmuchmail.org
\r
63 X-Mailman-Version: 2.1.13
\r
65 List-Id: "Use and development of the notmuch mail system."
\r
66 <notmuch.notmuchmail.org>
\r
67 List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
\r
68 <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
\r
69 List-Archive: <http://notmuchmail.org/pipermail/notmuch>
\r
70 List-Post: <mailto:notmuch@notmuchmail.org>
\r
71 List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
\r
72 List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
\r
73 <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
\r
74 X-List-Received-Date: Mon, 31 Dec 2012 12:02:00 -0000
\r
On Mon, 31 Dec 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> This parses the subset of Xapian's boolean term quoting rules that are
> used by make_boolean_term. This is provided as a generic string
> utility, but will be used shortly in notmuch restore to parse and
> optimize for ID queries.

This looks good to me, with one concern. Do you need to check that the
three talloc allocations in parse_boolean_term succeed? I think they
could fail in an out-of-memory situation, which I think would then
cause a segfault.

I guess failures in restore are much less important than in dump, and I
don't know how careful notmuch is about this in general.
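
To make the concern concrete, here is a minimal sketch of the kind of
check I mean. It is not the patch's code: dup_n is a hypothetical
malloc-based stand-in for talloc_strndup so that the example builds
without libtalloc, allocs_left is a made-up knob for simulating
allocation failure, and the -1 return value for out-of-memory is an
assumption rather than anything the patch defines.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical test knob: number of allocations allowed to succeed
 * before dup_n starts failing; a negative value means never fail. */
static int allocs_left = -1;

/* Hypothetical stand-in for talloc_strndup: copy the first n bytes of
 * s into a fresh NUL-terminated buffer, or return NULL on failure. */
static char *
dup_n (const char *s, size_t n)
{
    char *out;

    if (allocs_left == 0)
        return NULL;
    if (allocs_left > 0)
        allocs_left--;
    out = malloc (n + 1);
    if (out == NULL)
        return NULL;
    memcpy (out, s, n);
    out[n] = '\0';
    return out;
}

/* Split "prefix:term" at the first colon.
 * Returns 0 on success, 1 on parse error, and -1 if an allocation
 * failed (an assumed convention). On any failure both outputs are
 * left NULL, so the caller never sees a half-filled result. */
static int
split_term (const char *str, char **prefix_out, char **term_out)
{
    const char *pos;

    *prefix_out = *term_out = NULL;
    pos = strchr (str, ':');
    if (pos == NULL)
        return 1;

    *prefix_out = dup_n (str, (size_t) (pos - str));
    if (*prefix_out == NULL)
        return -1;

    ++pos;
    *term_out = dup_n (pos, strlen (pos));
    if (*term_out == NULL) {
        /* Undo the first allocation before reporting the failure. */
        free (*prefix_out);
        *prefix_out = NULL;
        return -1;
    }
    return 0;
}
```

With talloc the cleanup would presumably go through talloc_free
instead of free, but the shape of the checks would be the same.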
\r
95 > util/string-util.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
\r
96 > util/string-util.h | 15 ++++++++++++
\r
97 > 2 files changed, 82 insertions(+)
\r
99 > diff --git a/util/string-util.c b/util/string-util.c
\r
100 > index e4bea21..52c7781 100644
\r
101 > --- a/util/string-util.c
\r
102 > +++ b/util/string-util.c
\r
103 > @@ -22,6 +22,8 @@
\r
104 > #include "string-util.h"
\r
105 > #include "talloc.h"
\r
107 > +#include <ctype.h>
\r
110 > strtok_len (char *s, const char *delim, size_t *len)
\r
112 > @@ -96,3 +98,68 @@ make_boolean_term (void *ctx, const char *prefix, const char *term,
\r
117 > +static const char*
\r
118 > +skip_space (const char *str)
\r
120 > + while (*str && isspace (*str))
\r
126 > +parse_boolean_term (void *ctx, const char *str,
\r
127 > + char **prefix_out, char **term_out)
\r
129 > + *prefix_out = *term_out = NULL;
\r
131 > + /* Parse prefix */
\r
132 > + str = skip_space (str);
\r
133 > + const char *pos = strchr (str, ':');
\r
136 > + *prefix_out = talloc_strndup (ctx, str, pos - str);
\r
139 > + /* Implement de-quoting compatible with make_boolean_term. */
\r
140 > + if (*pos == '"') {
\r
141 > + char *out = talloc_array (ctx, char, strlen (pos));
\r
142 > + int closed = 0;
\r
143 > + *term_out = out;
\r
144 > + /* Skip the opening quote, find the closing quote, and
\r
145 > + * un-double doubled internal quotes. */
\r
146 > + for (++pos; *pos; ) {
\r
147 > + if (*pos == '"') {
\r
149 > + if (*pos != '"') {
\r
150 > + /* Found the closing quote. */
\r
152 > + pos = skip_space (pos);
\r
156 > + *out++ = *pos++;
\r
158 > + /* Did the term terminate without a closing quote or is there
\r
159 > + * trailing text after the closing quote? */
\r
160 > + if (!closed || *pos)
\r
164 > + const char *start = pos;
\r
165 > + /* Check for text after the boolean term. */
\r
166 > + while (*pos > ' ' && *pos != ')')
\r
168 > + if (*skip_space (pos))
\r
170 > + /* No trailing text; dup the string so the caller can free
\r
172 > + *term_out = talloc_strndup (ctx, start, pos - start);
\r
177 > + talloc_free (*prefix_out);
\r
178 > + talloc_free (*term_out);
\r
181 > diff --git a/util/string-util.h b/util/string-util.h
\r
182 > index b8844a3..8b9fe50 100644
\r
183 > --- a/util/string-util.h
\r
184 > +++ b/util/string-util.h
\r
185 > @@ -33,4 +33,19 @@ char *strtok_len (char *s, const char *delim, size_t *len);
\r
186 > int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
\r
187 > char **buf, size_t *len);
\r
189 > +/* Parse a boolean term query consisting of a prefix, a colon, and a
\r
190 > + * term that may be quoted as described for make_boolean_term. If the
\r
191 > + * term is not quoted, then it ends at the first whitespace or close
\r
192 > + * parenthesis. str may contain leading or trailing whitespace,
\r
193 > + * but anything else is considered a parse error. This is compatible
\r
194 > + * with anything produced by make_boolean_term, and supports a subset
\r
195 > + * of the quoting styles supported by Xapian (and hence notmuch).
\r
196 > + * *prefix_out and *term_out will be talloc'd with context ctx.
\r
198 > + * Return: 0 on success, non-zero on parse error.
\r
201 > +parse_boolean_term (void *ctx, const char *str,
\r
202 > + char **prefix_out, char **term_out);
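
For reference, the de-quoting rule described in the quoted comments
(skip the opening quote, un-double doubled internal quotes, and reject
an unterminated term or trailing text) can be sketched on its own as
follows. This is reconstructed from the fragments quoted above, not
copied from the patch: dequote_term is a hypothetical name, malloc
stands in for talloc_array so that it builds without libtalloc, and
the allocation is checked before use.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd, de-quoted copy of a term that starts with a
 * double quote, or NULL on parse error or allocation failure. */
static char *
dequote_term (const char *pos)
{
    char *buf, *out;
    int closed = 0;

    if (*pos != '"')
        return NULL;

    /* The output drops at least the opening quote, so strlen (pos)
     * bytes are enough for the text plus the terminating NUL. */
    buf = malloc (strlen (pos));
    if (buf == NULL)
        return NULL;
    out = buf;

    for (++pos; *pos; ) {
        if (*pos == '"') {
            ++pos;
            if (*pos != '"') {
                /* Found the closing quote; allow trailing spaces. */
                closed = 1;
                while (*pos && isspace ((unsigned char) *pos))
                    ++pos;
                break;
            }
            /* Otherwise it was a doubled quote: copy one of them. */
        }
        *out++ = *pos++;
    }

    /* Unterminated term, or trailing text after the closing quote. */
    if (!closed || *pos) {
        free (buf);
        return NULL;
    }
    *out = '\0';
    return buf;
}
```

Under these rules the input "a""b" (with its surrounding quotes)
de-quotes to a"b, while an input with no closing quote, or with text
after the closing quote, is rejected.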
\r