From: Jani Nikula <jani@nikula.org>
To: Austin Clements <amdragon@MIT.EDU>, notmuch@notmuchmail.org
Subject: Re: [PATCH v4 1/5] util: Factor out boolean term quoting routine
In-Reply-To: <1356936162-2589-2-git-send-email-amdragon@mit.edu>
References: <1356936162-2589-1-git-send-email-amdragon@mit.edu>
 <1356936162-2589-2-git-send-email-amdragon@mit.edu>
User-Agent: Notmuch/0.14+235~gdaf492b (http://notmuchmail.org) Emacs/23.2.1
 (x86_64-pc-linux-gnu)
Date: Thu, 03 Jan 2013 17:48:54 +0100
Message-ID: <87y5gagqkp.fsf@nikula.org>
Content-Type: text/plain; charset=us-ascii
Cc: tomi.ollila@iki.fi
\r
On Mon, 31 Dec 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> From: Austin Clements <amdragon@MIT.EDU>
>
> This is now a generic boolean term quoting function. It performs
> minimal quoting to produce user-friendly queries.
>
> This could live in tag-util as well, but it is really nothing specific
> to tags (although the conventions are specific to Xapian).
>
> The API is changed from "caller-allocates" to "readline-like". The
> scan for max tag length is pushed down into the quoting routine.
> Furthermore, this now combines the term prefix with the quoted term;
> arguably this is just as easy to do in the caller, but this will
> nicely parallel the boolean term parsing function to be introduced
>
> This is an amalgamation of code written by David Bremner and myself.
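
The "readline-like" convention described in the commit message resembles POSIX getline(): the caller passes a buffer pointer and its capacity by reference, and the callee reallocates only when the buffer is missing or too small. A minimal sketch of that convention, assuming plain realloc in place of talloc; join_term is a hypothetical helper for illustration and is not part of the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): writes "prefix:term" into
 * *buf, growing *buf when it is NULL or smaller than required. On
 * growth, *len is updated to the new capacity. Returns 0 on success,
 * non-zero on allocation failure (in which case *buf is unchanged). */
static int
join_term (const char *prefix, const char *term, char **buf, size_t *len)
{
    size_t needed;

    assert (prefix && term && buf && len);

    /* prefix + ':' + term + terminating NUL */
    needed = strlen (prefix) + 1 + strlen (term) + 1;

    if (*buf == NULL || needed > *len) {
        /* Over-allocate so that later, slightly longer terms fit. */
        char *grown = realloc (*buf, 2 * needed);
        if (grown == NULL)
            return 1;
        *buf = grown;
        *len = 2 * needed;
    }

    snprintf (*buf, *len, "%s:%s", prefix, term);
    return 0;
}
```

A caller would start with buf = NULL and len = 0, call the helper once per term, and release the buffer once after the loop. This mirrors how the patched _optimize_tag_query initializes escaped and escaped_len before its loop.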
\r
>  notmuch-tag.c      | 48 ++++++++++++---------------------------
>  util/string-util.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  util/string-util.h | 14 ++++++++++++
>  3 files changed, 92 insertions(+), 34 deletions(-)
>
> diff --git a/notmuch-tag.c b/notmuch-tag.c
> index 88d559b..fc9d43a 100644
> --- a/notmuch-tag.c
> +++ b/notmuch-tag.c
>  #include "notmuch-client.h"
> +#include "string-util.h"
>  static volatile sig_atomic_t interrupted;
> @@ -35,25 +36,6 @@ handle_sigint (unused (int sig))
> -_escape_tag (char *buf, const char *tag)
> -    const char *in = tag;
> -    char *out = buf;
> -    /* Boolean terms surrounded by double quotes can contain any
> -     * character. Double quotes are quoted by doubling them. */
> -        if (*in == '"')
> -        *out++ = *in++;
>      notmuch_bool_t remove;
> @@ -71,25 +53,16 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
>       * parenthesize and the exclusion part of the query must not use
>       * the '-' operator (though the NOT operator is fine). */
> -    char *escaped, *query_string;
> +    char *escaped = NULL;
> +    size_t escaped_len = 0;
> +    char *query_string;
>      const char *join = "";
> -    unsigned int max_tag_len = 0;
>      /* Don't optimize if there are no tag changes. */
>      if (tag_ops[0].tag == NULL)
>          return talloc_strdup (ctx, orig_query_string);
> -    /* Allocate a buffer for escaping tags. This is large enough to
> -     * hold a fully escaped tag with every character doubled plus
> -     * enclosing quotes and a NUL. */
> -    for (i = 0; tag_ops[i].tag; i++)
> -        if (strlen (tag_ops[i].tag) > max_tag_len)
> -            max_tag_len = strlen (tag_ops[i].tag);
> -    escaped = talloc_array (ctx, char, max_tag_len * 2 + 3);
>      /* Build the new query string */
>      if (strcmp (orig_query_string, "*") == 0)
>          query_string = talloc_strdup (ctx, "(");
> @@ -97,10 +70,17 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
>          query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
>      for (i = 0; tag_ops[i].tag && query_string; i++) {
> +        /* XXX in case of OOM, query_string will be deallocated when
> +         * ctx is, which might be at shutdown */
> +        if (make_boolean_term (ctx,
> +                               "tag", tag_ops[i].tag,
> +                               &escaped, &escaped_len))
>          query_string = talloc_asprintf_append_buffer (
> -            query_string, "%s%stag:%s", join,
> +            query_string, "%s%s%s", join,
>              tag_ops[i].remove ? "" : "not ",
> -            _escape_tag (escaped, tag_ops[i].tag));
\r
> diff --git a/util/string-util.c b/util/string-util.c
> index 44f8cd3..e4bea21 100644
> --- a/util/string-util.c
> +++ b/util/string-util.c
> @@ -20,6 +20,7 @@
>  #include "string-util.h"
> +#include "talloc.h"
>  strtok_len (char *s, const char *delim, size_t *len)
> @@ -32,3 +33,66 @@ strtok_len (char *s, const char *delim, size_t *len)
>      return *len ? s : NULL;
> +make_boolean_term (void *ctx, const char *prefix, const char *term,
> +                   char **buf, size_t *len)
> +    const char *in;
> +    size_t needed = 3;
> +    int need_quoting = 0;
> +    /* Do we need quoting? To be paranoid, we quote anything
> +     * containing a quote, even though it only matters at the
> +     * beginning, and anything containing non-ASCII text. */
> +    for (in = term; *in && !need_quoting; in++)
> +        if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127)

Should that be *in >= 127?
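
To make the boundary concrete: the two comparisons differ for exactly one byte, 0x7f (DEL). DEL is a non-printable ASCII character, yet it is neither <= ' ' nor > 127, so the check as posted leaves it unquoted. A small sketch of both variants; the function names are hypothetical and only the predicate itself is taken from the quoted patch:

```c
#include <assert.h>

/* The check as written in the quoted patch. */
static int
needs_quoting_gt (char c)
{
    return c <= ' ' || c == ')' || c == '"' || (unsigned char) c > 127;
}

/* The variant raised in the question above: also catches DEL (0x7f). */
static int
needs_quoting_ge (char c)
{
    return c <= ' ' || c == ')' || c == '"' || (unsigned char) c >= 127;
}

/* Self-check for the single input where the two variants disagree. */
static void
check_del_boundary (void)
{
    assert (! needs_quoting_gt ('\x7f'));
    assert (needs_quoting_ge ('\x7f'));
}
```

For every other byte value the two predicates agree, so the change would only affect terms containing DEL.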
\r
> +            need_quoting = 1;
> +    if (need_quoting)
> +        for (in = term; *in; in++)
> +            needed += (*in == '"') ? 2 : 1;
> +        needed = strlen (term) + 1;
> +    /* Reserve space for the prefix */
> +        needed += strlen (prefix) + 1;
> +    if ((*buf == NULL) || (needed > *len)) {
> +        *len = 2 * needed;
> +        *buf = talloc_realloc (ctx, *buf, char, *len);
> +    /* Copy in the prefix */
> +        strcpy (out, prefix);
> +        out += strlen (prefix);
> +    if (! need_quoting) {
> +        strcpy (out, term);
> +    /* Quote term by enclosing it in double quotes and doubling any
> +     * internal double quotes. */
> +        if (*in == '"')
> +        *out++ = *in++;
\r
> diff --git a/util/string-util.h b/util/string-util.h
> index ac7676c..b8844a3 100644
> --- a/util/string-util.h
> +++ b/util/string-util.h
> @@ -19,4 +19,18 @@
>  char *strtok_len (char *s, const char *delim, size_t *len);
> +/* Construct a boolean term query with the specified prefix (e.g.,
> + * "id") and search term, quoting term as necessary. Specifically, if
> + * term contains any non-printable ASCII characters, non-ASCII
> + * characters, close parenthesis or double quotes, it will be enclosed
> + * in double quotes and any internal double quotes will be doubled
> + * (e.g. a"b -> "a""b"). The result will be a valid notmuch query and
> + * can be parsed by parse_boolean_term.
> + * Output is into buf; it may be talloc_realloced.
> + * Return: 0 on success, non-zero on memory allocation failure.
> +int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
> +                       char **buf, size_t *len);
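
The quoting rule documented in this header comment can be exercised in isolation. A minimal sketch, assuming malloc in place of talloc, a fresh allocation per call instead of the reusable buffer, and a ':' between prefix and term (as in tag:...); quote_boolean_term is a hypothetical name for illustration, not the patch's function:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the documented behavior: returns a newly
 * malloc'd "prefix:term" string, with term enclosed in double quotes
 * (and internal double quotes doubled) when it contains a control
 * character or space, ')', '"', or a non-ASCII byte. Returns NULL on
 * allocation failure. The caller frees the result. */
static char *
quote_boolean_term (const char *prefix, const char *term)
{
    const char *in;
    char *buf, *out;
    size_t needed;
    int need_quoting = 0;

    assert (prefix && term);

    /* Worst case: prefix, ':', two enclosing quotes, NUL, plus two
     * bytes for every double quote in term and one for anything else. */
    needed = strlen (prefix) + 1 + 2 + 1;
    for (in = term; *in; in++) {
        unsigned char c = (unsigned char) *in;
        if (c <= ' ' || c == ')' || c == '"' || c > 127)
            need_quoting = 1;
        needed += (c == '"') ? 2 : 1;
    }

    buf = malloc (needed);
    if (buf == NULL)
        return NULL;

    /* Copy in the prefix and separator. */
    out = buf;
    strcpy (out, prefix);
    out += strlen (prefix);
    *out++ = ':';

    if (! need_quoting) {
        strcpy (out, term);
        return buf;
    }

    /* Enclose in double quotes, doubling any internal double quote. */
    *out++ = '"';
    for (in = term; *in; in++) {
        if (*in == '"')
            *out++ = '"';
        *out++ = *in;
    }
    *out++ = '"';
    *out = '\0';
    return buf;
}
```

With this sketch, the term inbox under prefix tag stays unquoted as tag:inbox, while a"b becomes tag:"a""b", matching the example in the comment. The non-ASCII test here uses the same > 127 comparison as the posted patch.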
\r