1 Return-Path: <amdragon@mit.edu>
\r
2 X-Original-To: notmuch@notmuchmail.org
\r
3 Delivered-To: notmuch@notmuchmail.org
\r
4 Received: from localhost (localhost [127.0.0.1])
\r
5 by olra.theworths.org (Postfix) with ESMTP id 21291431FAF
\r
6 for <notmuch@notmuchmail.org>; Thu, 3 Jan 2013 23:27:04 -0800 (PST)
\r
7 X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
\r
11 X-Spam-Status: No, score=1.151 tagged_above=-999 required=5
\r
12 tests=[FUZZY_AMBIEN=1.851, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
\r
13 Received: from olra.theworths.org ([127.0.0.1])
\r
14 by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
\r
15 with ESMTP id NSsibVpNViL2 for <notmuch@notmuchmail.org>;
\r
16 Thu, 3 Jan 2013 23:27:02 -0800 (PST)
\r
17 Received: from dmz-mailsec-scanner-8.mit.edu (DMZ-MAILSEC-SCANNER-8.MIT.EDU
\r
19 by olra.theworths.org (Postfix) with ESMTP id 07464431FAE
\r
20 for <notmuch@notmuchmail.org>; Thu, 3 Jan 2013 23:27:01 -0800 (PST)
\r
21 X-AuditID: 12074425-b7ff26d000007f8d-66-50e6844567e2
\r
22 Received: from mailhub-auth-1.mit.edu ( [18.9.21.35])
\r
23 by dmz-mailsec-scanner-8.mit.edu (Symantec Messaging Gateway) with SMTP
\r
24 id BD.1F.32653.54486E05; Fri, 4 Jan 2013 02:27:01 -0500 (EST)
\r
25 Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103])
\r
26 by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id r047R045017065;
\r
27 Fri, 4 Jan 2013 02:27:00 -0500
\r
28 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])
\r
29 (authenticated bits=0)
\r
30 (User authenticated as amdragon@ATHENA.MIT.EDU)
\r
31 by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id r047Qsxe000343
\r
32 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT);
\r
33 Fri, 4 Jan 2013 02:26:55 -0500 (EST)
\r
34 Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80)
\r
35 (envelope-from <amdragon@MIT.EDU>)
\r
36 id 1Tr1g6-0001lT-2H; Fri, 04 Jan 2013 02:26:54 -0500
\r
37 Date: Fri, 4 Jan 2013 02:26:53 -0500
\r
38 From: Austin Clements <amdragon@MIT.EDU>
\r
39 To: Jani Nikula <jani@nikula.org>
\r
40 Subject: Re: [PATCH v4 1/5] util: Factor out boolean term quoting routine
\r
41 Message-ID: <20130104072653.GJ17581@mit.edu>
\r
42 References: <1356936162-2589-1-git-send-email-amdragon@mit.edu>
\r
43 <1356936162-2589-2-git-send-email-amdragon@mit.edu>
\r
44 <87y5gagqkp.fsf@nikula.org>
\r
46 Content-Type: text/plain; charset=us-ascii
\r
47 Content-Disposition: inline
\r
48 In-Reply-To: <87y5gagqkp.fsf@nikula.org>
\r
49 User-Agent: Mutt/1.5.21 (2010-09-15)
\r
50 X-Brightmail-Tracker:
\r
51 H4sIAAAAAAAAA+NgFlrLKsWRmVeSWpSXmKPExsUixCmqrOva8izA4Hy3vMWN1m5Gi6bpzhar
\r
52 5/JYXL85k9nizcp5rA6sHjtn3WX3OPx1IYvHrfuv2T2erbrF7LHl0HvmANYoLpuU1JzMstQi
\r
53 fbsEroz596ewFjy3r5g8r4G1gbFNr4uRk0NCwESi610vG4QtJnHh3nogm4tDSGAfo0Tj74WM
\r
54 EM56Rom+h31QzgUmiQvvV7JCOEsYJV707mcG6WcRUJGY/XAdK4jNJqAhsW3/ckYQW0RAUWLz
\r
55 yf1gNrNAmsT331PAaoQFPCWun9oNZvMK6Ehc3XMLavdkRokrvacZIRKCEidnPmGBaNaSuPHv
\r
56 JVMXIweQLS2x/B8HSJgTaFf38tVgc0SBbphychvbBEahWUi6ZyHpnoXQvYCReRWjbEpulW5u
\r
57 YmZOcWqybnFyYl5eapGuhV5uZoleakrpJkZQTLC7qO5gnHBI6RCjAAejEg/vhHtPA4RYE8uK
\r
58 K3MPMUpyMCmJ8ro3PQsQ4kvKT6nMSCzOiC8qzUktPsQowcGsJML7WRsox5uSWFmVWpQPk5Lm
\r
59 YFES572RctNfSCA9sSQ1OzW1ILUIJivDwaEkwXsJZKhgUWp6akVaZk4JQpqJgxNkOA/Q8Ncg
\r
60 NbzFBYm5xZnpEPlTjLocDS9vPGUUYsnLz0uVEuc9CVIkAFKUUZoHNweWyl4xigO9Jcz7FqSK
\r
61 B5gG4Sa9AlrCBLTk1ZvHIEtKEhFSUg2M2arBVcoWQlekOae93XRn7QY5vt9LbS7uyBGZ49D6
\r
62 4lQHw/Hrn/64vd96N/5C0ZPuSQ4hU+Jz5jrWn3w+z/f1C7uZDx9ozmNoTZjxd9kT+WydD7nv
\r
63 Z6dKlL9UEFbk6/h/WyBr5sttvfHnrR4t57jIOCu/c629kfi7Myc2/XJW3Cgr+veQ7tn6b0os
\r
64 xRmJhlrMRcWJAN8QOmlAAwAA
\r
65 Cc: tomi.ollila@iki.fi, notmuch@notmuchmail.org
\r
66 X-BeenThere: notmuch@notmuchmail.org
\r
67 X-Mailman-Version: 2.1.13
\r
69 List-Id: "Use and development of the notmuch mail system."
\r
70 <notmuch.notmuchmail.org>
\r
71 List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
\r
72 <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
\r
73 List-Archive: <http://notmuchmail.org/pipermail/notmuch>
\r
74 List-Post: <mailto:notmuch@notmuchmail.org>
\r
75 List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
\r
76 List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
\r
77 <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
\r
78 X-List-Received-Date: Fri, 04 Jan 2013 07:27:04 -0000
\r
80 Quoth Jani Nikula on Jan 03 at 5:48 pm:
\r
81 > On Mon, 31 Dec 2012, Austin Clements <amdragon@MIT.EDU> wrote:
\r
82 > > From: Austin Clements <amdragon@MIT.EDU>
\r
84 > > This is now a generic boolean term quoting function. It performs
\r
85 > > minimal quoting to produce user-friendly queries.
\r
87 > > This could live in tag-util as well, but it is really nothing specific
\r
88 > > to tags (although the conventions are specific to Xapian).
\r
90 > > The API is changed from "caller-allocates" to "readline-like". The
\r
91 > > scan for max tag length is pushed down into the quoting routine.
\r
92 > > Furthermore, this now combines the term prefix with the quoted term;
\r
93 > > arguably this is just as easy to do in the caller, but this will
\r
94 > > nicely parallel the boolean term parsing function to be introduced
\r
97 > > This is an amalgamation of code written by David Bremner and myself.
\r
99 > > notmuch-tag.c | 48 ++++++++++++---------------------------
\r
100 > > util/string-util.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
\r
101 > > util/string-util.h | 14 ++++++++++++
\r
102 > > 3 files changed, 92 insertions(+), 34 deletions(-)
\r
104 > > diff --git a/notmuch-tag.c b/notmuch-tag.c
\r
105 > > index 88d559b..fc9d43a 100644
\r
106 > > --- a/notmuch-tag.c
\r
107 > > +++ b/notmuch-tag.c
\r
108 > > @@ -19,6 +19,7 @@
\r
111 > > #include "notmuch-client.h"
\r
112 > > +#include "string-util.h"
\r
114 > > static volatile sig_atomic_t interrupted;
\r
116 > > @@ -35,25 +36,6 @@ handle_sigint (unused (int sig))
\r
117 > > interrupted = 1;
\r
121 > > -_escape_tag (char *buf, const char *tag)
\r
123 > > - const char *in = tag;
\r
124 > > - char *out = buf;
\r
126 > > - /* Boolean terms surrounded by double quotes can contain any
\r
127 > > - * character. Double quotes are quoted by doubling them. */
\r
128 > > - *out++ = '"';
\r
129 > > - while (*in) {
\r
130 > > - if (*in == '"')
\r
131 > > - *out++ = '"';
\r
132 > > - *out++ = *in++;
\r
134 > > - *out++ = '"';
\r
139 > > typedef struct {
\r
140 > > const char *tag;
\r
141 > > notmuch_bool_t remove;
\r
142 > > @@ -71,25 +53,16 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
\r
143 > > * parenthesize and the exclusion part of the query must not use
\r
144 > > * the '-' operator (though the NOT operator is fine). */
\r
146 > > - char *escaped, *query_string;
\r
147 > > + char *escaped = NULL;
\r
148 > > + size_t escaped_len = 0;
\r
149 > > + char *query_string;
\r
150 > > const char *join = "";
\r
152 > > - unsigned int max_tag_len = 0;
\r
155 > > /* Don't optimize if there are no tag changes. */
\r
156 > > if (tag_ops[0].tag == NULL)
\r
157 > > return talloc_strdup (ctx, orig_query_string);
\r
159 > > - /* Allocate a buffer for escaping tags. This is large enough to
\r
160 > > - * hold a fully escaped tag with every character doubled plus
\r
161 > > - * enclosing quotes and a NUL. */
\r
162 > > - for (i = 0; tag_ops[i].tag; i++)
\r
163 > > - if (strlen (tag_ops[i].tag) > max_tag_len)
\r
164 > > - max_tag_len = strlen (tag_ops[i].tag);
\r
165 > > - escaped = talloc_array (ctx, char, max_tag_len * 2 + 3);
\r
166 > > - if (! escaped)
\r
169 > > /* Build the new query string */
\r
170 > > if (strcmp (orig_query_string, "*") == 0)
\r
171 > > query_string = talloc_strdup (ctx, "(");
\r
172 > > @@ -97,10 +70,17 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
\r
173 > > query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
\r
175 > > for (i = 0; tag_ops[i].tag && query_string; i++) {
\r
176 > > + /* XXX in case of OOM, query_string will be deallocated when
\r
177 > > + * ctx is, which might be at shutdown */
\r
178 > > + if (make_boolean_term (ctx,
\r
179 > > + "tag", tag_ops[i].tag,
\r
180 > > + &escaped, &escaped_len))
\r
183 > > query_string = talloc_asprintf_append_buffer (
\r
184 > > - query_string, "%s%stag:%s", join,
\r
185 > > + query_string, "%s%s%s", join,
\r
186 > > tag_ops[i].remove ? "" : "not ",
\r
187 > > - _escape_tag (escaped, tag_ops[i].tag));
\r
192 > > diff --git a/util/string-util.c b/util/string-util.c
\r
193 > > index 44f8cd3..e4bea21 100644
\r
194 > > --- a/util/string-util.c
\r
195 > > +++ b/util/string-util.c
\r
196 > > @@ -20,6 +20,7 @@
\r
199 > > #include "string-util.h"
\r
200 > > +#include "talloc.h"
\r
203 > > strtok_len (char *s, const char *delim, size_t *len)
\r
204 > > @@ -32,3 +33,66 @@ strtok_len (char *s, const char *delim, size_t *len)
\r
206 > > return *len ? s : NULL;
\r
210 > > +make_boolean_term (void *ctx, const char *prefix, const char *term,
\r
211 > > + char **buf, size_t *len)
\r
213 > > + const char *in;
\r
215 > > + size_t needed = 3;
\r
216 > > + int need_quoting = 0;
\r
218 > > + /* Do we need quoting? To be paranoid, we quote anything
\r
219 > > + * containing a quote, even though it only matters at the
\r
220 > > + * beginning, and anything containing non-ASCII text. */
\r
221 > > + for (in = term; *in && !need_quoting; in++)
\r
222 > > + if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127)
\r
224 > Should that be *in >= 127?
\r
226 Nope. Character 127 is fine (and ASCII). Technically the only
\r
227 non-ASCII characters that require quoting are U+201C and U+201D, but
\r
228 rather than decoding UTF-8 to find those characters, it's much easier
\r
229 to just quote if there are any non-ASCII UTF-8 bytes. (Extra
\r
230 technically, we would be in real trouble if a tag contained 8-bit
\r
231 bytes but wasn't valid UTF-8; however, I think this would be the least
\r
238 > > + need_quoting = 1;
\r
240 > > + if (need_quoting)
\r
241 > > + for (in = term; *in; in++)
\r
242 > > + needed += (*in == '"') ? 2 : 1;
\r
244 > > + needed = strlen (term) + 1;
\r
246 > > + /* Reserve space for the prefix */
\r
248 > > + needed += strlen (prefix) + 1;
\r
250 > > + if ((*buf == NULL) || (needed > *len)) {
\r
251 > > + *len = 2 * needed;
\r
252 > > + *buf = talloc_realloc (ctx, *buf, char, *len);
\r
260 > > + /* Copy in the prefix */
\r
261 > > + if (prefix) {
\r
262 > > + strcpy (out, prefix);
\r
263 > > + out += strlen (prefix);
\r
264 > > + *out++ = ':';
\r
267 > > + if (! need_quoting) {
\r
268 > > + strcpy (out, term);
\r
272 > > + /* Quote term by enclosing it in double quotes and doubling any
\r
273 > > + * internal double quotes. */
\r
274 > > + *out++ = '"';
\r
276 > > + while (*in) {
\r
277 > > + if (*in == '"')
\r
278 > > + *out++ = '"';
\r
279 > > + *out++ = *in++;
\r
281 > > + *out++ = '"';
\r
286 > > diff --git a/util/string-util.h b/util/string-util.h
\r
287 > > index ac7676c..b8844a3 100644
\r
288 > > --- a/util/string-util.h
\r
289 > > +++ b/util/string-util.h
\r
290 > > @@ -19,4 +19,18 @@
\r
292 > > char *strtok_len (char *s, const char *delim, size_t *len);
\r
294 > > +/* Construct a boolean term query with the specified prefix (e.g.,
\r
295 > > + * "id") and search term, quoting term as necessary. Specifically, if
\r
296 > > + * term contains any non-printable ASCII characters, non-ASCII
\r
297 > > + * characters, close parenthesis or double quotes, it will be enclosed
\r
298 > > + * in double quotes and any internal double quotes will be doubled
\r
299 > > + * (e.g. a"b -> "a""b"). The result will be a valid notmuch query and
\r
300 > > + * can be parsed by parse_boolean_term.
\r
302 > > + * Output is into buf; it may be talloc_realloced.
\r
303 > > + * Return: 0 on success, non-zero on memory allocation failure.
\r
305 > > +int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
\r
306 > > + char **buf, size_t *len);
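To illustrate the quoting rule documented in the comment above, here is a minimal standalone sketch. It is not the patch's make_boolean_term: it assumes plain malloc rather than talloc, returns a fresh string on every call instead of reusing a caller-supplied buffer, and the name quote_boolean_term is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): build "prefix:term",
 * quoting term only when needed.  Returns a malloc'ed string that the
 * caller must free, or NULL on allocation failure. */
char *
quote_boolean_term (const char *prefix, const char *term)
{
    const char *in;
    char *buf, *out;
    size_t prefix_len, needed;
    int need_quoting = 0;

    assert (prefix != NULL && term != NULL);
    prefix_len = strlen (prefix);
    needed = prefix_len + 1;        /* prefix plus ':' */

    /* Quote on space or control characters, ')', '"', or any byte
     * above 127.  Byte 127 itself is ASCII and is left alone. */
    for (in = term; *in && ! need_quoting; in++) {
        unsigned char c = (unsigned char) *in;
        if (c <= ' ' || c == ')' || c == '"' || c > 127)
            need_quoting = 1;
    }

    if (need_quoting) {
        needed += 2;                /* enclosing double quotes */
        for (in = term; *in; in++)
            needed += (*in == '"') ? 2 : 1;
    } else {
        needed += strlen (term);
    }

    buf = malloc (needed + 1);      /* plus terminating NUL */
    if (buf == NULL)
        return NULL;

    out = buf;
    memcpy (out, prefix, prefix_len);
    out += prefix_len;
    *out++ = ':';

    if (! need_quoting) {
        strcpy (out, term);
        return buf;
    }

    /* Enclose in double quotes, doubling any internal double quote. */
    *out++ = '"';
    for (in = term; *in; in++) {
        if (*in == '"')
            *out++ = '"';
        *out++ = *in;
    }
    *out++ = '"';
    *out = '\0';
    return buf;
}
```

With this sketch, quote_boolean_term ("tag", "inbox") yields tag:inbox, while a"b becomes tag:"a""b" and a term containing a space or a close parenthesis is wrapped in double quotes. A term ending in byte 127 is passed through unquoted, which matches the "> 127" test discussed above.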
\r
311 Austin Clements MIT/'06/PhD/CSAIL
\r
312 amdragon@mit.edu http://web.mit.edu/amdragon
\r
313 Somewhere in the dream we call reality you will find me,
\r
314 searching for the reality we call dreams.
\r