Return-Path: <jani@nikula.org>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
    by olra.theworths.org (Postfix) with ESMTP id DADF0431FAF
    for <notmuch@notmuchmail.org>; Thu, 3 Jan 2013 09:09:19 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5
    tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
    by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id mZVqrZHqFyow for <notmuch@notmuchmail.org>;
    Thu, 3 Jan 2013 09:09:19 -0800 (PST)
Received: from mail-we0-f179.google.com (mail-we0-f179.google.com
    [74.125.82.179]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
    (No client certificate requested)
    by olra.theworths.org (Postfix) with ESMTPS id 20B09431FAE
    for <notmuch@notmuchmail.org>; Thu, 3 Jan 2013 09:09:19 -0800 (PST)
Received: by mail-we0-f179.google.com with SMTP id r6so7149381wey.38
    for <notmuch@notmuchmail.org>; Thu, 03 Jan 2013 09:09:17 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=google.com; s=20120113;
    h=x-received:from:to:cc:subject:in-reply-to:references:user-agent
    :date:message-id:mime-version:content-type:x-gm-message-state;
    bh=e4PnHTCba8bM8wgutAbPE19VYXrJzHa/EWjghj7pLy4=;
    b=dxR5ygrgCQZZ3FhhYoQTPSH5mLH5xO+4sjidSEUvxm+q1b2OwTK7xItJAAXtsGSmsI
    C90xFVk92c1mK56jX0iVC3WzZeuy/NusjK5HWokh/WspZJdVZvX6mZOM2PvkHghdG7Nx
    AXN3jxJ21vxSWDFgKnNDJJzKDM+FZkuIKbF05VVbGBaRYby2Ho/A4bcjPx+puAubDOmF
    5L1ZEihNEHeNllMz1gWVweqNLEBoxccbBCnh6dHEY+35yaPtbI08CzQiw2TzgnxKdL/P
    3d0kuMhDrVpEzVjHBhI1DKA49qbOi6PXFMEzlHbrsAytL51FJVH8uz2KieOU1LbJkbg3
X-Received: by 10.180.88.138 with SMTP id bg10mr76204269wib.13.1357232956539;
    Thu, 03 Jan 2013 09:09:16 -0800 (PST)
Received: from localhost ([2001:4b98:dc0:43:216:3eff:fe1b:25f3])
    by mx.google.com with ESMTPS id s10sm86244668wiw.4.2013.01.03.09.09.14
    (version=SSLv3 cipher=OTHER); Thu, 03 Jan 2013 09:09:15 -0800 (PST)
From: Jani Nikula <jani@nikula.org>
To: Austin Clements <amdragon@MIT.EDU>, notmuch@notmuchmail.org
Subject: Re: [PATCH v4 2/5] util: Function to parse boolean term queries
In-Reply-To: <1356936162-2589-3-git-send-email-amdragon@mit.edu>
References: <1356936162-2589-1-git-send-email-amdragon@mit.edu>
    <1356936162-2589-3-git-send-email-amdragon@mit.edu>
User-Agent: Notmuch/0.14+235~gdaf492b (http://notmuchmail.org) Emacs/23.2.1
    (x86_64-pc-linux-gnu)
Date: Thu, 03 Jan 2013 18:09:08 +0100
Message-ID: <87vcbegpmz.fsf@nikula.org>
Content-Type: text/plain; charset=us-ascii
Cc: tomi.ollila@iki.fi
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
    <notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Jan 2013 17:09:20 -0000

On Mon, 31 Dec 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> This parses the subset of Xapian's boolean term quoting rules that are
> used by make_boolean_term. This is provided as a generic string
> utility, but will be used shortly in notmuch restore to parse and
> optimize for ID queries.
> util/string-util.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> util/string-util.h | 15 ++++++++++++
> 2 files changed, 82 insertions(+)
> diff --git a/util/string-util.c b/util/string-util.c
> index e4bea21..52c7781 100644
> --- a/util/string-util.c
> +++ b/util/string-util.c
> #include "string-util.h"
> #include "talloc.h"
> +#include <ctype.h>
> strtok_len (char *s, const char *delim, size_t *len)
> @@ -96,3 +98,68 @@ make_boolean_term (void *ctx, const char *prefix, const char *term,
> +static const char*
> +skip_space (const char *str)
> + while (*str && isspace (*str))

Pedantic: isspace ((unsigned char) *str)
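
For illustration, here is a minimal sketch of skip_space with that cast applied. The loop body and return statement are not visible in the quoted hunk, so they are assumptions here. The cast matters because passing a negative value other than EOF to isspace() is undefined behaviour, which can happen wherever plain char is signed and the string holds bytes above 127.

```c
#include <ctype.h>

/* Sketch only: skip leading whitespace in str.
 * The loop body and return are assumed; only the loop condition
 * appears in the quoted patch. */
static const char *
skip_space (const char *str)
{
    /* Cast so that bytes above 127 are never passed to isspace()
     * as negative values. */
    while (*str && isspace ((unsigned char) *str))
        str++;
    return str;
}
```

With the cast in place, a string that starts with a non-ASCII byte is passed to isspace() as a value in the 0..255 range rather than as a negative int.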

> +parse_boolean_term (void *ctx, const char *str,
> + char **prefix_out, char **term_out)
> + *prefix_out = *term_out = NULL;
> + /* Parse prefix */
> + str = skip_space (str);
> + const char *pos = strchr (str, ':');

if (! pos || pos == str) ?
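
A minimal sketch of what the extra pos == str test buys: it rejects input such as ":foo", where the colon is found but the prefix is empty. The helper name find_prefix and its out-parameter are hypothetical and not part of the patch; they only isolate the check so it can be tried standalone.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: locate the end of the
 * prefix in "prefix:term". Returns 0 and stores the prefix length in
 * *len_out on success; returns 1 if there is no colon or if the
 * prefix is empty (the "pos == str" case). */
static int
find_prefix (const char *str, size_t *len_out)
{
    const char *pos = strchr (str, ':');

    if (! pos || pos == str)
        return 1;
    *len_out = (size_t) (pos - str);
    return 0;
}
```

Under these assumptions, "id:foo" yields a prefix length of 2, while ":foo" and "nocolon" are both reported as parse errors.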

Could just return 1 here.

> + *prefix_out = talloc_strndup (ctx, str, pos - str);
> + /* Implement de-quoting compatible with make_boolean_term. */
> + if (*pos == '"') {
> + char *out = talloc_array (ctx, char, strlen (pos));
> + int closed = 0;
> + *term_out = out;
> + /* Skip the opening quote, find the closing quote, and
> + * un-double doubled internal quotes. */
> + for (++pos; *pos; ) {
> + if (*pos == '"') {
> + if (*pos != '"') {
> + /* Found the closing quote. */
> + pos = skip_space (pos);

Is it necessary to accept trailing space?

> + *out++ = *pos++;
> + /* Did the term terminate without a closing quote or is there
> + * trailing text after the closing quote? */
> + if (!closed || *pos)
> + const char *start = pos;
> + /* Check for text after the boolean term. */
> + while (*pos > ' ' && *pos != ')')

The condition could have *pos there too for clarity, though not strictly
necessary. Would be neat to have a ctype style helper that could be
shared between this and make_boolean_term.
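
One possible shape for such a helper, as a rough sketch rather than a proposal for the final code. The names is_unquoted_terminator and skip_unquoted_term are hypothetical. Taking an unsigned char is also an assumption: it means bytes above 127 count as term characters, whereas the quoted *pos > ' ' test would treat them as terminators where plain char is signed.

```c
/* Hypothetical ctype-style predicate, not part of the patch:
 * non-zero if c ends an unquoted boolean term, i.e. it is NUL,
 * a control character or space, or a close parenthesis. */
static int
is_unquoted_terminator (unsigned char c)
{
    return c == '\0' || c <= ' ' || c == ')';
}

/* Hypothetical caller: advance past an unquoted term and return a
 * pointer to the first terminator (possibly the trailing NUL). */
static const char *
skip_unquoted_term (const char *pos)
{
    while (! is_unquoted_terminator ((unsigned char) *pos))
        pos++;
    return pos;
}
```

A parser loop and a "does this term need quoting" scan could both call the predicate, which keeps the two definitions of a terminator from drifting apart.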

> + if (*skip_space (pos))

Is it necessary to accept trailing space?

> + /* No trailing text; dup the string so the caller can free
> + *term_out = talloc_strndup (ctx, start, pos - start);
> + talloc_free (*prefix_out);
> + talloc_free (*term_out);
> diff --git a/util/string-util.h b/util/string-util.h
> index b8844a3..8b9fe50 100644
> --- a/util/string-util.h
> +++ b/util/string-util.h
> @@ -33,4 +33,19 @@ char *strtok_len (char *s, const char *delim, size_t *len);
> int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
> char **buf, size_t *len);
> +/* Parse a boolean term query consisting of a prefix, a colon, and a
> + * term that may be quoted as described for make_boolean_term. If the
> + * term is not quoted, then it ends at the first whitespace or close
> + * parenthesis. str may contain leading or trailing whitespace,
> + * but anything else is considered a parse error. This is compatible
> + * with anything produced by make_boolean_term, and supports a subset
> + * of the quoting styles supported by Xapian (and hence notmuch).
> + * *prefix_out and *term_out will be talloc'd with context ctx.
> + * Return: 0 on success, non-zero on parse error.
> +parse_boolean_term (void *ctx, const char *str,
> + char **prefix_out, char **term_out);
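
To make the quoting rule in that comment concrete, here is a standalone sketch of the de-quoting step: a quoted term is wrapped in double quotes and an internal double quote is written twice. This is not the patch's code. The name dequote_term is hypothetical, it uses malloc instead of talloc so it builds without extra libraries, and it rejects trailing whitespace after the closing quote, which is a simplification.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch only, not the patch's implementation. De-quote a term of
 * the form "..." in which "" stands for one literal quote. Returns
 * a malloc'd string the caller must free(), or NULL if the term has
 * no opening quote, no closing quote, or text after the closing
 * quote. */
static char *
dequote_term (const char *pos)
{
    char *buf, *out;
    int closed = 0;

    if (*pos != '"')
        return NULL;
    /* The result is never longer than the input. */
    buf = out = malloc (strlen (pos) + 1);
    if (! buf)
        return NULL;

    for (++pos; *pos; ) {
        if (*pos == '"') {
            ++pos;
            if (*pos != '"') {
                /* A lone quote closes the term. */
                closed = 1;
                break;
            }
            /* Doubled quote: fall through and copy a single one. */
        }
        *out++ = *pos++;
    }
    *out = '\0';

    if (! closed || *pos) {
        free (buf);
        return NULL;
    }
    return buf;
}
```

Given the input "a""b" (including the outer quotes), this returns the four-byte string a"b; an unterminated term or one followed by extra text returns NULL.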