From: Jani Nikula
To: Austin Clements, notmuch@notmuchmail.org
Cc: tomi.ollila@iki.fi
Subject: Re: [PATCH v4 2/5] util: Function to parse boolean term queries
In-Reply-To: <1356936162-2589-3-git-send-email-amdragon@mit.edu>
References: <1356936162-2589-1-git-send-email-amdragon@mit.edu> <1356936162-2589-3-git-send-email-amdragon@mit.edu>
Date: Thu, 03 Jan 2013 18:09:08 +0100
Message-ID: <87vcbegpmz.fsf@nikula.org>

On Mon, 31 Dec 2012, Austin Clements wrote:
> This parses the subset of Xapian's boolean term quoting rules that are
> used by make_boolean_term.  This is provided as a generic string
> utility, but will be used shortly in notmuch restore to parse and
> optimize for ID queries.
> ---
>  util/string-util.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  util/string-util.h |   15 ++++++++++++
>  2 files changed, 82 insertions(+)
>
> diff --git a/util/string-util.c b/util/string-util.c
> index e4bea21..52c7781 100644
> --- a/util/string-util.c
> +++ b/util/string-util.c
> @@ -22,6 +22,8 @@
>  #include "string-util.h"
>  #include "talloc.h"
>
> +#include
> +
>  char *
>  strtok_len (char *s, const char *delim, size_t *len)
>  {
> @@ -96,3 +98,68 @@ make_boolean_term (void *ctx, const char *prefix, const char *term,
>
>      return 0;
>  }
> +
> +static const char*
> +skip_space (const char *str)
> +{
> +    while (*str && isspace (*str))

Pedantic: isspace ((unsigned char) *str)

> +        ++str;
> +    return str;
> +}
> +
> +int
> +parse_boolean_term (void *ctx, const char *str,
> +                    char **prefix_out, char **term_out)
> +{
> +    *prefix_out = *term_out = NULL;
> +
> +    /* Parse prefix */
> +    str = skip_space (str);
> +    const char *pos = strchr (str, ':');
> +    if (! pos)

if (! pos || pos == str) ?

> +        goto FAIL;

Could just return 1 here.

> +    *prefix_out = talloc_strndup (ctx, str, pos - str);
> +    ++pos;
> +
> +    /* Implement de-quoting compatible with make_boolean_term. */
> +    if (*pos == '"') {
> +        char *out = talloc_array (ctx, char, strlen (pos));
> +        int closed = 0;
> +        *term_out = out;
> +        /* Skip the opening quote, find the closing quote, and
> +         * un-double doubled internal quotes. */
> +        for (++pos; *pos; ) {
> +            if (*pos == '"') {
> +                ++pos;
> +                if (*pos != '"') {
> +                    /* Found the closing quote. */
> +                    closed = 1;
> +                    pos = skip_space (pos);

Is it necessary to accept trailing space?

> +                    break;
> +                }
> +            }
> +            *out++ = *pos++;
> +        }
> +        /* Did the term terminate without a closing quote or is there
> +         * trailing text after the closing quote? */
> +        if (!closed || *pos)
> +            goto FAIL;
> +        *out = '\0';
> +    } else {
> +        const char *start = pos;
> +        /* Check for text after the boolean term. */
> +        while (*pos > ' ' && *pos != ')')

The condition could have *pos there too for clarity, though not
strictly necessary.

Would be neat to have a ctype-style helper that could be shared between
this and make_boolean_term.

> +            ++pos;
> +        if (*skip_space (pos))

Is it necessary to accept trailing space?

> +            goto FAIL;
> +        /* No trailing text; dup the string so the caller can free
> +         * it. */
> +        *term_out = talloc_strndup (ctx, start, pos - start);
> +    }
> +    return 0;
> +
> + FAIL:
> +    talloc_free (*prefix_out);
> +    talloc_free (*term_out);
> +    return 1;
> +}
> diff --git a/util/string-util.h b/util/string-util.h
> index b8844a3..8b9fe50 100644
> --- a/util/string-util.h
> +++ b/util/string-util.h
> @@ -33,4 +33,19 @@ char *strtok_len (char *s, const char *delim, size_t *len);
>  int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
>                         char **buf, size_t *len);
>
> +/* Parse a boolean term query consisting of a prefix, a colon, and a
> + * term that may be quoted as described for make_boolean_term.  If the
> + * term is not quoted, then it ends at the first whitespace or close
> + * parenthesis.  str may containing leading or trailing whitespace,
> + * but anything else is considered a parse error.  This is compatible
> + * with anything produced by make_boolean_term, and supports a subset
> + * of the quoting styles supported by Xapian (and hence notmuch).
> + * *prefix_out and *term_out will be talloc'd with context ctx.
> + *
> + * Return: 0 on success, non-zero on parse error.
> + */
> +int
> +parse_boolean_term (void *ctx, const char *str,
> +                    char **prefix_out, char **term_out);
> +
>  #endif
> --
> 1.7.10.4
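
To make two of the comments above concrete (the unsigned char cast for
isspace, and a shared ctype-style helper), here is a rough sketch.  The
names is_unquoted_term_char and unquoted_term_len are hypothetical and
not part of the patch or of notmuch; the predicate only copies the loop
condition from the quoted code, with the explicit *pos check spelled
out.  The cast matters because passing a negative char value to isspace
is undefined behaviour in C.

```c
#include <ctype.h>
#include <stddef.h>

/* Same as the patch's skip_space, plus the suggested cast so that
 * bytes >= 0x80 are never passed to isspace as negative values. */
static const char *
skip_space (const char *str)
{
    while (*str && isspace ((unsigned char) *str))
        ++str;
    return str;
}

/* Hypothetical ctype-style predicate: non-zero if c may appear in an
 * unquoted boolean term.  Mirrors the loop condition in the patch
 * (*pos > ' ' && *pos != ')'), with the NUL check made explicit. */
static int
is_unquoted_term_char (char c)
{
    return c != '\0' && c > ' ' && c != ')';
}

/* Hypothetical helper: length of the unquoted term starting at s. */
static size_t
unquoted_term_len (const char *s)
{
    const char *pos = s;

    while (is_unquoted_term_char (*pos))
        ++pos;
    return (size_t) (pos - s);
}
```

With something like this, the unquoted branch of parse_boolean_term
could be written in terms of unquoted_term_len, and make_boolean_term
could use the same predicate when deciding whether a term needs
quoting, assuming both are meant to agree on which characters end an
unquoted term.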