Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id D2178431FB6 for ; Sat, 29 Dec 2012 13:58:32 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: 0 X-Spam-Level: X-Spam-Status: No, score=0 tagged_above=-999 required=5 tests=[none] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id KXkyali8N-EF for ; Sat, 29 Dec 2012 13:58:31 -0800 (PST) Received: from guru.guru-group.fi (guru.guru-group.fi [46.183.73.34]) by olra.theworths.org (Postfix) with ESMTP id AEDA6431FAF for ; Sat, 29 Dec 2012 13:58:30 -0800 (PST) Received: from guru.guru-group.fi (localhost [IPv6:::1]) by guru.guru-group.fi (Postfix) with ESMTP id 114601000CA; Sat, 29 Dec 2012 23:58:22 +0200 (EET) From: Tomi Ollila To: Austin Clements , notmuch@notmuchmail.org Subject: Re: [PATCH v3 2/5] util: Function to parse boolean term queries In-Reply-To: <1356719189-2837-3-git-send-email-amdragon@mit.edu> References: <1356719189-2837-1-git-send-email-amdragon@mit.edu> <1356719189-2837-3-git-send-email-amdragon@mit.edu> User-Agent: Notmuch/0.14+216~gd58661f (http://notmuchmail.org) Emacs/24.2.1 (x86_64-unknown-linux-gnu) X-Face: HhBM'cA~ MIME-Version: 1.0 Content-Type: text/plain X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Dec 2012 21:58:33 -0000 On Fri, Dec 28 2012, Austin Clements wrote: > This parses the subset of Xapian's boolean term quoting rules that are > used by make_boolean_term. This is provided as a generic string > utility, but will be used shortly in notmuch restore to parse and > optimize for ID queries. > --- > util/string-util.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > util/string-util.h | 11 +++++++++++ > 2 files changed, 66 insertions(+) > > diff --git a/util/string-util.c b/util/string-util.c > index e4bea21..83b4953 100644 > --- a/util/string-util.c > +++ b/util/string-util.c > @@ -96,3 +96,58 @@ make_boolean_term (void *ctx, const char *prefix, const char *term, > > return 0; > } > + > +int > +parse_boolean_term (void *ctx, const char *str, > + char **prefix_out, char **term_out) > +{ > + *prefix_out = *term_out = NULL; > + > + /* Parse prefix */ > + const char *pos = strchr (str, ':'); > + if (! pos) > + goto FAIL; > + *prefix_out = talloc_strndup (ctx, str, pos - str); > + ++pos; > + > + /* Implement de-quoting compatible with make_boolean_term. */ > + if (*pos == '"') { > + char *out = talloc_array (ctx, char, strlen (pos)); > + int closed = 0; > + *term_out = out; > + /* Skip the opening quote, find the closing quote, and > + * un-double doubled internal quotes. */ > + for (++pos; *pos; ) { > + if (*pos == '"') { > + ++pos; > + if (*pos != '"') { > + /* Found the closing quote. */ > + closed = 1; > + break; > + } > + } > + *out++ = *pos++; > + } > + /* Did the term terminate without a closing quote or is there > + * trailing text after the closing quote? */ > + if (!closed || *pos) > + goto FAIL; > + *out = '\0'; > + } else { > + const char *start = pos; > + /* Check for text after the boolean term. */ > + while (*pos > ' ' && *pos != ')') > + ++pos; > + if (*pos) > + goto FAIL; Mark pointed out a good case about trailing whitespace -- It would be nice if the core were lenient for such cases. I personally remember once wasting hours of work by just failing to notice trailing whitespace in one system so this subject is sensitive to me... Another thing I saw earlyer today: make_boolean_term() checks if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127) but here the check is only while (*pos > ' ' && *pos != ')') I wonder whether it matters... Everyting else looks good to me. Tomi > + /* No trailing text; dup the string so the caller can free > + * it. */ > + *term_out = talloc_strdup (ctx, start); > + } > + return 0; > + > + FAIL: > + talloc_free (*prefix_out); > + talloc_free (*term_out); > + return 1; > +} > diff --git a/util/string-util.h b/util/string-util.h > index b8844a3..43d49d0 100644 > --- a/util/string-util.h > +++ b/util/string-util.h > @@ -33,4 +33,15 @@ char *strtok_len (char *s, const char *delim, size_t *len); > int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term, > char **buf, size_t *len); > > +/* Parse a boolean term query produced by make_boolean_term, returning > + * the prefix in *prefix_out and the term in *term_out. *prefix_out > + * and *term_out will be talloc'd with context ctx. > + * > + * Return: 0 on success, non-zero on parse error (including trailing > + * data in str). > + */ > +int > +parse_boolean_term (void *ctx, const char *str, > + char **prefix_out, char **term_out); > + > #endif > -- > 1.7.10.4 > > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch