From: Jani Nikula
To: Austin Clements, notmuch@notmuchmail.org
Cc: tomi.ollila@iki.fi
Subject: Re: [PATCH v4 1/5] util: Factor out boolean term quoting routine
In-Reply-To: <1356936162-2589-2-git-send-email-amdragon@mit.edu>
References: <1356936162-2589-1-git-send-email-amdragon@mit.edu>
 <1356936162-2589-2-git-send-email-amdragon@mit.edu>
User-Agent: Notmuch/0.14+235~gdaf492b (http://notmuchmail.org) Emacs/23.2.1 (x86_64-pc-linux-gnu)
Date: Thu, 03 Jan 2013 17:48:54 +0100
Message-ID: <87y5gagqkp.fsf@nikula.org>
List-Id: "Use and development of the notmuch mail system."

On Mon, 31 Dec 2012, Austin Clements wrote:
> From: Austin Clements
>
> This is now a generic boolean term quoting function. It performs
> minimal quoting to produce user-friendly queries.
>
> This could live in tag-util as well, but it is really nothing specific
> to tags (although the conventions are specific to Xapian).
>
> The API is changed from "caller-allocates" to "readline-like". The
> scan for max tag length is pushed down into the quoting routine.
> Furthermore, this now combines the term prefix with the quoted term;
> arguably this is just as easy to do in the caller, but this will
> nicely parallel the boolean term parsing function to be introduced
> shortly.
>
> This is an amalgamation of code written by David Bremner and myself.
> ---
>  notmuch-tag.c      |   48 ++++++++++++---------------------------
>  util/string-util.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  util/string-util.h |   14 ++++++++++++
>  3 files changed, 92 insertions(+), 34 deletions(-)
>
> diff --git a/notmuch-tag.c b/notmuch-tag.c
> index 88d559b..fc9d43a 100644
> --- a/notmuch-tag.c
> +++ b/notmuch-tag.c
> @@ -19,6 +19,7 @@
>   */
>
>  #include "notmuch-client.h"
> +#include "string-util.h"
>
>  static volatile sig_atomic_t interrupted;
>
> @@ -35,25 +36,6 @@ handle_sigint (unused (int sig))
>      interrupted = 1;
>  }
>
> -static char *
> -_escape_tag (char *buf, const char *tag)
> -{
> -    const char *in = tag;
> -    char *out = buf;
> -
> -    /* Boolean terms surrounded by double quotes can contain any
> -     * character. Double quotes are quoted by doubling them. */
> -    *out++ = '"';
> -    while (*in) {
> -        if (*in == '"')
> -            *out++ = '"';
> -        *out++ = *in++;
> -    }
> -    *out++ = '"';
> -    *out = 0;
> -    return buf;
> -}
> -
>  typedef struct {
>      const char *tag;
>      notmuch_bool_t remove;
> @@ -71,25 +53,16 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
>       * parenthesize and the exclusion part of the query must not use
>       * the '-' operator (though the NOT operator is fine). */
>
> -    char *escaped, *query_string;
> +    char *escaped = NULL;
> +    size_t escaped_len = 0;
> +    char *query_string;
>      const char *join = "";
> -    int i;
> -    unsigned int max_tag_len = 0;
> +    size_t i;
>
>      /* Don't optimize if there are no tag changes. */
>      if (tag_ops[0].tag == NULL)
>          return talloc_strdup (ctx, orig_query_string);
>
> -    /* Allocate a buffer for escaping tags. This is large enough to
> -     * hold a fully escaped tag with every character doubled plus
> -     * enclosing quotes and a NUL. */
> -    for (i = 0; tag_ops[i].tag; i++)
> -        if (strlen (tag_ops[i].tag) > max_tag_len)
> -            max_tag_len = strlen (tag_ops[i].tag);
> -    escaped = talloc_array (ctx, char, max_tag_len * 2 + 3);
> -    if (! escaped)
> -        return NULL;
> -
>      /* Build the new query string */
>      if (strcmp (orig_query_string, "*") == 0)
>          query_string = talloc_strdup (ctx, "(");
> @@ -97,10 +70,17 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
>          query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
>
>      for (i = 0; tag_ops[i].tag && query_string; i++) {
> +        /* XXX in case of OOM, query_string will be deallocated when
> +         * ctx is, which might be at shutdown */
> +        if (make_boolean_term (ctx,
> +                               "tag", tag_ops[i].tag,
> +                               &escaped, &escaped_len))
> +            return NULL;
> +
>          query_string = talloc_asprintf_append_buffer (
> -            query_string, "%s%stag:%s", join,
> +            query_string, "%s%s%s", join,
>              tag_ops[i].remove ? "" : "not ",
> -            _escape_tag (escaped, tag_ops[i].tag));
> +            escaped);
>          join = " or ";
>      }
>
> diff --git a/util/string-util.c b/util/string-util.c
> index 44f8cd3..e4bea21 100644
> --- a/util/string-util.c
> +++ b/util/string-util.c
> @@ -20,6 +20,7 @@
>
>
>  #include "string-util.h"
> +#include "talloc.h"
>
>  char *
>  strtok_len (char *s, const char *delim, size_t *len)
> @@ -32,3 +33,66 @@ strtok_len (char *s, const char *delim, size_t *len)
>
>      return *len ? s : NULL;
>  }
> +
> +int
> +make_boolean_term (void *ctx, const char *prefix, const char *term,
> +                   char **buf, size_t *len)
> +{
> +    const char *in;
> +    char *out;
> +    size_t needed = 3;
> +    int need_quoting = 0;
> +
> +    /* Do we need quoting? To be paranoid, we quote anything
> +     * containing a quote, even though it only matters at the
> +     * beginning, and anything containing non-ASCII text. */
> +    for (in = term; *in && !need_quoting; in++)
> +        if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127)

Should that be *in >= 127? Otherwise LGTM.

Jani.

> +            need_quoting = 1;
> +
> +    if (need_quoting)
> +        for (in = term; *in; in++)
> +            needed += (*in == '"') ? 2 : 1;
> +    else
> +        needed = strlen (term) + 1;
> +
> +    /* Reserve space for the prefix */
> +    if (prefix)
> +        needed += strlen (prefix) + 1;
> +
> +    if ((*buf == NULL) || (needed > *len)) {
> +        *len = 2 * needed;
> +        *buf = talloc_realloc (ctx, *buf, char, *len);
> +    }
> +
> +    if (! *buf)
> +        return 1;
> +
> +    out = *buf;
> +
> +    /* Copy in the prefix */
> +    if (prefix) {
> +        strcpy (out, prefix);
> +        out += strlen (prefix);
> +        *out++ = ':';
> +    }
> +
> +    if (! need_quoting) {
> +        strcpy (out, term);
> +        return 0;
> +    }
> +
> +    /* Quote term by enclosing it in double quotes and doubling any
> +     * internal double quotes. */
> +    *out++ = '"';
> +    in = term;
> +    while (*in) {
> +        if (*in == '"')
> +            *out++ = '"';
> +        *out++ = *in++;
> +    }
> +    *out++ = '"';
> +    *out = '\0';
> +
> +    return 0;
> +}
> diff --git a/util/string-util.h b/util/string-util.h
> index ac7676c..b8844a3 100644
> --- a/util/string-util.h
> +++ b/util/string-util.h
> @@ -19,4 +19,18 @@
>
>  char *strtok_len (char *s, const char *delim, size_t *len);
>
> +/* Construct a boolean term query with the specified prefix (e.g.,
> + * "id") and search term, quoting term as necessary. Specifically, if
> + * term contains any non-printable ASCII characters, non-ASCII
> + * characters, close parenthesis or double quotes, it will be enclosed
> + * in double quotes and any internal double quotes will be doubled
> + * (e.g. a"b -> "a""b"). The result will be a valid notmuch query and
> + * can be parsed by parse_boolean_term.
> + *
> + * Output is into buf; it may be talloc_realloced.
> + * Return: 0 on success, non-zero on memory allocation failure.
> + */
> +int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
> +                       char **buf, size_t *len);
> +
>  #endif
> --
> 1.7.10.4