Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 8032A431FB6 for ; Sun, 30 Dec 2012 21:52:24 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id V-44hOknVuAK for ; Sun, 30 Dec 2012 21:52:23 -0800 (PST) Received: from dmz-mailsec-scanner-7.mit.edu (DMZ-MAILSEC-SCANNER-7.MIT.EDU [18.7.68.36]) by olra.theworths.org (Postfix) with ESMTP id 1836E431FAF for ; Sun, 30 Dec 2012 21:52:23 -0800 (PST) X-AuditID: 12074424-b7f4e6d0000004ca-7e-50e128149662 Received: from mailhub-auth-1.mit.edu ( [18.9.21.35]) by dmz-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP id 3C.80.01226.41821E05; Mon, 31 Dec 2012 00:52:20 -0500 (EST) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id qBV5qKr4025387; Mon, 31 Dec 2012 00:52:20 -0500 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id qBV5qHf6006487 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Mon, 31 Dec 2012 00:52:19 -0500 (EST) Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80) (envelope-from ) id 1TpYIL-0007eU-Ay; Mon, 31 Dec 2012 00:52:17 -0500 From: Austin Clements To: Tomi Ollila , notmuch@notmuchmail.org Subject: Re: [PATCH v3 2/5] util: Function to parse boolean term queries In-Reply-To: References: <1356719189-2837-1-git-send-email-amdragon@mit.edu> <1356719189-2837-3-git-send-email-amdragon@mit.edu> User-Agent: Notmuch/0.14+236~gf64406d (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu) Date: Mon, 31 Dec 2012 00:52:17 -0500 Message-ID: <87a9sulqem.fsf@awakening.csail.mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrAIsWRmVeSWpSXmKPExsUixCmqrCui8TDAYP9Ma4vrN2cyW7xZOY/V gcnj8NeFLB7PVt1iDmCK4rJJSc3JLEst0rdL4Mo49ymmYLdixctfr5gbGA9IdjFycEgImEhs OVzWxcgJZIpJXLi3ng3EFhLYxyjR+IGzi5ELyN7AKLHw0QMmCOcik8SiyQ0sEM4SRompvcvZ QVrYBDQktu1fzghiiwjYSlxZcpIVxBYW8JDYuPEsE4jNKWAg8ej0U3aI5umMEi1rHoIlRAXi JZ7f+8YCYrMIqErMXb6ZGcTmBTpv3b1J7BC2oMTJmU/AapgFtCRu/HvJNIFRYBaS1CwkqQWM TKsYZVNyq3RzEzNzilOTdYuTE/PyUot0zfVyM0v0UlNKNzGCApLdRWUHY/MhpUOMAhyMSjy8 G/4/CBBiTSwrrsw9xCjJwaQkyquu/DBAiC8pP6UyI7E4I76oNCe1+BCjBAezkgjv0XlA5bwp iZVVqUX5MClpDhYlcd7rKTf9hQTSE0tSs1NTC1KLYLIyHBxKErxa6kBDBYtS01Mr0jJzShDS TBycIMN5gIabgtTwFhck5hZnpkPkTzEqSonzioEkBEASGaV5cL2whPGKURzoFWFeIZAqHmCy get+BTSYCWiwFgPI1cUliQgpqQZGj/5foq4lzgzHljXNUprd0utrdLRePXbiI84oibNxsSkt j5+8emDRt/+Nnc8jWZkzrNUxbZ+zCvQTVob9nidut2jv4iMzN3PuzpfVsp/y5OKit00WV/5w XFlz/n4lo23u7UnzfNes+dL3qtXzghLDhV/VTLMjKsuCGNU8jt7d6exqPGfxvVg7JZbijERD Leai4kQA9lwoOfMCAAA= X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Dec 2012 05:52:24 -0000 On Sat, 29 Dec 2012, Tomi Ollila wrote: > On Fri, Dec 28 2012, Austin Clements wrote: > >> This parses the subset of Xapian's boolean term quoting rules that are >> used by make_boolean_term. This is provided as a generic string >> utility, but will be used shortly in notmuch restore to parse and >> optimize for ID queries. >> --- >> util/string-util.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++ >> util/string-util.h | 11 +++++++++++ >> 2 files changed, 66 insertions(+) >> >> diff --git a/util/string-util.c b/util/string-util.c >> index e4bea21..83b4953 100644 >> --- a/util/string-util.c >> +++ b/util/string-util.c >> @@ -96,3 +96,58 @@ make_boolean_term (void *ctx, const char *prefix, const char *term, >> >> return 0; >> } >> + >> +int >> +parse_boolean_term (void *ctx, const char *str, >> + char **prefix_out, char **term_out) >> +{ >> + *prefix_out = *term_out = NULL; >> + >> + /* Parse prefix */ >> + const char *pos = strchr (str, ':'); >> + if (! pos) >> + goto FAIL; >> + *prefix_out = talloc_strndup (ctx, str, pos - str); >> + ++pos; >> + >> + /* Implement de-quoting compatible with make_boolean_term. */ >> + if (*pos == '"') { >> + char *out = talloc_array (ctx, char, strlen (pos)); >> + int closed = 0; >> + *term_out = out; >> + /* Skip the opening quote, find the closing quote, and >> + * un-double doubled internal quotes. */ >> + for (++pos; *pos; ) { >> + if (*pos == '"') { >> + ++pos; >> + if (*pos != '"') { >> + /* Found the closing quote. */ >> + closed = 1; >> + break; >> + } >> + } >> + *out++ = *pos++; >> + } >> + /* Did the term terminate without a closing quote or is there >> + * trailing text after the closing quote? */ >> + if (!closed || *pos) >> + goto FAIL; >> + *out = '\0'; >> + } else { >> + const char *start = pos; >> + /* Check for text after the boolean term. */ >> + while (*pos > ' ' && *pos != ')') >> + ++pos; >> + if (*pos) >> + goto FAIL; > > Mark pointed out a good case about trailing whitespace -- It would be nice > if the core were lenient for such cases. I personally remember once wasting > hours of work by just failing to notice trailing whitespace in one system > so this subject is sensitive to me... Will do. > Another thing I saw earlyer today: make_boolean_term() checks > > if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127) > > but here the check is only > > while (*pos > ' ' && *pos != ')') > > I wonder whether it matters... This is correct in a conservative way. We quote things that we don't strictly need to quote (for example, a double quote in the middle of a term doesn't actually require the term to be quoted, but we do anyway), but we look for the end of an unquoted term in exactly the same way Xapian does. > Everyting else looks good to me. > > > Tomi > >> + /* No trailing text; dup the string so the caller can free >> + * it. */ >> + *term_out = talloc_strdup (ctx, start); >> + } >> + return 0; >> + >> + FAIL: >> + talloc_free (*prefix_out); >> + talloc_free (*term_out); >> + return 1; >> +} >> diff --git a/util/string-util.h b/util/string-util.h >> index b8844a3..43d49d0 100644 >> --- a/util/string-util.h >> +++ b/util/string-util.h >> @@ -33,4 +33,15 @@ char *strtok_len (char *s, const char *delim, size_t *len); >> int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term, >> char **buf, size_t *len); >> >> +/* Parse a boolean term query produced by make_boolean_term, returning >> + * the prefix in *prefix_out and the term in *term_out. *prefix_out >> + * and *term_out will be talloc'd with context ctx. >> + * >> + * Return: 0 on success, non-zero on parse error (including trailing >> + * data in str). >> + */ >> +int >> +parse_boolean_term (void *ctx, const char *str, >> + char **prefix_out, char **term_out); >> + >> #endif >> -- >> 1.7.10.4 >> >> _______________________________________________ >> notmuch mailing list >> notmuch@notmuchmail.org >> http://notmuchmail.org/mailman/listinfo/notmuch