From: Austin Clements
To: notmuch@notmuchmail.org
Subject: [PATCH v5 3/6] util: Function to parse boolean term queries
Date: Sun, 6 Jan 2013 15:22:39 -0500
Message-Id: <1357503762-28759-4-git-send-email-amdragon@mit.edu>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To:
 <1357503762-28759-1-git-send-email-amdragon@mit.edu>
References: <1357503762-28759-1-git-send-email-amdragon@mit.edu>
Cc: tomi.ollila@iki.fi
List-Id: "Use and development of the notmuch mail system."

This parses the subset of Xapian's boolean term quoting rules that are
used by make_boolean_term.  This is provided as a generic string
utility, but will be used shortly in notmuch restore to parse and
optimize for ID queries.
---
 util/string-util.c |   82 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 util/string-util.h |   16 ++++++++++
 2 files changed, 98 insertions(+)

diff --git a/util/string-util.c b/util/string-util.c
index 7a71049..aba9aa8 100644
--- a/util/string-util.c
+++ b/util/string-util.c
@@ -22,6 +22,7 @@
 #include "string-util.h"
 #include "talloc.h"
 
+#include <ctype.h>
 #include <errno.h>
 
 char *
@@ -107,3 +108,84 @@ make_boolean_term (void *ctx, const char *prefix, const char *term,
 
     return 0;
 }
+
+static const char*
+skip_space (const char *str)
+{
+    while (*str && isspace ((unsigned char) *str))
+        ++str;
+    return str;
+}
+
+int
+parse_boolean_term (void *ctx, const char *str,
+                    char **prefix_out, char **term_out)
+{
+    int err = EINVAL;
+    *prefix_out = *term_out = NULL;
+
+    /* Parse prefix */
+    str = skip_space (str);
+    const char *pos = strchr (str, ':');
+    if (! pos)
+        goto FAIL;
+    *prefix_out = talloc_strndup (ctx, str, pos - str);
+    if (! *prefix_out) {
+        err = ENOMEM;
+        goto FAIL;
+    }
+    ++pos;
+
+    /* Implement de-quoting compatible with make_boolean_term. */
+    if (*pos == '"') {
+        char *out = talloc_array (ctx, char, strlen (pos));
+        int closed = 0;
+        if (! out) {
+            err = ENOMEM;
+            goto FAIL;
+        }
+        *term_out = out;
+        /* Skip the opening quote, find the closing quote, and
+         * un-double doubled internal quotes. */
+        for (++pos; *pos; ) {
+            if (*pos == '"') {
+                ++pos;
+                if (*pos != '"') {
+                    /* Found the closing quote. */
+                    closed = 1;
+                    pos = skip_space (pos);
+                    break;
+                }
+            }
+            *out++ = *pos++;
+        }
+        /* Did the term terminate without a closing quote or is there
+         * trailing text after the closing quote? */
+        if (!closed || *pos)
+            goto FAIL;
+        *out = '\0';
+    } else {
+        const char *start = pos;
+        /* Check for text after the boolean term. */
+        while (! is_unquoted_terminator (*pos))
+            ++pos;
+        if (*skip_space (pos)) {
+            err = EINVAL;
+            goto FAIL;
+        }
+        /* No trailing text; dup the string so the caller can free
+         * it. */
+        *term_out = talloc_strndup (ctx, start, pos - start);
+        if (! *term_out) {
+            err = ENOMEM;
+            goto FAIL;
+        }
+    }
+    return 0;
+
+  FAIL:
+    talloc_free (*prefix_out);
+    talloc_free (*term_out);
+    errno = err;
+    return -1;
+}
diff --git a/util/string-util.h b/util/string-util.h
index 719c276..0194607 100644
--- a/util/string-util.h
+++ b/util/string-util.h
@@ -34,4 +34,20 @@ char *strtok_len (char *s, const char *delim, size_t *len);
 int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
                        char **buf, size_t *len);
 
+/* Parse a boolean term query consisting of a prefix, a colon, and a
+ * term that may be quoted as described for make_boolean_term.  If the
+ * term is not quoted, then it ends at the first whitespace or close
+ * parenthesis.  str may contain leading or trailing whitespace,
+ * but anything else is considered a parse error.  This is compatible
+ * with anything produced by make_boolean_term, and supports a subset
+ * of the quoting styles supported by Xapian (and hence notmuch).
+ * *prefix_out and *term_out will be talloc'd with context ctx.
+ *
+ * Return: 0 on success, -1 on error.  errno will be set to EINVAL if
+ * there is a parse error or ENOMEM if there is an allocation failure.
+ */
+int
+parse_boolean_term (void *ctx, const char *str,
+                    char **prefix_out, char **term_out);
+
 #endif
-- 
1.7.10.4
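
For anyone who wants to poke at the quoting rules without building
against talloc, here is a rough standalone sketch of the same parse.
It is not the code from this patch: it uses malloc/free instead of
talloc, and the names sketch_parse_boolean_term, sketch_skip_space and
sketch_is_terminator are made up for illustration.  In particular,
sketch_is_terminator is only a guess at is_unquoted_terminator (which
lives elsewhere in string-util.c), based on the header comment saying
an unquoted term ends at the first whitespace or close parenthesis.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for is_unquoted_terminator: assumed to stop
 * at end of string, whitespace, or a close parenthesis. */
static int
sketch_is_terminator (unsigned char c)
{
    return c == '\0' || isspace (c) || c == ')';
}

static const char *
sketch_skip_space (const char *s)
{
    while (*s && isspace ((unsigned char) *s))
        ++s;
    return s;
}

/* malloc-based sketch of the parse.  Returns 0 on success, or -1 with
 * errno set to EINVAL (parse error) or ENOMEM.  On success the caller
 * frees *prefix_out and *term_out; on failure both are NULL. */
int
sketch_parse_boolean_term (const char *str, char **prefix_out,
                           char **term_out)
{
    int err = EINVAL;
    const char *pos;
    char *out;

    assert (str && prefix_out && term_out);
    *prefix_out = *term_out = NULL;

    /* Prefix: everything between leading whitespace and the colon. */
    str = sketch_skip_space (str);
    pos = strchr (str, ':');
    if (! pos)
        goto FAIL;
    *prefix_out = malloc (pos - str + 1);
    if (! *prefix_out) {
        err = ENOMEM;
        goto FAIL;
    }
    memcpy (*prefix_out, str, pos - str);
    (*prefix_out)[pos - str] = '\0';
    ++pos;

    /* The term can never be longer than the remaining input. */
    out = *term_out = malloc (strlen (pos) + 1);
    if (! out) {
        err = ENOMEM;
        goto FAIL;
    }

    if (*pos == '"') {
        int closed = 0;
        /* Skip the opening quote; "" inside the term means one ". */
        for (++pos; *pos; ) {
            if (*pos == '"') {
                ++pos;
                if (*pos != '"') {
                    closed = 1;
                    break;
                }
            }
            *out++ = *pos++;
        }
        if (! closed)
            goto FAIL;
    } else {
        while (! sketch_is_terminator ((unsigned char) *pos))
            *out++ = *pos++;
    }
    *out = '\0';

    /* Only whitespace may follow the term. */
    if (*sketch_skip_space (pos))
        goto FAIL;
    return 0;

  FAIL:
    free (*prefix_out);
    free (*term_out);
    *prefix_out = *term_out = NULL;
    errno = err;
    return -1;
}
```

With this sketch, the input `  id:foo  ` parses to prefix "id" and term
"foo", a quoted term such as `tag:"a ""b"" c"` has its doubled quotes
collapsed, and `id:foo bar` or an unterminated quote is rejected with
EINVAL.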