--- /dev/null
+Return-Path: <amdragon@mit.edu>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id 21291431FAF\r
+ for <notmuch@notmuchmail.org>; Thu, 3 Jan 2013 23:27:04 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: 1.151\r
+X-Spam-Level: *\r
+X-Spam-Status: No, score=1.151 tagged_above=-999 required=5\r
+ tests=[FUZZY_AMBIEN=1.851, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id NSsibVpNViL2 for <notmuch@notmuchmail.org>;\r
+ Thu, 3 Jan 2013 23:27:02 -0800 (PST)\r
+Received: from dmz-mailsec-scanner-8.mit.edu (DMZ-MAILSEC-SCANNER-8.MIT.EDU\r
+ [18.7.68.37])\r
+ by olra.theworths.org (Postfix) with ESMTP id 07464431FAE\r
+ for <notmuch@notmuchmail.org>; Thu, 3 Jan 2013 23:27:01 -0800 (PST)\r
+X-AuditID: 12074425-b7ff26d000007f8d-66-50e6844567e2\r
+Received: from mailhub-auth-1.mit.edu ( [18.9.21.35])\r
+ by dmz-mailsec-scanner-8.mit.edu (Symantec Messaging Gateway) with SMTP\r
+ id BD.1F.32653.54486E05; Fri, 4 Jan 2013 02:27:01 -0500 (EST)\r
+Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103])\r
+ by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id r047R045017065; \r
+ Fri, 4 Jan 2013 02:27:00 -0500\r
+Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])\r
+ (authenticated bits=0)\r
+ (User authenticated as amdragon@ATHENA.MIT.EDU)\r
+ by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id r047Qsxe000343\r
+ (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT);\r
+ Fri, 4 Jan 2013 02:26:55 -0500 (EST)\r
+Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80)\r
+ (envelope-from <amdragon@MIT.EDU>)\r
+ id 1Tr1g6-0001lT-2H; Fri, 04 Jan 2013 02:26:54 -0500\r
+Date: Fri, 4 Jan 2013 02:26:53 -0500\r
+From: Austin Clements <amdragon@MIT.EDU>\r
+To: Jani Nikula <jani@nikula.org>\r
+Subject: Re: [PATCH v4 1/5] util: Factor out boolean term quoting routine\r
+Message-ID: <20130104072653.GJ17581@mit.edu>\r
+References: <1356936162-2589-1-git-send-email-amdragon@mit.edu>\r
+ <1356936162-2589-2-git-send-email-amdragon@mit.edu>\r
+ <87y5gagqkp.fsf@nikula.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=us-ascii\r
+Content-Disposition: inline\r
+In-Reply-To: <87y5gagqkp.fsf@nikula.org>\r
+User-Agent: Mutt/1.5.21 (2010-09-15)\r
+X-Brightmail-Tracker:\r
+ H4sIAAAAAAAAA+NgFlrLKsWRmVeSWpSXmKPExsUixCmqrOva8izA4Hy3vMWN1m5Gi6bpzhar\r
+ 5/JYXL85k9nizcp5rA6sHjtn3WX3OPx1IYvHrfuv2T2erbrF7LHl0HvmANYoLpuU1JzMstQi\r
+ fbsEroz596ewFjy3r5g8r4G1gbFNr4uRk0NCwESi610vG4QtJnHh3nogm4tDSGAfo0Tj74WM\r
+ EM56Rom+h31QzgUmiQvvV7JCOEsYJV707mcG6WcRUJGY/XAdK4jNJqAhsW3/ckYQW0RAUWLz\r
+ yf1gNrNAmsT331PAaoQFPCWun9oNZvMK6Ehc3XMLavdkRokrvacZIRKCEidnPmGBaNaSuPHv\r
+ JVMXIweQLS2x/B8HSJgTaFf38tVgc0SBbphychvbBEahWUi6ZyHpnoXQvYCReRWjbEpulW5u\r
+ YmZOcWqybnFyYl5eapGuhV5uZoleakrpJkZQTLC7qO5gnHBI6RCjAAejEg/vhHtPA4RYE8uK\r
+ K3MPMUpyMCmJ8ro3PQsQ4kvKT6nMSCzOiC8qzUktPsQowcGsJML7WRsox5uSWFmVWpQPk5Lm\r
+ YFES572RctNfSCA9sSQ1OzW1ILUIJivDwaEkwXsJZKhgUWp6akVaZk4JQpqJgxNkOA/Q8Ncg\r
+ NbzFBYm5xZnpEPlTjLocDS9vPGUUYsnLz0uVEuc9CVIkAFKUUZoHNweWyl4xigO9Jcz7FqSK\r
+ B5gG4Sa9AlrCBLTk1ZvHIEtKEhFSUg2M2arBVcoWQlekOae93XRn7QY5vt9LbS7uyBGZ49D6\r
+ 4lQHw/Hrn/64vd96N/5C0ZPuSQ4hU+Jz5jrWn3w+z/f1C7uZDx9ozmNoTZjxd9kT+WydD7nv\r
+ Z6dKlL9UEFbk6/h/WyBr5sttvfHnrR4t57jIOCu/c629kfi7Myc2/XJW3Cgr+veQ7tn6b0os\r
+ xRmJhlrMRcWJAN8QOmlAAwAA\r
+Cc: tomi.ollila@iki.fi, notmuch@notmuchmail.org\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Fri, 04 Jan 2013 07:27:04 -0000\r
+\r
+Quoth Jani Nikula on Jan 03 at 5:48 pm:\r
+> On Mon, 31 Dec 2012, Austin Clements <amdragon@MIT.EDU> wrote:\r
+> > From: Austin Clements <amdragon@MIT.EDU>\r
+> >\r
+> > This is now a generic boolean term quoting function. It performs\r
+> > minimal quoting to produce user-friendly queries.\r
+> >\r
+> > This could live in tag-util as well, but it is really nothing specific\r
+> > to tags (although the conventions are specific to Xapian).\r
+> >\r
+> > The API is changed from "caller-allocates" to "readline-like". The\r
+> > scan for max tag length is pushed down into the quoting routine.\r
+> > Furthermore, this now combines the term prefix with the quoted term;\r
+> > arguably this is just as easy to do in the caller, but this will\r
+> > nicely parallel the boolean term parsing function to be introduced\r
+> > shortly.\r
+> >\r
+> > This is an amalgamation of code written by David Bremner and myself.\r
+> > ---\r
+> > notmuch-tag.c | 48 ++++++++++++---------------------------\r
+> > util/string-util.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++\r
+> > util/string-util.h | 14 ++++++++++++\r
+> > 3 files changed, 92 insertions(+), 34 deletions(-)\r
+> >\r
+> > diff --git a/notmuch-tag.c b/notmuch-tag.c\r
+> > index 88d559b..fc9d43a 100644\r
+> > --- a/notmuch-tag.c\r
+> > +++ b/notmuch-tag.c\r
+> > @@ -19,6 +19,7 @@\r
+> > */\r
+> > \r
+> > #include "notmuch-client.h"\r
+> > +#include "string-util.h"\r
+> > \r
+> > static volatile sig_atomic_t interrupted;\r
+> > \r
+> > @@ -35,25 +36,6 @@ handle_sigint (unused (int sig))\r
+> > interrupted = 1;\r
+> > }\r
+> > \r
+> > -static char *\r
+> > -_escape_tag (char *buf, const char *tag)\r
+> > -{\r
+> > - const char *in = tag;\r
+> > - char *out = buf;\r
+> > -\r
+> > - /* Boolean terms surrounded by double quotes can contain any\r
+> > - * character. Double quotes are quoted by doubling them. */\r
+> > - *out++ = '"';\r
+> > - while (*in) {\r
+> > - if (*in == '"')\r
+> > - *out++ = '"';\r
+> > - *out++ = *in++;\r
+> > - }\r
+> > - *out++ = '"';\r
+> > - *out = 0;\r
+> > - return buf;\r
+> > -}\r
+> > -\r
+> > typedef struct {\r
+> > const char *tag;\r
+> > notmuch_bool_t remove;\r
+> > @@ -71,25 +53,16 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,\r
+> > * parenthesize and the exclusion part of the query must not use\r
+> > * the '-' operator (though the NOT operator is fine). */\r
+> > \r
+> > - char *escaped, *query_string;\r
+> > + char *escaped = NULL;\r
+> > + size_t escaped_len = 0;\r
+> > + char *query_string;\r
+> > const char *join = "";\r
+> > - int i;\r
+> > - unsigned int max_tag_len = 0;\r
+> > + size_t i;\r
+> > \r
+> > /* Don't optimize if there are no tag changes. */\r
+> > if (tag_ops[0].tag == NULL)\r
+> > return talloc_strdup (ctx, orig_query_string);\r
+> > \r
+> > - /* Allocate a buffer for escaping tags. This is large enough to\r
+> > - * hold a fully escaped tag with every character doubled plus\r
+> > - * enclosing quotes and a NUL. */\r
+> > - for (i = 0; tag_ops[i].tag; i++)\r
+> > - if (strlen (tag_ops[i].tag) > max_tag_len)\r
+> > - max_tag_len = strlen (tag_ops[i].tag);\r
+> > - escaped = talloc_array (ctx, char, max_tag_len * 2 + 3);\r
+> > - if (! escaped)\r
+> > - return NULL;\r
+> > -\r
+> > /* Build the new query string */\r
+> > if (strcmp (orig_query_string, "*") == 0)\r
+> > query_string = talloc_strdup (ctx, "(");\r
+> > @@ -97,10 +70,17 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,\r
+> > query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);\r
+> > \r
+> > for (i = 0; tag_ops[i].tag && query_string; i++) {\r
+> > + /* XXX in case of OOM, query_string will be deallocated when\r
+> > + * ctx is, which might be at shutdown */\r
+> > + if (make_boolean_term (ctx,\r
+> > + "tag", tag_ops[i].tag,\r
+> > + &escaped, &escaped_len))\r
+> > + return NULL;\r
+> > +\r
+> > query_string = talloc_asprintf_append_buffer (\r
+> > - query_string, "%s%stag:%s", join,\r
+> > + query_string, "%s%s%s", join,\r
+> > tag_ops[i].remove ? "" : "not ",\r
+> > - _escape_tag (escaped, tag_ops[i].tag));\r
+> > + escaped);\r
+> > join = " or ";\r
+> > }\r
+> > \r
+> > diff --git a/util/string-util.c b/util/string-util.c\r
+> > index 44f8cd3..e4bea21 100644\r
+> > --- a/util/string-util.c\r
+> > +++ b/util/string-util.c\r
+> > @@ -20,6 +20,7 @@\r
+> > \r
+> > \r
+> > #include "string-util.h"\r
+> > +#include "talloc.h"\r
+> > \r
+> > char *\r
+> > strtok_len (char *s, const char *delim, size_t *len)\r
+> > @@ -32,3 +33,66 @@ strtok_len (char *s, const char *delim, size_t *len)\r
+> > \r
+> > return *len ? s : NULL;\r
+> > }\r
+> > +\r
+> > +int\r
+> > +make_boolean_term (void *ctx, const char *prefix, const char *term,\r
+> > + char **buf, size_t *len)\r
+> > +{\r
+> > + const char *in;\r
+> > + char *out;\r
+> > + size_t needed = 3;\r
+> > + int need_quoting = 0;\r
+> > +\r
+> > + /* Do we need quoting? To be paranoid, we quote anything\r
+> > + * containing a quote, even though it only matters at the\r
+> > + * beginning, and anything containing non-ASCII text. */\r
+> > + for (in = term; *in && !need_quoting; in++)\r
+> > + if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127)\r
+> \r
+> Should that be *in >= 127?\r
+\r
+Nope. Character 127 (DEL) is fine; it is still ASCII. Technically the\r
+only non-ASCII characters that require quoting are the curly double\r
+quotes U+201C and U+201D, but rather than decoding UTF-8 to find those\r
+characters, it's much easier to just quote if there are any non-ASCII\r
+bytes (anything above 127). (Extra technically, we would be in real\r
+trouble if a tag contained 8-bit bytes but wasn't valid UTF-8; however,\r
+I think this would be the least of our worries.)\r
+\r
+> Otherwise LGTM.\r
+> \r
+> Jani.\r
+> \r
+> > + need_quoting = 1;\r
+> > +\r
+> > + if (need_quoting)\r
+> > + for (in = term; *in; in++)\r
+> > + needed += (*in == '"') ? 2 : 1;\r
+> > + else\r
+> > + needed = strlen (term) + 1;\r
+> > +\r
+> > + /* Reserve space for the prefix */\r
+> > + if (prefix)\r
+> > + needed += strlen (prefix) + 1;\r
+> > +\r
+> > + if ((*buf == NULL) || (needed > *len)) {\r
+> > + *len = 2 * needed;\r
+> > + *buf = talloc_realloc (ctx, *buf, char, *len);\r
+> > + }\r
+> > +\r
+> > + if (! *buf)\r
+> > + return 1;\r
+> > +\r
+> > + out = *buf;\r
+> > +\r
+> > + /* Copy in the prefix */\r
+> > + if (prefix) {\r
+> > + strcpy (out, prefix);\r
+> > + out += strlen (prefix);\r
+> > + *out++ = ':';\r
+> > + }\r
+> > +\r
+> > + if (! need_quoting) {\r
+> > + strcpy (out, term);\r
+> > + return 0;\r
+> > + }\r
+> > +\r
+> > + /* Quote term by enclosing it in double quotes and doubling any\r
+> > + * internal double quotes. */\r
+> > + *out++ = '"';\r
+> > + in = term;\r
+> > + while (*in) {\r
+> > + if (*in == '"')\r
+> > + *out++ = '"';\r
+> > + *out++ = *in++;\r
+> > + }\r
+> > + *out++ = '"';\r
+> > + *out = '\0';\r
+> > +\r
+> > + return 0;\r
+> > +}\r
+> > diff --git a/util/string-util.h b/util/string-util.h\r
+> > index ac7676c..b8844a3 100644\r
+> > --- a/util/string-util.h\r
+> > +++ b/util/string-util.h\r
+> > @@ -19,4 +19,18 @@\r
+> > \r
+> > char *strtok_len (char *s, const char *delim, size_t *len);\r
+> > \r
+> > +/* Construct a boolean term query with the specified prefix (e.g.,\r
+> > + * "id") and search term, quoting term as necessary. Specifically, if\r
+> > + * term contains any non-printable ASCII characters, non-ASCII\r
+> > + * characters, close parenthesis or double quotes, it will be enclosed\r
+> > + * in double quotes and any internal double quotes will be doubled\r
+> > + * (e.g. a"b -> "a""b"). The result will be a valid notmuch query and\r
+> > + * can be parsed by parse_boolean_term.\r
+> > + *\r
+> > + * Output is into buf; it may be talloc_realloced.\r
+> > + * Return: 0 on success, non-zero on memory allocation failure.\r
+> > + */\r
+> > +int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,\r
+> > + char **buf, size_t *len);\r
+> > +\r
+> > #endif\r
+\r
+-- \r
+Austin Clements MIT/'06/PhD/CSAIL\r
+amdragon@mit.edu http://web.mit.edu/amdragon\r
+ Somewhere in the dream we call reality you will find me,\r
+ searching for the reality we call dreams.\r
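
The quoting test discussed in the reply above can be sketched as a standalone C predicate. This is only an illustration, not code from the patch: the name term_needs_quoting is hypothetical, and the sketch iterates over unsigned char so that the "above 127" comparison is explicit, whereas the patch compares plain char values. It shows why the boundary is "> 127" rather than ">= 127": byte 127 is ASCII, while every byte of a multi-byte UTF-8 sequence (including the encodings of U+201C and U+201D) is above 127.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the patch): returns 1 if term would
 * be quoted under the rule described in the message, 0 otherwise. */
static int
term_needs_quoting (const char *term)
{
    const unsigned char *in;

    assert (term != NULL);

    for (in = (const unsigned char *) term; *in; in++) {
        /* Space and control characters, close parenthesis, and double
         * quote all force quoting. */
        if (*in <= ' ' || *in == ')' || *in == '"')
            return 1;
        /* Any byte above 127 belongs to a non-ASCII UTF-8 sequence.
         * Byte 127 (DEL) is ASCII and does not trigger quoting. */
        if (*in > 127)
            return 1;
    }

    return 0;
}
```

Under this sketch a plain tag such as "inbox" passes through unquoted, while a tag containing a space, a double quote, or any UTF-8 encoded non-ASCII character is flagged for quoting.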