v9 of batch tagging

author david <david@tethera.net>

Mon, 24 Dec 2012 01:39:26 +0000 (21:39 -0400)

committer W. Trevor King <wking@tremily.us>

Fri, 7 Nov 2014 17:52:38 +0000 (09:52 -0800)
diff --git a/aa/91b34bc40b6b1abad68dca52c423e8e85e7c95 b/aa/91b34bc40b6b1abad68dca52c423e8e85e7c95
new file mode 100644
index 0000000..9858d34
--- /dev/null
+++ b/aa/91b34bc40b6b1abad68dca52c423e8e85e7c95
@@ -0,0 +1,215 @@
+Return-Path: <bremner@tethera.net>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+       by olra.theworths.org (Postfix) with ESMTP id 53235431FBF\r
+       for <notmuch@notmuchmail.org>; Sun, 23 Dec 2012 17:40:14 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: 0\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=0 tagged_above=-999 required=5 tests=[none]\r
+       autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+       by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+       with ESMTP id KrKXEzMdKkt9 for <notmuch@notmuchmail.org>;\r
+       Sun, 23 Dec 2012 17:40:12 -0800 (PST)\r
+Received: from tesseract.cs.unb.ca (tesseract.cs.unb.ca [131.202.240.238])\r
+       (using TLSv1 with cipher AES256-SHA (256/256 bits))\r
+       (No client certificate requested)\r
+       by olra.theworths.org (Postfix) with ESMTPS id 919E0431FCB\r
+       for <notmuch@notmuchmail.org>; Sun, 23 Dec 2012 17:40:02 -0800 (PST)\r
+Received: from fctnnbsc30w-156034082078.dhcp-dynamic.fibreop.nb.bellaliant.net\r
+       ([156.34.82.78] helo=zancas.localnet)\r
+       by tesseract.cs.unb.ca with esmtpsa\r
+       (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.72)\r
+       (envelope-from <bremner@tethera.net>) id 1Tmx1K-0008Kj-SZ\r
+       for notmuch@notmuchmail.org; Sun, 23 Dec 2012 21:40:01 -0400\r
+Received: from bremner by zancas.localnet with local (Exim 4.80)\r
+       (envelope-from <bremner@tethera.net>) id 1Tmx1F-0002nD-C6\r
+       for notmuch@notmuchmail.org; Sun, 23 Dec 2012 21:39:53 -0400\r
+From: david@tethera.net\r
+To: notmuch@notmuchmail.org\r
+Subject: v9 of batch tagging\r
+Date: Sun, 23 Dec 2012 21:39:26 -0400\r
+Message-Id: <1356313183-9266-1-git-send-email-david@tethera.net>\r
+X-Mailer: git-send-email 1.7.10.4\r
+X-Spam_bar: -\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+       <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Mon, 24 Dec 2012 01:40:14 -0000\r
+\r
+This obsoletes \r
+\r
+     id:1356095307-22895-1-git-send-email-david@tethera.net\r
+\r
+The main changes since v8 are the rebasing against the notmuch-restore\r
+fixes in master, and the rewrite of the query (pre)-processing\r
+unhex_and_quote. This incorporates the changes of\r
+\r
+      id:1356231570-28232-1-git-send-email-david@tethera.net\r
+\r
+and now handles '()' (cf. id:87a9t5p4dz.fsf@qmul.ac.uk)\r
+\r
+With respect to \r
+\r
+,----\r
+| Finally, I don't know if a query can contain a : without being a\r
+| prefix query. If it can that could end up being misquoted.\r
+`----\r
+\r
+This is pretty easy to work around by encoding that :. I think unless\r
+it is a problem in practice I prefer not to keep an explicit list of\r
+prefixes here; recognizing prefixes should really be a service from\r
+libnotmuch.\r
+\r
+I dropped two patches (strnspn and hex_invariant), but picked up a new\r
+strtok variation. Probably the name strtok_len2 could be improved\r
+(and I see there is a typo in the patch subject).\r
+\r
+ [Patch v9 05/17] util/string-util: add a new string tokenized\r
+\r
+Finally I added a test for the new parenthesis handling.\r
+\r
+[Patch v9 17/17] test/tagging: add test for handling of parens\r
+\r
+Fixup-wise, the tests needed to be adjusted a bit for () being delimiters,\r
+and the man page as well.\r
+\r
+I added the fclose in id:87wqw9hf9a.fsf@oiva.home.nikula.org\r
+\r
+And I modified the return value per id:87zk15hi7f.fsf@oiva.home.nikula.org\r
+\r
+Here is the interdiff for unhex_and_quote:\r
+\r
+commit 67c6aee87db5c7da25529e1c0feb64e422abb4b7\r
+Author: David Bremner <bremner@unb.ca>\r
+Date:   Sat Dec 22 22:49:02 2012 -0400\r
+\r
+    simplify unhex_and_quote, support parens\r
+    \r
+    the overgeneral definition of a prefix can be replaced by lower case\r
+    alphabetic, and still work fine with current notmuch query syntax.\r
+    \r
+    use () as delimiters in unhex_and_quote, preserve delimiters\r
+\r
+diff --git a/tag-util.c b/tag-util.c\r
+index 6f62fe6..91f3603 100644\r
+--- a/tag-util.c\r
++++ b/tag-util.c\r
+@@ -56,6 +56,21 @@ illegal_tag (const char *tag, notmuch_bool_t remove)\r
+     return NULL;\r
+ }\r
+ \r
++/* Factor out the boilerplate to append a token to the query string.\r
++ * For use in unhex_and_quote */\r
++\r
++static tag_parse_status_t\r
++append_tok (const char *tok, size_t tok_len,\r
++          const char *line_for_error, char **query_string)\r
++{\r
++\r
++    *query_string = talloc_strndup_append_buffer (*query_string, tok, tok_len);\r
++    if (*query_string == NULL)\r
++      return line_error (TAG_PARSE_OUT_OF_MEMORY, line_for_error, "aborting");\r
++\r
++    return TAG_PARSE_SUCCESS;\r
++}\r
++\r
+ /* Input is a hex encoded string, presumed to be a query for Xapian.\r
+  *\r
+  * Space delimited tokens are decoded and quoted, with '*' and prefixes\r
+@@ -67,45 +82,41 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,\r
+ {\r
+     char *tok = encoded;\r
+     size_t tok_len = 0;\r
++    size_t delim_len = 0;\r
+     char *buf = NULL;\r
+     size_t buf_len = 0;\r
+     tag_parse_status_t ret = TAG_PARSE_SUCCESS;\r
+ \r
+     *query_string = talloc_strdup (ctx, "");\r
+ \r
+-    while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {\r
++    while ((tok = strtok_len2 (tok + tok_len + delim_len, " ()",\r
++                             &tok_len, &delim_len)) != NULL) {\r
+ \r
+       size_t prefix_len;\r
+       char delim = *(tok + tok_len);\r
+ \r
+-      *(tok + tok_len++) = '\0';\r
++      *(tok + tok_len) = '\0';\r
+ \r
+-      prefix_len = hex_invariant (tok, tok_len);\r
++      /* The following matches a superset of prefixes currently\r
++       * used by notmuch */\r
++      prefix_len = strspn (tok, "abcdefghijklmnopqrstuvwxyz");\r
+ \r
+-      if ((strcmp (tok, "*") == 0) || prefix_len >= tok_len - 1) {\r
++      if ((strcmp (tok, "*") == 0) || prefix_len == tok_len) {\r
+ \r
+           /* pass some things through without quoting or decoding.\r
+            * Note for '*' this is mandatory.\r
+            */\r
+ \r
+-          if (! (*query_string = talloc_asprintf_append_buffer (\r
+-                     *query_string, "%s%c", tok, delim))) {\r
+-\r
+-              ret = line_error (TAG_PARSE_OUT_OF_MEMORY,\r
+-                                line_for_error, "aborting");\r
+-              goto DONE;\r
+-          }\r
++          ret = append_tok (tok, tok_len, line_for_error, query_string);\r
++          if (ret) goto DONE;\r
+ \r
+       } else {\r
+           /* potential prefix: one for ':', then something after */\r
+-          if ((tok_len - prefix_len > 2) && *(tok + prefix_len) == ':') {\r
+-              if (! (*query_string = talloc_strndup_append (*query_string,\r
+-                                                            tok,\r
+-                                                            prefix_len + 1))) {\r
+-                  ret = line_error (TAG_PARSE_OUT_OF_MEMORY,\r
+-                                    line_for_error, "aborting");\r
+-                  goto DONE;\r
+-              }\r
++          if ((tok_len - prefix_len >= 2) && *(tok + prefix_len) == ':') {\r
++              ret = append_tok (tok, prefix_len + 1,\r
++                                line_for_error, query_string);\r
++              if (ret) goto DONE;\r
++\r
+               tok += prefix_len + 1;\r
+               tok_len -= prefix_len + 1;\r
+           }\r
+@@ -122,13 +133,15 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,\r
+               goto DONE;\r
+           }\r
+ \r
+-          if (! (*query_string = talloc_asprintf_append_buffer (\r
+-                     *query_string, "%s%c", buf, delim))) {\r
+-              ret = line_error (TAG_PARSE_OUT_OF_MEMORY,\r
+-                                line_for_error, "aborting");\r
+-              goto DONE;\r
+-          }\r
++          ret = append_tok (buf, buf_len, line_for_error, query_string);\r
++          if (ret) goto DONE;\r
+       }\r
++      /* restore the string */\r
++      *(tok + tok_len) = delim;\r
++\r
++      /* copy any delimiters */\r
++      ret = append_tok (tok + tok_len, delim_len, line_for_error, query_string);\r
++      if (ret) goto DONE;\r
+     }\r
+ \r
+   DONE:\r
+\r
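The interdiff above calls strtok_len2 but its definition lives in patch 05/17, which is not part of this message. The following is only a sketch of how such a tokenizer could behave, with semantics inferred from the call site in unhex_and_quote: it reports the length of the next token and the length of the delimiter run that follows it, so the caller can copy the delimiters through unchanged. The body of strtok_len2 here is an assumption and the real helper may differ, for instance in how it treats leading delimiters. prefix_length is a hypothetical name; it repeats the lower-case-letters-then-colon test from the interdiff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strtok_len2; the real one is in patch
 * 05/17.  Returns s, or NULL at end of string.  *tok_len is the
 * number of leading non-delimiter bytes (possibly 0) and *delim_len
 * is the length of the delimiter run right after the token. */
static char *
strtok_len2 (char *s, const char *delim, size_t *tok_len, size_t *delim_len)
{
    if (*s == '\0')
	return NULL;
    *tok_len = strcspn (s, delim);
    *delim_len = strspn (s + *tok_len, delim);
    return s;
}

/* Same prefix test as the interdiff: a run of lower case letters,
 * then ':', then at least one more byte.  Returns the length of
 * "prefix:" including the colon, or 0 if tok has no such prefix. */
static size_t
prefix_length (const char *tok, size_t tok_len)
{
    size_t n = strspn (tok, "abcdefghijklmnopqrstuvwxyz");

    if (n < tok_len && tok_len - n >= 2 && tok[n] == ':')
	return n + 1;
    return 0;
}

/* Walk query the way the loop in unhex_and_quote does, copying every
 * token and its trailing delimiters into out unchanged.  out must
 * have room for strlen (query) + 1 bytes.  Returns the number of
 * non-empty tokens seen. */
static size_t
copy_tokens (char *query, char *out)
{
    char *tok = query;
    size_t tok_len = 0, delim_len = 0, count = 0;

    assert (query != NULL && out != NULL);
    *out = '\0';
    while ((tok = strtok_len2 (tok + tok_len + delim_len, " ()",
			       &tok_len, &delim_len)) != NULL) {
	if (tok_len > 0)
	    count++;
	strncat (out, tok, tok_len + delim_len);
    }
    return count;
}
```

In this sketch a query that starts with '(' yields an empty first token followed by a one-byte delimiter run, which is one way to keep a leading parenthesis in the output.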