From: Austin Clements
To: notmuch@notmuchmail.org
Subject: [PATCH v2] tag: Automatically limit to messages whose tags will actually change.
Date: Wed, 9 Nov 2011 08:44:35 -0500
Message-Id: <1320846275-28520-1-git-send-email-amdragon@mit.edu>
In-Reply-To: <1320724523-23568-1-git-send-email-amdragon@mit.edu>
References: <1320724523-23568-1-git-send-email-amdragon@mit.edu>

This optimizes the user's tagging query to exclude messages that won't
be affected by the tagging operation, saving computation and IO for
redundant tagging operations.

For example,

  notmuch tag +notmuch to:notmuch@notmuchmail.org

will now use the query

  ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")

In the past, we've often suggested that people do this exact
transformation by hand for slow tagging operations.  This makes that
unnecessary.
---
This version addresses Jani's comments.
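The rewrite described in the commit message can be reproduced outside notmuch with a minimal standalone sketch. It is an illustration, not code from this patch: it assumes plain malloc/sprintf in place of talloc, and the names exclude_unchanged and append_quoted_tag are hypothetical. It follows the same shape as _optimize_tag_query: the user's query stays first inside parentheses, each added tag contributes a not tag:"..." clause, each removed tag a tag:"..." clause, and the clauses are joined with "or".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write TAG at OUT as a double-quoted boolean term, doubling any
 * embedded double quote, and NUL-terminate it.  Needs room for
 * 2 * strlen (tag) + 3 bytes.  Returns a pointer to the NUL. */
static char *
append_quoted_tag (char *out, const char *tag)
{
    *out++ = '"';
    for (; *tag; tag++) {
        if (*tag == '"')
            *out++ = '"';
        *out++ = *tag;
    }
    *out++ = '"';
    *out = '\0';
    return out;
}

/* Hypothetical helper: restrict ORIG to messages whose tags would
 * change.  Returns a malloc'd string (caller frees) or NULL on
 * allocation failure.  At least one tag must be given. */
static char *
exclude_unchanged (const char *orig,
                   const char *const *add_tags, size_t n_add,
                   const char *const *rm_tags, size_t n_rm)
{
    size_t i, len;
    char *query, *p;

    assert (n_add + n_rm > 0);

    /* Upper bound: assume every tag character needs doubling and every
     * clause needs the " or " separator. */
    len = strlen ("( ") + strlen (orig) + strlen (" ) and (") + strlen (")") + 1;
    for (i = 0; i < n_add; i++)
        len += strlen (" or not tag:") + 2 * strlen (add_tags[i]) + 2;
    for (i = 0; i < n_rm; i++)
        len += strlen (" or tag:") + 2 * strlen (rm_tags[i]) + 2;

    query = malloc (len);
    if (query == NULL)
        return NULL;

    /* Like the patch, skip the "( ... ) and" wrapper for a bare "*". */
    if (strcmp (orig, "*") == 0)
        p = query + sprintf (query, "(");
    else
        p = query + sprintf (query, "( %s ) and (", orig);

    /* Added tags: only messages that lack the tag would change. */
    for (i = 0; i < n_add; i++) {
        p += sprintf (p, "%snot tag:", i > 0 ? " or " : "");
        p = append_quoted_tag (p, add_tags[i]);
    }
    /* Removed tags: only messages that carry the tag would change. */
    for (i = 0; i < n_rm; i++) {
        p += sprintf (p, "%stag:", (n_add > 0 || i > 0) ? " or " : "");
        p = append_quoted_tag (p, rm_tags[i]);
    }
    strcpy (p, ")");
    return query;
}
```

Called with "to:notmuch@notmuchmail.org" and the single added tag "notmuch", this sketch returns ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch"), the query quoted in the commit message. The exclusion group deliberately uses NOT rather than a leading '-', for the Xapian parsing reason given in the patch's comment.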
 NEWS          |    9 ++++++
 notmuch-tag.c |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index e00452a..9ca5e0c 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,15 @@ Add search terms to "notmuch dump"
   search/show/tag. The output file argument of dump is deprecated in
   favour of using stdout.
 
+Optimizations
+-------------
+
+Automatic tag query optimization
+
+  "notmuch tag" now automatically optimizes the user's query to
+  exclude messages whose tags won't change.  In the past, we've
+  suggested that people do this by hand; this is no longer necessary.
+
 Notmuch 0.9 (2011-10-01)
 ========================
 
diff --git a/notmuch-tag.c b/notmuch-tag.c
index dded39e..537d5a4 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -30,6 +30,81 @@ handle_sigint (unused (int sig))
     interrupted = 1;
 }
 
+static char *
+_escape_tag (char *buf, const char *tag)
+{
+    const char *in = tag;
+    char *out = buf;
+    /* Boolean terms surrounded by double quotes can contain any
+     * character.  Double quotes are quoted by doubling them. */
+    *out++ = '"';
+    while (*in) {
+        if (*in == '"')
+            *out++ = '"';
+        *out++ = *in++;
+    }
+    *out++ = '"';
+    *out = 0;
+    return buf;
+}
+
+static char *
+_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
+                     int *add_tags, int add_tags_count,
+                     int *remove_tags, int remove_tags_count)
+{
+    /* This is subtler than it looks.  Xapian ignores the '-' operator
+     * at the beginning of both queries and parenthesized groups and,
+     * furthermore, the presence of a '-' operator at the beginning of
+     * a group can inhibit parsing of the previous operator.  Hence,
+     * the user-provided query MUST appear first (it is safe to
+     * parenthesize it), and the exclusion part of the query must not
+     * use the '-' operator (though the NOT operator is fine). */
+
+    char *escaped, *query_string;
+    const char *join = "";
+    int i;
+    unsigned int max_tag_len = 0;
+
+    /* Allocate a buffer for escaping tags.  This is large enough to
+     * hold a fully escaped tag with every character doubled plus
+     * enclosing quotes and a NUL. */
+    for (i = 0; i < add_tags_count; i++)
+        if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
+            max_tag_len = strlen (argv[add_tags[i]] + 1);
+    for (i = 0; i < remove_tags_count; i++)
+        if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
+            max_tag_len = strlen (argv[remove_tags[i]] + 1);
+    escaped = talloc_array (ctx, char, max_tag_len * 2 + 3);
+    if (!escaped)
+        return NULL;
+
+    /* Build the new query string */
+    if (strcmp (orig_query_string, "*") == 0)
+        query_string = talloc_strdup (ctx, "(");
+    else
+        query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
+
+    for (i = 0; i < add_tags_count && query_string; i++) {
+        query_string = talloc_asprintf_append_buffer (
+            query_string, "%snot tag:%s", join,
+            _escape_tag (escaped, argv[add_tags[i]] + 1));
+        join = " or ";
+    }
+    for (i = 0; i < remove_tags_count && query_string; i++) {
+        query_string = talloc_asprintf_append_buffer (
+            query_string, "%stag:%s", join,
+            _escape_tag (escaped, argv[remove_tags[i]] + 1));
+        join = " or ";
+    }
+
+    if (query_string)
+        query_string = talloc_strdup_append_buffer (query_string, ")");
+
+    talloc_free (escaped);
+    return query_string;
+}
+
 int
 notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 {
@@ -93,6 +168,16 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
         return 1;
     }
 
+    /* Optimize the query so it excludes messages that already have
+     * the specified set of tags. */
+    query_string = _optimize_tag_query (ctx, query_string, argv,
+                                        add_tags, add_tags_count,
+                                        remove_tags, remove_tags_count);
+    if (query_string == NULL) {
+        fprintf (stderr, "Out of memory.\n");
+        return 1;
+    }
+
     config = notmuch_config_open (ctx, NULL, NULL);
     if (config == NULL)
         return 1;
-- 
1.7.7.1
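The allocation in _optimize_tag_query relies on a worst case: a tag made entirely of double quotes doubles in length and still needs two enclosing quotes and a NUL, hence max_tag_len * 2 + 3. A small sketch can check that bound. It assumes hypothetical helpers escape_buffer_size and escape_tag (not part of notmuch) that mirror the patch's loop over argv (skipping each entry's leading '+' or '-') and the quoting rule of _escape_tag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the patch: find the longest tag among
 * argv[idx[0..count-1]], skipping each entry's leading '+' or '-', and
 * size a buffer for it: every character doubled, two quotes, one NUL. */
static size_t
escape_buffer_size (char *argv[], const int *idx, int count)
{
    size_t max_len = 0;
    for (int i = 0; i < count; i++) {
        const char *arg = argv[idx[i]];
        assert (arg[0] == '+' || arg[0] == '-');
        if (strlen (arg + 1) > max_len)
            max_len = strlen (arg + 1);
    }
    return max_len * 2 + 3;
}

/* Same quoting rule as _escape_tag in the patch.  Returns the number of
 * bytes written, including the terminating NUL. */
static size_t
escape_tag (char *buf, const char *tag)
{
    char *out = buf;
    *out++ = '"';
    for (; *tag; tag++) {
        if (*tag == '"')
            *out++ = '"';
        *out++ = *tag;
    }
    *out++ = '"';
    *out++ = '\0';
    return (size_t) (out - buf);
}
```

For an argument such as +""" (the three-character tag """), the computed size is 9, and escaping that tag writes exactly 9 bytes including the NUL, so the bound used in the patch is tight rather than merely safe.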