From: Dmitry Kurochkin
To: Austin Clements, notmuch@notmuchmail.org
Subject: Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
In-Reply-To: <1320724523-23568-1-git-send-email-amdragon@mit.edu>
References: <1320724523-23568-1-git-send-email-amdragon@mit.edu>
User-Agent: Notmuch/0.9+34~g27fff04 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)
Date: Tue, 08 Nov 2011 08:34:32 +0400
Message-ID: <878vnrut9j.fsf@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: "Use and development of the notmuch mail system."

Hi Austin.

On Mon, 7 Nov 2011 22:55:23 -0500, Austin Clements wrote:
> This optimizes the user's tagging query to exclude messages that won't
> be affected by the tagging operation, saving computation and IO for
> redundant tagging operations.
>
> For example,
>   notmuch tag +notmuch to:notmuch@notmuchmail.org
> will now use the query
>   ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")
>
> In the past, we've often suggested that people do this exact
> transformation by hand for slow tagging operations. This makes that
> unnecessary.

Thanks! This is a very useful optimization.

Does it work for multiple tags and tag removal? I.e.:

  notmuch tag -inbox -unread +sent from:dmitry.kurochkin@gmail.com

can be converted to:

  notmuch tag -inbox -unread +sent from:dmitry.kurochkin@gmail.com and (tag:inbox or tag:unread or (not tag:sent))

Regards,
  Dmitry

> ---
> I was about to implement this optimization in my initial tagging
> script, but then I figured, why not just do it in notmuch so we can
> stop telling people to do this by hand?
>
>  NEWS          |    9 ++++++
>  notmuch-tag.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 0 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index e00452a..9ca5e0c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -16,6 +16,15 @@ Add search terms to "notmuch dump"
>    search/show/tag. The output file argument of dump is deprecated in
>    favour of using stdout.
>
> +Optimizations
> +-------------
> +
> +Automatic tag query optimization
> +
> +  "notmuch tag" now automatically optimizes the user's query to
> +  exclude messages whose tags won't change. In the past, we've
> +  suggested that people do this by hand; this is no longer necessary.
> +
>  Notmuch 0.9 (2011-10-01)
>  ========================
>
> diff --git a/notmuch-tag.c b/notmuch-tag.c
> index dded39e..62c4bf1 100644
> --- a/notmuch-tag.c
> +++ b/notmuch-tag.c
> @@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
>      interrupted = 1;
>  }
>
> +static char *
> +_escape_tag (char *buf, const char *tag)
> +{
> +    const char *in = tag;
> +    char *out = buf;
> +    /* Boolean terms surrounded by double quotes can contain any
> +     * character. Double quotes are quoted by doubling them. */
> +    *(out++) = '"';
> +    while (*in) {
> +        if (*in == '"')
> +            *(out++) = '"';
> +        *(out++) = *(in++);
> +    }
> +    *(out++) = '"';
> +    *out = 0;
> +    return buf;
> +}
> +
> +static char *
> +_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
> +                     int *add_tags, int add_tags_count,
> +                     int *remove_tags, int remove_tags_count)
> +{
> +    /* This is subtler than it looks. Xapian ignores the '-' operator
> +     * at the beginning both queries and parenthesized groups and,
> +     * furthermore, the presence of a '-' operator at the beginning of
> +     * a group can inhibit parsing of the previous operator. Hence,
> +     * the user-provided query MUST appear first, but it is safe to
> +     * parenthesize and the exclusion part of the query must not use
> +     * the '-' operator (though the NOT operator is fine). */
> +
> +    char *escaped, *query_string;
> +    const char *join = "";
> +    int i;
> +    unsigned int max_tag_len = 0;
> +
> +    /* Allocate a buffer for escaping tags. */
> +    for (i = 0; i < add_tags_count; i++)
> +        if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
> +            max_tag_len = strlen (argv[add_tags[i]] + 1);
> +    for (i = 0; i < remove_tags_count; i++)
> +        if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
> +            max_tag_len = strlen (argv[remove_tags[i]] + 1);
> +    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
> +
> +    /* Build the new query string */
> +    if (strcmp (orig_query_string, "*") == 0)
> +        query_string = talloc_strdup (ctx, "(");
> +    else
> +        query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
> +
> +    for (i = 0; i < add_tags_count; i++) {
> +        query_string = talloc_asprintf_append_buffer (
> +            query_string, "%snot tag:%s", join,
> +            _escape_tag (escaped, argv[add_tags[i]] + 1));
> +        join = " or ";
> +    }
> +    for (i = 0; i < remove_tags_count; i++) {
> +        query_string = talloc_asprintf_append_buffer (
> +            query_string, "%stag:%s", join,
> +            _escape_tag (escaped, argv[remove_tags[i]] + 1));
> +        join = " or ";
> +    }
> +
> +    query_string = talloc_strdup_append_buffer (query_string, ")");
> +
> +    talloc_free (escaped);
> +    return query_string;
> +}
> +
>  int
>  notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
>  {
> @@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
>          return 1;
>      }
>
> +    /* Optimize the query so it excludes messages that already have
> +     * the specified set of tags. */
> +    query_string = _optimize_tag_query (ctx, query_string, argv,
> +                                        add_tags, add_tags_count,
> +                                        remove_tags, remove_tags_count);
> +
>      config = notmuch_config_open (ctx, NULL, NULL);
>      if (config == NULL)
>          return 1;
> --
> 1.7.7.1
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
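
Reading the quoted patch, the two loops do seem to cover the multiple-tag and removal case: every added tag contributes a `not tag:` term, every removed tag contributes a `tag:` term, and the terms are joined with `or`. Below is a rough standalone sketch of that string building, for illustration only. It is not the patch itself: it uses malloc and snprintf where the patch uses talloc, it takes plain tag arrays rather than argv indices, and the names escape_tag and restrict_tag_query are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Quote a tag as a boolean term: wrap it in double quotes and double
 * any embedded double quote.  buf must hold 2 * strlen (tag) + 3 bytes. */
char *
escape_tag (char *buf, const char *tag)
{
    char *out = buf;

    *out++ = '"';
    for (; *tag; tag++) {
        if (*tag == '"')
            *out++ = '"';
        *out++ = *tag;
    }
    *out++ = '"';
    *out = '\0';
    return buf;
}

/* Hypothetical helper (not a notmuch function): restrict `query` to
 * messages that lack one of the `add` tags or carry one of the `del`
 * tags.  Returns a malloc'd string the caller must free, or NULL. */
char *
restrict_tag_query (const char *query,
                    const char *const *add, size_t n_add,
                    const char *const *del, size_t n_del)
{
    size_t size = strlen (query) + sizeof ("(  ) and ()");
    size_t max_tag = 0, pos, i;
    const char *join = "";
    char *escaped, *result;

    /* Worst case per term: " or not tag:" plus a fully doubled tag
     * plus the two surrounding quotes. */
    for (i = 0; i < n_add + n_del; i++) {
        size_t n = strlen (i < n_add ? add[i] : del[i - n_add]);
        size += sizeof (" or not tag:") + 2 * n + 2;
        if (n > max_tag)
            max_tag = n;
    }

    escaped = malloc (2 * max_tag + 3);
    result = malloc (size);
    if (escaped == NULL || result == NULL) {
        free (escaped);
        free (result);
        return NULL;
    }

    /* As in the patch, the user's query comes first and "*" is not
     * wrapped in an "and" clause. */
    if (strcmp (query, "*") == 0)
        pos = (size_t) snprintf (result, size, "(");
    else
        pos = (size_t) snprintf (result, size, "( %s ) and (", query);

    for (i = 0; i < n_add; i++) {
        pos += (size_t) snprintf (result + pos, size - pos, "%snot tag:%s",
                                  join, escape_tag (escaped, add[i]));
        join = " or ";
    }
    for (i = 0; i < n_del; i++) {
        pos += (size_t) snprintf (result + pos, size - pos, "%stag:%s",
                                  join, escape_tag (escaped, del[i]));
        join = " or ";
    }
    snprintf (result + pos, size - pos, ")");

    free (escaped);
    return result;
}
```

Called with the query from:dmitry.kurochkin@gmail.com, add tags { "sent" } and remove tags { "inbox", "unread" }, this sketch returns

  ( from:dmitry.kurochkin@gmail.com ) and (not tag:"sent" or tag:"inbox" or tag:"unread")

which is the same restriction as the hand-written query above, with the add-tag term listed first.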