Date: Wed, 9 Nov 2011 08:40:13 -0500
From: Austin Clements
To: Jani Nikula
Subject: Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
Message-ID: <20111109134013.GK2658@mit.edu>
References: <1320724523-23568-1-git-send-email-amdragon@mit.edu> <87ty6d1y5x.fsf@nikula.org>
In-Reply-To: <87ty6d1y5x.fsf@nikula.org>
Cc: notmuch@notmuchmail.org

Quoth Jani Nikula on Nov 09 at 8:46 am:
>
> FWIW, I reviewed this and didn't find any obvious problems. A few
> nitpicks below, though.
>
> BR,
> Jani.
>
> On Mon, 7 Nov 2011 22:55:23 -0500, Austin Clements wrote:
> > This optimizes the user's tagging query to exclude messages that won't
> > be affected by the tagging operation, saving computation and IO for
> > redundant tagging operations.
> >
> > For example,
> >   notmuch tag +notmuch to:notmuch@notmuchmail.org
> > will now use the query
> >   ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")
> >
> > In the past, we've often suggested that people do this exact
> > transformation by hand for slow tagging operations. This makes that
> > unnecessary.
> > ---
> > I was about to implement this optimization in my initial tagging
> > script, but then I figured, why not just do it in notmuch so we can
> > stop telling people to do this by hand?
> >
> >  NEWS          |    9 ++++++
> >  notmuch-tag.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 85 insertions(+), 0 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index e00452a..9ca5e0c 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -16,6 +16,15 @@ Add search terms to "notmuch dump"
> >    search/show/tag. The output file argument of dump is deprecated in
> >    favour of using stdout.
> >
> > +Optimizations
> > +-------------
> > +
> > +Automatic tag query optimization
> > +
> > +  "notmuch tag" now automatically optimizes the user's query to
> > +  exclude messages whose tags won't change. In the past, we've
> > +  suggested that people do this by hand; this is no longer necessary.
> > +
> >  Notmuch 0.9 (2011-10-01)
> >  ========================
> >
> > diff --git a/notmuch-tag.c b/notmuch-tag.c
> > index dded39e..62c4bf1 100644
> > --- a/notmuch-tag.c
> > +++ b/notmuch-tag.c
> > @@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
> >      interrupted = 1;
> >  }
> >
> > +static char *
> > +_escape_tag (char *buf, const char *tag)
> > +{
> > +    const char *in = tag;
> > +    char *out = buf;
> > +    /* Boolean terms surrounded by double quotes can contain any
> > +     * character. Double quotes are quoted by doubling them. */
> > +    *(out++) = '"';
> > +    while (*in) {
> > +        if (*in == '"')
> > +            *(out++) = '"';
> > +        *(out++) = *(in++);
> > +    }
> > +    *(out++) = '"';
>
> The parentheses are unnecessary for *p++.

Removed.
I put these in out of paranoia, but I suppose it wouldn't be
an lvalue if it parsed differently.

> > +    *out = 0;
> > +    return buf;
> > +}
> > +
> > +static char *
> > +_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
> > +                     int *add_tags, int add_tags_count,
> > +                     int *remove_tags, int remove_tags_count)
> > +{
> > +    /* This is subtler than it looks. Xapian ignores the '-' operator
> > +     * at the beginning of both queries and parenthesized groups and,
> > +     * furthermore, the presence of a '-' operator at the beginning of
> > +     * a group can inhibit parsing of the previous operator. Hence,
> > +     * the user-provided query MUST appear first, but it is safe to
> > +     * parenthesize and the exclusion part of the query must not use
> > +     * the '-' operator (though the NOT operator is fine). */
> > +
> > +    char *escaped, *query_string;
> > +    const char *join = "";
> > +    int i;
> > +    unsigned int max_tag_len = 0;
> > +
> > +    /* Allocate a buffer for escaping tags. */
> > +    for (i = 0; i < add_tags_count; i++)
> > +        if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
> > +            max_tag_len = strlen (argv[add_tags[i]] + 1);
> > +    for (i = 0; i < remove_tags_count; i++)
> > +        if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
> > +            max_tag_len = strlen (argv[remove_tags[i]] + 1);
> > +    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
>
> Perhaps a comment here or above _escape_tag() explaining the worst case
> memory consumption of strlen(tag) * 2 + 3 for a tag consisting only of
> "s would be in order.

Definitely. Done.

> It's unrelated, but looking at the above also made me check something
> I've suspected before: notmuch allows you to have empty or zero length
> tags "", which is probably not intentional.
>
> There's no check for talloc failures here or below. But then there are
> few checks for that in the cli in general. *shrug*.

It's unfortunate that error handling obscures C code so much. But
there's no sense in not handling errors, so I fixed this.

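A minimal standalone sketch of the escaping step, with the worst-case
sizing spelled out and the allocation checked, might look like the
following. It is only an illustration: it uses plain malloc rather than
talloc, and the name escape_tag_alloc is hypothetical, not something
from the patch or its revision.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): return a newly malloc'd,
 * double-quoted copy of tag in which every '"' is doubled, or NULL if
 * the allocation fails. The caller frees the result. */
static char *
escape_tag_alloc (const char *tag)
{
    /* Worst case: the tag consists only of '"' characters, so each
     * one is doubled (2 * strlen), plus the opening and closing
     * quotes (2) and the terminating NUL (1). */
    size_t len = strlen (tag);
    char *buf = malloc (len * 2 + 3);
    char *out = buf;

    if (buf == NULL)
        return NULL;

    /* Postfix ++ binds tighter than unary *, so *out++ means
     * *(out++): store through the old pointer, then advance it. */
    *out++ = '"';
    while (*tag) {
        if (*tag == '"')
            *out++ = '"';
        *out++ = *tag++;
    }
    *out++ = '"';
    *out = '\0';
    return buf;
}
```

With this sketch, a tag of a"b comes back as "a""b", and a NULL return
is the caller's cue to report an out-of-memory error.
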
> > +
> > +    /* Build the new query string */
> > +    if (strcmp (orig_query_string, "*") == 0)
> > +        query_string = talloc_strdup (ctx, "(");
> > +    else
> > +        query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
> > +
> > +    for (i = 0; i < add_tags_count; i++) {
> > +        query_string = talloc_asprintf_append_buffer (
> > +            query_string, "%snot tag:%s", join,
> > +            _escape_tag (escaped, argv[add_tags[i]] + 1));
> > +        join = " or ";
> > +    }
> > +    for (i = 0; i < remove_tags_count; i++) {
> > +        query_string = talloc_asprintf_append_buffer (
> > +            query_string, "%stag:%s", join,
> > +            _escape_tag (escaped, argv[remove_tags[i]] + 1));
> > +        join = " or ";
> > +    }
> > +
> > +    query_string = talloc_strdup_append_buffer (query_string, ")");
> > +
> > +    talloc_free (escaped);
> > +    return query_string;
> > +}
> > +
> >  int
> >  notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
> >  {
> > @@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
> >          return 1;
> >      }
> >
> > +    /* Optimize the query so it excludes messages that already have
> > +     * the specified set of tags. */
> > +    query_string = _optimize_tag_query (ctx, query_string, argv,
> > +                                        add_tags, add_tags_count,
> > +                                        remove_tags, remove_tags_count);
> > +
> >      config = notmuch_config_open (ctx, NULL, NULL);
> >      if (config == NULL)
> >          return 1;
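
For readers who want to try the query transformation outside notmuch,
the following is a rough standalone sketch of the same idea. It is not
the patch's code: it uses malloc instead of talloc, takes the tag names
directly rather than as indexes into argv, and the names append_str,
append_escaped and optimize_tag_query are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Copy s to out and return the new end of the output. */
static char *
append_str (char *out, const char *s)
{
    size_t n = strlen (s);
    memcpy (out, s, n);
    return out + n;
}

/* Append tag as a double-quoted term, doubling any embedded '"'. */
static char *
append_escaped (char *out, const char *tag)
{
    *out++ = '"';
    for (; *tag; tag++) {
        if (*tag == '"')
            *out++ = '"';
        *out++ = *tag;
    }
    *out++ = '"';
    return out;
}

/* Hypothetical stand-in for the patch's query rewriting: restrict
 * query to messages that lack some tag being added or carry some tag
 * being removed. Returns a malloc'd string, or NULL on failure. */
static char *
optimize_tag_query (const char *query,
                    const char *const *add, int n_add,
                    const char *const *rem, int n_rem)
{
    const char *join = "";
    /* "( " + query + " ) and (" + ")" + NUL is at most 12 extra bytes. */
    size_t size = strlen (query) + 12;
    char *buf, *out;
    int i;

    /* Per tag: " or " (4) + "not tag:" (8) + two quotes (2) + a
     * fully doubled tag (2 * strlen). */
    for (i = 0; i < n_add; i++)
        size += 14 + 2 * strlen (add[i]);
    for (i = 0; i < n_rem; i++)
        size += 14 + 2 * strlen (rem[i]);

    buf = out = malloc (size);
    if (buf == NULL)
        return NULL;

    /* The user's query goes first; "*" already matches everything,
     * so only the exclusion group is needed in that case. */
    if (strcmp (query, "*") == 0) {
        out = append_str (out, "(");
    } else {
        out = append_str (out, "( ");
        out = append_str (out, query);
        out = append_str (out, " ) and (");
    }
    for (i = 0; i < n_add; i++) {
        out = append_str (out, join);
        out = append_str (out, "not tag:");
        out = append_escaped (out, add[i]);
        join = " or ";
    }
    for (i = 0; i < n_rem; i++) {
        out = append_str (out, join);
        out = append_str (out, "tag:");
        out = append_escaped (out, rem[i]);
        join = " or ";
    }
    out = append_str (out, ")");
    *out = '\0';
    return buf;
}
```

Called with the query to:notmuch@notmuchmail.org and the single added
tag notmuch, this sketch produces the string quoted in the commit
message, ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch"). Like
the patch, it spells the exclusion with NOT rather than '-'.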