From: Mark Walters
To: david@tethera.net, notmuch@notmuchmail.org
Subject: Re: v9 of batch tagging
In-Reply-To: <1356313183-9266-1-git-send-email-david@tethera.net>
References: <1356313183-9266-1-git-send-email-david@tethera.net>
User-Agent: Notmuch/0.14+236~g1d0044f (http://notmuchmail.org) Emacs/23.4.1 (x86_64-pc-linux-gnu)
Date: Mon, 24 Dec 2012 02:34:33 +0000
Message-ID: <8738yw2n5y.fsf@qmul.ac.uk>

On Mon, 24 Dec 2012, david@tethera.net wrote:
> This obsoletes
>
>      id:1356095307-22895-1-git-send-email-david@tethera.net
>
> The main changes since v8 are the rebasing against the notmuch-restore
> fixes in master, and the rewrite of the query (pre)-processing
> unhex_and_quote. This incorporates the changes of
>
>      id:1356231570-28232-1-git-send-email-david@tethera.net
>
> and now handles '()' (cf. id:87a9t5p4dz.fsf@qmul.ac.uk)
>
> With respect to
>
> ,----
> | Finally, I don't know if a query can contain a : without being a
> | prefix query. If it can that could end up being misquoted.
> `----
>
> This is pretty easy to work around by encoding that :. I think unless
> it is a problem in practice I prefer not to keep an explicit list of
> prefixes here; recognizing prefixes should really be a service from
> libnotmuch.

I am quite happy with this.

> I dropped two patches (strnspn and hex_invariant), but picked up a new
> strtok variation.
> Probably the name strtok_len2 could be improved
> (and I see there is a typo in the patch subject).
>
> [Patch v9 05/17] util/string-util: add a new string tokenized

Patches 5 and 6 look good to me.

> Finally I added a test for the new parenthesis handling.

My recollection is that dump prints the messages unsorted: does this
mean that we could get unstable results for these tests (eg with
different Xapian versions)?

Best wishes

Mark

>
> [Patch v9 17/17] test/tagging: add test for handling of parens
>
> Fixup wise, the tests needed to be adjusted a bit for () being delimiters,
> and the man page as well.
>
> I added the fclose in id:87wqw9hf9a.fsf@oiva.home.nikula.org
>
> And I modified the return value per id:87zk15hi7f.fsf@oiva.home.nikula.org
>
> Here is the interdiff for unhex_and_quote:
>
> commit 67c6aee87db5c7da25529e1c0feb64e422abb4b7
> Author: David Bremner
> Date:   Sat Dec 22 22:49:02 2012 -0400
>
>     simplify unhex_and_quote, support parens
>
>     the overgeneral definition of a prefix can be replaced by lower case
>     alphabetic, and still work fine with current notmuch query syntax.
>
>     use () as delimiters in unhex_and_quote, preserve delimiters
>
> diff --git a/tag-util.c b/tag-util.c
> index 6f62fe6..91f3603 100644
> --- a/tag-util.c
> +++ b/tag-util.c
> @@ -56,6 +56,21 @@ illegal_tag (const char *tag, notmuch_bool_t remove)
>      return NULL;
>  }
>
> +/* Factor out the boilerplate to append a token to the query string.
> + * For use in unhex_and_quote */
> +
> +static tag_parse_status_t
> +append_tok (const char *tok, size_t tok_len,
> +            const char *line_for_error, char **query_string)
> +{
> +
> +    *query_string = talloc_strndup_append_buffer (*query_string, tok, tok_len);
> +    if (*query_string == NULL)
> +        return line_error (TAG_PARSE_OUT_OF_MEMORY, line_for_error, "aborting");
> +
> +    return TAG_PARSE_SUCCESS;
> +}
> +
>  /* Input is a hex encoded string, presumed to be a query for Xapian.
>   *
>   * Space delimited tokens are decoded and quoted, with '*' and prefixes
> @@ -67,45 +82,41 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,
>  {
>      char *tok = encoded;
>      size_t tok_len = 0;
> +    size_t delim_len = 0;
>      char *buf = NULL;
>      size_t buf_len = 0;
>      tag_parse_status_t ret = TAG_PARSE_SUCCESS;
>
>      *query_string = talloc_strdup (ctx, "");
>
> -    while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {
> +    while ((tok = strtok_len2 (tok + tok_len + delim_len, " ()",
> +                               &tok_len, &delim_len)) != NULL) {
>
>          size_t prefix_len;
>          char delim = *(tok + tok_len);
>
> -        *(tok + tok_len++) = '\0';
> +        *(tok + tok_len) = '\0';
>
> -        prefix_len = hex_invariant (tok, tok_len);
> +        /* The following matches a superset of prefixes currently
> +         * used by notmuch */
> +        prefix_len = strspn (tok, "abcdefghijklmnopqrstuvwxyz");
>
> -        if ((strcmp (tok, "*") == 0) || prefix_len >= tok_len - 1) {
> +        if ((strcmp (tok, "*") == 0) || prefix_len == tok_len) {
>
>              /* pass some things through without quoting or decoding.
>               * Note for '*' this is mandatory.
>               */
>
> -            if (! (*query_string = talloc_asprintf_append_buffer (
> -                       *query_string, "%s%c", tok, delim))) {
> -
> -                ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> -                                  line_for_error, "aborting");
> -                goto DONE;
> -            }
> +            ret = append_tok (tok, tok_len, line_for_error, query_string);
> +            if (ret) goto DONE;
>
>          } else {
>              /* potential prefix: one for ':', then something after */
> -            if ((tok_len - prefix_len > 2) && *(tok + prefix_len) == ':') {
> -                if (!
(*query_string = talloc_strndup_append (*query_string,
> -                                                        tok,
> -                                                        prefix_len + 1))) {
> -                    ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> -                                      line_for_error, "aborting");
> -                    goto DONE;
> -                }
> +            if ((tok_len - prefix_len >= 2) && *(tok + prefix_len) == ':') {
> +                ret = append_tok (tok, prefix_len + 1,
> +                                  line_for_error, query_string);
> +                if (ret) goto DONE;
> +
>                  tok += prefix_len + 1;
>                  tok_len -= prefix_len + 1;
>              }
> @@ -122,13 +133,15 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,
>                  goto DONE;
>              }
>
> -            if (! (*query_string = talloc_asprintf_append_buffer (
> -                       *query_string, "%s%c", buf, delim))) {
> -                ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> -                                  line_for_error, "aborting");
> -                goto DONE;
> -            }
> +            ret = append_tok (buf, buf_len, line_for_error, query_string);
> +            if (ret) goto DONE;
>          }
> +        /* restore the string */
> +        *(tok + tok_len) = delim;
> +
> +        /* copy any delimiters */
> +        ret = append_tok (tok + tok_len, delim_len, line_for_error, query_string);
> +        if (ret) goto DONE;
>      }
>
>    DONE:
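
For readers following the interdiff without patch 05/17 to hand, here is a
rough, self-contained sketch of the two ideas the new loop relies on:
splitting on " ()" while remembering how many delimiter bytes follow each
token, and recognising a prefix as a run of lower-case letters followed by
':'. The names split_token and prefix_length are hypothetical and are not
taken from the series. In particular, split_token is only a guess at the
kind of interface strtok_len2 provides, and its handling of leading
delimiters (an empty token followed by the delimiter run) is an assumption,
not a description of the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the strtok_len2 from the patch series.
 * Splits off the next token of s: *tok_len receives the length of the
 * leading run of non-delimiter bytes (possibly 0), *delim_len the length
 * of the delimiter run that follows it. Returns NULL at end of string. */
static const char *
split_token (const char *s, const char *delims,
             size_t *tok_len, size_t *delim_len)
{
    assert (s != NULL && delims != NULL);

    if (*s == '\0')
        return NULL;

    *tok_len = strcspn (s, delims);
    *delim_len = strspn (s + *tok_len, delims);
    return s;
}

/* Hypothetical helper mirroring the prefix test in the interdiff:
 * a run of lower-case ASCII letters, then ':', with at least one byte
 * after the colon. Returns the prefix length including the ':', or 0
 * if the token does not start with such a prefix. */
static size_t
prefix_length (const char *tok, size_t tok_len)
{
    size_t n = 0;

    /* Bounded scan, so the token need not be NUL terminated. */
    while (n < tok_len && tok[n] >= 'a' && tok[n] <= 'z')
        n++;

    if (n < tok_len && tok[n] == ':' && tok_len - n >= 2)
        return n + 1;

    return 0;
}
```

A caller would advance with s = tok + tok_len + delim_len after each call,
as the loop in the interdiff does, copying the delim_len delimiter bytes
through unchanged so that parentheses survive in the rebuilt query. The
bounded scan in prefix_length is a design choice for the sketch only: it
avoids writing a temporary '\0' into the input, whereas the patch
terminates the token in place and restores the delimiter afterwards.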