Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id DC388431FAF for ; Sat, 22 Dec 2012 15:36:09 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id L+IhLMBCrkVs for ; Sat, 22 Dec 2012 15:36:09 -0800 (PST) Received: from mail-la0-f50.google.com (mail-la0-f50.google.com [209.85.215.50]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id E4263431FAE for ; Sat, 22 Dec 2012 15:36:08 -0800 (PST) Received: by mail-la0-f50.google.com with SMTP id c1so7069663lah.9 for ; Sat, 22 Dec 2012 15:36:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:from:to:cc:subject:in-reply-to:references:user-agent :date:message-id:mime-version:content-type:x-gm-message-state; bh=p3CaFoivDEW/ydAv352pxY83u8ZH7iZxZ0e7nU9Vzh0=; b=QD7wzSqcoEdWmwQWqQFevU9TFOHB+8/L3BDac4cq7U74MzoVflq0VA6GNlUnvcxa+9 y5ITiW9v5ECEZDUQUIdZCO/krER4FttBjF7ZoFXNcuZCLtaWvkbQsdX79NIJKff/DbSU Y9WiFac7mcuWoLMgFXZFAnKNM7g1oGdCwfQe0KibjpYPJ4kcO9Kf4vqoCf1/ekPebvzp 1fa5hXzDnTMe2ESmDDS6oT18P9GPj8b7LdHMNmf6VYyyyRXELyiaLVfXZ1wpi/y8lf38 rFoVZFLCRzj7Paz4I8CrbqUU4BhLRWGgJvAiYna1a50CdAlVRvzSdfkMdAqJxc9nHZ09 g0fw== X-Received: by 10.152.125.240 with SMTP id mt16mr16389477lab.17.1356219367368; Sat, 22 Dec 2012 15:36:07 -0800 (PST) Received: from localhost (dsl-hkibrasgw4-50df51-27.dhcp.inet.fi. [80.223.81.27]) by mx.google.com with ESMTPS id bf3sm5949712lbb.16.2012.12.22.15.36.04 (version=SSLv3 cipher=OTHER); Sat, 22 Dec 2012 15:36:06 -0800 (PST) From: Jani Nikula To: david@tethera.net, notmuch@notmuchmail.org Subject: Re: [Patch v8 07/18] unhex_and_quote: new function to quote hex-decoded queries In-Reply-To: <1356095307-22895-7-git-send-email-david@tethera.net> References: <1356095307-22895-1-git-send-email-david@tethera.net> <1356095307-22895-7-git-send-email-david@tethera.net> User-Agent: Notmuch/0.14+211~gc8d6546 (http://notmuchmail.org) Emacs/24.2.1 (x86_64-pc-linux-gnu) Date: Sun, 23 Dec 2012 01:36:03 +0200 Message-ID: <87txrdhd7g.fsf@oiva.home.nikula.org> MIME-Version: 1.0 Content-Type: text/plain X-Gm-Message-State: ALoCoQnV28RTEqNv3tbjSScEVBiQ/d9VUnE1WsVGa9fVhxFHWy3cJ7W2XboSdPMnsi3QTfAvDfT+ Cc: David Bremner X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2012 23:36:10 -0000 On Fri, 21 Dec 2012, david@tethera.net wrote: > From: David Bremner > > Space delimited tokens are hex decoded and then quoted according to > Xapian rules. Prefixes and '*' are passed through unquoted, as is > anything that hex-decoding would not change. > --- > tag-util.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 81 insertions(+) > > diff --git a/tag-util.c b/tag-util.c > index f89669a..46aab4e 100644 > --- a/tag-util.c > +++ b/tag-util.c > @@ -56,6 +56,87 @@ illegal_tag (const char *tag, notmuch_bool_t remove) > return NULL; > } > > +/* Input is a hex encoded string, presumed to be a query for Xapian. > + * > + * Space delimited tokens are decoded and quoted, with '*' and prefixes > + * of the form "foo:" passed through unquoted. > + */ > +static tag_parse_status_t > +unhex_and_quote (void *ctx, char *encoded, const char *line_for_error, > + char **query_string) > +{ > + char *tok = encoded; > + size_t tok_len = 0; > + char *buf = NULL; > + size_t buf_len = 0; > + tag_parse_status_t ret = TAG_PARSE_SUCCESS; > + > + *query_string = talloc_strdup (ctx, ""); > + > + while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) { > + > + size_t prefix_len; > + char delim = *(tok + tok_len); > + > + *(tok + tok_len++) = '\0'; You need to do tok_len++ to satisfy the next round of strtok_len, but I think for clarity of the code below you should move tok_len++ to the end of the while block. And review tok_len usage below. > + > + prefix_len = hex_invariant (tok, tok_len); This, along with the doc comment "...initial segment of str that would not be changed by hex encoding..." for hex_invariant, was the hardest bit to understand. I don't follow how hex *encoding* matters here; the input is only affected by hex *decoding*, and that depends on %NN only. Should this be a function that counts the length of the initial segment of str that consists of valid Xapian prefix characters and does not contain % (I don't know if that's included in Xapian prefix characters). In the end, the contents of the function may be (I don't know) exactly the same as hex_invariant, but the function name would be more self explanatory. Does that make any sense to you...? > + > + if ((strcmp (tok, "*") == 0) || prefix_len >= tok_len - 1) { With the tok_len++ at the end, I think this should have "prefix_len == tok_len" for clarity. > + > + /* pass some things through without quoting or decoding. > + * Note for '*' this is mandatory. > + */ > + > + if (! (*query_string = talloc_asprintf_append_buffer ( > + *query_string, "%s%c", tok, delim))) { > + > + ret = line_error (TAG_PARSE_OUT_OF_MEMORY, > + line_for_error, "aborting"); > + goto DONE; > + } > + > + } else { > + /* potential prefix: one for ':', then something after */ > + if ((tok_len - prefix_len > 2) && *(tok + prefix_len) == ':') { I don't think this takes into account the tok_len++. So this should stay as it is if you move tok_len++ to the end. > + if (! (*query_string = talloc_strndup_append (*query_string, > + tok, > + prefix_len + 1))) { > + ret = line_error (TAG_PARSE_OUT_OF_MEMORY, > + line_for_error, "aborting"); > + goto DONE; > + } > + tok += prefix_len + 1; > + tok_len -= prefix_len + 1; > + } > + > + if (hex_decode_inplace (tok) != HEX_SUCCESS) { > + ret = line_error (TAG_PARSE_INVALID, line_for_error, > + "hex decoding of token '%s' failed", tok); > + goto DONE; > + } > + > + if (double_quote_str (ctx, tok, &buf, &buf_len)) { > + ret = line_error (TAG_PARSE_OUT_OF_MEMORY, > + line_for_error, "aborting"); > + goto DONE; > + } > + > + if (! (*query_string = talloc_asprintf_append_buffer ( > + *query_string, "%s%c", buf, delim))) { > + ret = line_error (TAG_PARSE_OUT_OF_MEMORY, > + line_for_error, "aborting"); > + goto DONE; > + } > + } I think tok_len++ should be here. BR, Jani. > + } > + > + DONE: > + if (ret != TAG_PARSE_SUCCESS && *query_string) > + talloc_free (*query_string); > + return ret; > +} > + > tag_parse_status_t > parse_tag_line (void *ctx, char *line, > tag_op_flag_t flags, > -- > 1.7.10.4 > > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch