From: Jani Nikula <jani@nikula.org>
To: david@tethera.net, notmuch@notmuchmail.org
Subject: Re: [Patch v8 07/18] unhex_and_quote: new function to quote
	hex-decoded queries
In-Reply-To: <1356095307-22895-7-git-send-email-david@tethera.net>
References: <1356095307-22895-1-git-send-email-david@tethera.net>
	<1356095307-22895-7-git-send-email-david@tethera.net>
User-Agent: Notmuch/0.14+211~gc8d6546 (http://notmuchmail.org) Emacs/24.2.1
	(x86_64-pc-linux-gnu)
Date: Sun, 23 Dec 2012 01:36:03 +0200
Message-ID: <87txrdhd7g.fsf@oiva.home.nikula.org>
MIME-Version: 1.0
Content-Type: text/plain
Cc: David Bremner <bremner@debian.org>
Precedence: list

On Fri, 21 Dec 2012, david@tethera.net wrote:
> From: David Bremner <bremner@debian.org>
>
> Space delimited tokens are hex decoded and then quoted according to
> Xapian rules. Prefixes and '*' are passed through unquoted, as is
> anything that hex-decoding would not change.
> ---
>  tag-util.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
>
> diff --git a/tag-util.c b/tag-util.c
> index f89669a..46aab4e 100644
> --- a/tag-util.c
> +++ b/tag-util.c
> @@ -56,6 +56,87 @@ illegal_tag (const char *tag, notmuch_bool_t remove)
>      return NULL;
>  }
>  
> +/* Input is a hex encoded string, presumed to be a query for Xapian.
> + *
> + * Space delimited tokens are decoded and quoted, with '*' and prefixes
> + * of the form "foo:" passed through unquoted.
> + */
> +static tag_parse_status_t
> +unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,
> +		 char **query_string)
> +{
> +    char *tok = encoded;
> +    size_t tok_len = 0;
> +    char *buf = NULL;
> +    size_t buf_len = 0;
> +    tag_parse_status_t ret = TAG_PARSE_SUCCESS;
> +
> +    *query_string = talloc_strdup (ctx, "");
> +
> +    while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {
> +
> +	size_t prefix_len;
> +	char delim = *(tok + tok_len);
> +
> +	*(tok + tok_len++) = '\0';

You need to do tok_len++ to satisfy the next round of strtok_len, but I
think for clarity of the code below you should move tok_len++ to the end
of the while block. And review tok_len usage below.

> +
> +	prefix_len = hex_invariant (tok, tok_len);

This, along with the doc comment "...initial segment of str that would
not be changed by hex encoding..." for hex_invariant, was the hardest
bit to understand. I don't follow how hex *encoding* matters here; the
input is only affected by hex *decoding*, and that depends on %NN only.

Should this be a function that counts the length of the initial segment
of str that consists of valid Xapian prefix characters and does not
contain % (I don't know if that's included in Xapian prefix
characters). In the end, the contents of the function may be (I don't
know) exactly the same as hex_invariant, but the function name would be
more self explanatory.

Does that make any sense to you...?

> +
> +	if ((strcmp (tok, "*") == 0) || prefix_len >= tok_len - 1) {

With the tok_len++ at the end, I think this should have "prefix_len ==
tok_len" for clarity.

> +
> +	    /* pass some things through without quoting or decoding.
> +	     * Note for '*' this is mandatory.
> +	     */
> +
> +	    if (! (*query_string = talloc_asprintf_append_buffer (
> +		       *query_string, "%s%c", tok, delim))) {
> +
> +		ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> +				  line_for_error, "aborting");
> +		goto DONE;
> +	    }
> +
> +	} else {
> +	    /* potential prefix: one for ':', then something after */
> +	    if ((tok_len - prefix_len > 2) && *(tok + prefix_len) == ':') {

I don't think this takes into account the tok_len++. So this should stay
as it is if you move tok_len++ to the end.

> +		if (! (*query_string = talloc_strndup_append (*query_string,
> +							      tok,
> +							      prefix_len + 1))) {
> +		    ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> +				      line_for_error, "aborting");
> +		    goto DONE;
> +		}
> +		tok += prefix_len + 1;
> +		tok_len -= prefix_len + 1;
> +	    }
> +
> +	    if (hex_decode_inplace (tok) != HEX_SUCCESS) {
> +		ret = line_error (TAG_PARSE_INVALID, line_for_error,
> +				  "hex decoding of token '%s' failed", tok);
> +		goto DONE;
> +	    }
> +
> +	    if (double_quote_str (ctx, tok, &buf, &buf_len)) {
> +		ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> +				  line_for_error, "aborting");
> +		goto DONE;
> +	    }
> +
> +	    if (! (*query_string = talloc_asprintf_append_buffer (
> +		       *query_string, "%s%c", buf, delim))) {
> +		ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> +				  line_for_error, "aborting");
> +		goto DONE;
> +	    }
> +	}

I think tok_len++ should be here.

BR,
Jani.

> +    }
> +
> +  DONE:
> +    if (ret != TAG_PARSE_SUCCESS && *query_string)
> +	talloc_free (*query_string);
> +    return ret;
> +}
> +
>  tag_parse_status_t
>  parse_tag_line (void *ctx, char *line,
>  		tag_op_flag_t flags,
> -- 
> 1.7.10.4
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch