Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 8EE17429E25 for ; Sun, 11 Dec 2011 09:57:15 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.798 X-Spam-Level: X-Spam-Status: No, score=-0.798 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, NORMAL_HTTP_TO_IP=0.001, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YP7IQp4Uatn6 for ; Sun, 11 Dec 2011 09:57:13 -0800 (PST) Received: from mail-bw0-f53.google.com (mail-bw0-f53.google.com [209.85.214.53]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id 32A9A431FB6 for ; Sun, 11 Dec 2011 09:57:13 -0800 (PST) Received: by bkat8 with SMTP id t8so4950674bka.26 for ; Sun, 11 Dec 2011 09:57:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=from:to:cc:subject:in-reply-to:references:user-agent:date :message-id:mime-version:content-type; bh=0Q991LE8Zo1JtOERL1PRro9ClRC4brapR+tjW8kzLMM=; b=Bob/aiwGH46v1k4oxJU1FQuxcGImdT6D6vnYo6CiIUPwMjKzEuVZudq8VyPLS8I50M MzCDRt+dTAfEDOqCTHpa1AcHqvoTi8neaS7D2F2761JH8WIJH4SO92i/pFXFGFTFnINq g5/zXBTq1wPdeHlQKn8fQtEOAEbB0EnUvUnxY= Received: by 10.204.148.67 with SMTP id o3mr8466298bkv.130.1323626231818; Sun, 11 Dec 2011 09:57:11 -0800 (PST) Received: from localhost ([91.144.186.21]) by mx.google.com with ESMTPS id cc2sm25390088bkb.8.2011.12.11.09.57.10 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 11 Dec 2011 09:57:11 -0800 (PST) From: Dmitry Kurochkin To: David Bremner , notmuch@notmuchmail.org Subject: Re: [PATCH] util/hex-escape.[ch]: encoding/decoding strings into restricted character set In-Reply-To: <1323620384-16043-1-git-send-email-david@tethera.net> References: <1323620384-16043-1-git-send-email-david@tethera.net> User-Agent: Notmuch/0.10.2+82~g96a629c (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu) Date: Sun, 11 Dec 2011 21:56:36 +0400 Message-ID: <87zkez56wb.fsf@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: David Bremner X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 11 Dec 2011 17:57:15 -0000 On Sun, 11 Dec 2011 12:19:44 -0400, David Bremner wrote: > From: David Bremner > > The character set is chosen to be suitable for pathnames, and the same > as that used by contrib/nmbug. The new encoded/decoded strings are > allocated using talloc. > --- > This isn't urgent, but it is useful for a couple projects I have > brewing (nmbug compatible dump/restore and tag logging), so I thought > I would get some feedback on it. > I have some free time to spend on notmuch reviews today. So here it is comes :) > > util/Makefile.local | 4 +- > util/hex-escape.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++ > util/hex-escape.h | 10 +++++ > 3 files changed, 122 insertions(+), 2 deletions(-) > create mode 100644 util/hex-escape.c > create mode 100644 util/hex-escape.h > > diff --git a/util/Makefile.local b/util/Makefile.local > index 0340899..2e63932 100644 > --- a/util/Makefile.local > +++ b/util/Makefile.local > @@ -3,11 +3,11 @@ > dir := util > extra_cflags += -I$(srcdir)/$(dir) > > -libutil_c_srcs := $(dir)/xutil.c $(dir)/error_util.c > +libutil_c_srcs := $(dir)/xutil.c $(dir)/error_util.c $(dir)/hex-escape.c > > libutil_modules := $(libutil_c_srcs:.c=.o) > > $(dir)/libutil.a: $(libutil_modules) > $(call quiet,AR) rcs $@ $^ > > -CLEAN := $(CLEAN) $(dir)/xutil.o $(dir)/error_util.o $(dir)/libutil.a > +CLEAN := $(CLEAN) $(libutil_modules) $(dir)/libutil.a IMO this should be pushed as a separate patch (that does not need a review :)). > diff --git a/util/hex-escape.c b/util/hex-escape.c > new file mode 100644 > index 0000000..c294bb5 > --- /dev/null > +++ b/util/hex-escape.c > @@ -0,0 +1,110 @@ > +/* hex-escape.c - Manage encoding and decoding of byte strings into > + * a restricted character set. > + * > + * Copyright (c) 2011 David Bremner > + * > + * This program is free software: you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation, either version 3 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see http://www.gnu.org/licenses/ . > + * > + * Author: David Bremner > + */ > + > +#include > +#include > +#include "error_util.h" > +#include "hex-escape.h" > + > +static int > +escapes_needed (const char *str){ The name suggests that the function returns a boolean (needed or not needed). Consider renaming to escapes_count, count_escapes or similar. Also there is a space missing before the {. > + int escapes = 0; > + > + while (*str) { > + if (index (HEX_NO_ESCAPE, *str) == NULL) Consider adding a function (bool escape_needed(const char c)) (similar to isalpha() and friends) which would call index() and use it here and in hex_encode. (This comment would not be actual if you decide to introduce a general char counting function.) > + escapes++; > + str++; > + } > + > + return escapes; Can we replace this function with a two-line for loop similar to the one in hex_decode? I think we should either use inline loops for counting chars in both hex_encode and hex_decode, or introduce a more general function that counts the number of given characters and use it in both hex_encode and hex_decode. > +} > + > +char * > +hex_encode (void *ctx, const char *str) { > + char *newstr = talloc_size (ctx, strlen (str)+3*escapes_needed (str)+1); Do we need only strlen(str) + 2*escaped_count + 1 here? IMO it is very weird that we have spaces after a function name, but do not have spaces around +... > + > + char *out = newstr; > + Why do we need both out and newstr variables? > + while (*str) { > + if (index (HEX_NO_ESCAPE, *str)) { > + *out++ = *str++; > + } else { > + sprintf (out, "%%%02x", *str); > + str++; Use *str++ as sprintf() parameter instead of doing it on a separate line? > + out += 3; > + } > + } > + *out = 0; I would prefer '\0' here. > + return newstr; > +} > + > +inline static int > +_digit (char c) { Perhaps rename to hex_digit? Other static function names do not start with underscore. Let's be consistent. > + if ('0' <= c && c <= '9') > + return c - '0'; > + > + if ('A' <= c && c <= 'F') > + return c - 'A'; > + > + if ('a' <= c && c <= 'f') > + return c - 'a'; > + Does this really work as expected? 'B' - 'A' would be 1, while it seems that we expect 10. Perhaps we should use sscanf(3) instead? That may make the code simpler and allow to convert both chars at once. > + INTERNAL_ERROR ("Illegal hex digit %c", c); > + /*NOTREACHED*/ > + return 0; > +} > + > +char *hex_decode (void *ctx, const char *str) { > + > + int len = strlen(str); > + > + const char *p; > + char *q; > + char *newstr; If you decide to use "out" variable in hex_encode() instead of "newstr", please rename it here as well. > + int escapes = 0; > + > + for (p = str; *p; p++) > + escapes += (*p == HEX_ESCAPE_CHAR); > + > + newstr = talloc_size (ctx, len - escapes*2 + 1); > + > + p = str; > + q = newstr; > + > + while (*p) { > + > + if (*p == HEX_ESCAPE_CHAR) { > + > + if (len < 3) INTERNAL_ERROR ("Syntax error decoding %s", str); > + > + *q = _digit(p[1]) * 16; > + *q += _digit(p[2]); > + > + len -= 3; > + p += 3; > + q++; > + } else { > + *q++ = *p++; > + } > + } > + > + return newstr; > +} > diff --git a/util/hex-escape.h b/util/hex-escape.h > new file mode 100644 > index 0000000..7caff15 > --- /dev/null > +++ b/util/hex-escape.h > @@ -0,0 +1,10 @@ > +#ifndef _HEX_ESCAPE_H > +#define _HEX_ESCAPE_H > + > +#define HEX_ESCAPE_CHAR '%' > +#define HEX_NO_ESCAPE "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" \ > + "0123456789+-_@=.:," > + Can we make these proper constants? Are these macros supposed to be used elsewhere? If no, we should move them inside hex-escape.c and probably even make local to functions that need them. Regards, Dmitry > +char *hex_encode (void *talloc_ctx, const char *string); > +char *hex_decode (void *talloc_ctx, const char *hex); > +#endif > -- > 1.7.7.3 > > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch