From: Tomi Ollila
To: david@tethera.net, notmuch@notmuchmail.org
Cc: David Bremner
Subject: Re: [Patch v3 5/6] test: add generator for random "stub" messages
In-Reply-To: <1345382314-5330-6-git-send-email-david@tethera.net>
References: <1345382314-5330-1-git-send-email-david@tethera.net>
 <1345382314-5330-6-git-send-email-david@tethera.net>
User-Agent: Notmuch/0.14+11~gd9bf007 (http://notmuchmail.org) Emacs/24.2.1 (x86_64-unknown-linux-gnu)
List-Id: "Use and development of the notmuch mail system."
X-List-Received-Date: Sat, 08 Sep 2012 13:38:29 -0000

On Sun, Aug 19 2012, david@tethera.net wrote:

> From: David Bremner
>
> Initial use case is testing dump and restore, so we only have
> message-ids and tags.
>
> The message ID's are nothing like RFC compliant, but it doesn't seem
> any harder to roundtrip random UTF-8 strings than RFC-compliant ones.
>
> Tags are UTF-8, even though notmuch is in principle more generous than
> that.
> ---

Mostly LGTM (the whole series). A few comments inline...

Finally, 6/6 adds a known broken test -- when will we see this code taken
into use and the broken test fixed :)

>  test/.gitignore      |   1 +
>  test/Makefile.local  |   9 +++
>  test/basic           |   2 +-
>  test/random-corpus.c | 202 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 213 insertions(+), 1 deletion(-)
>  create mode 100644 test/random-corpus.c

[ ... ]

>
> diff --git a/test/random-corpus.c b/test/random-corpus.c
> new file mode 100644
> index 0000000..8c5b559
> --- /dev/null
> +++ b/test/random-corpus.c

[ ... ]

> +
> +/* Current largest UTF-32 value defined. Note that most of these will
> + * be printed as boxes in most fonts.
> + */

Should we be talking about UTF-8 values here? (UTF-8 currently has the
same limit.)
> +
> +#define GLYPH_MAX 0x10FFFE
> +
> +static gunichar
> +random_unichar ()
> +{
> +    int start = 1, stop = GLYPH_MAX;
> +    int class = random() % 2;
> +
> +    /*
> +     * Choose about half ascii as test characters, as ascii
> +     * punctation and whitespace is the main cause of problems for
> +     * the (old) restore parser
> +     */
> +    switch (class) {
> +    case 0:
> +        /* ascii */
> +        start = 0x01;
> +        stop = 0x7f;
> +        break;
> +    case 1:
> +        /* the rest of unicode */
> +        start = 0x80;
> +        stop = GLYPH_MAX;
> +    }
> +
> +    if (start == stop)
> +        return start;
> +    else
> +        return start + (random() % (stop - start + 1));
> +}
> +
> +static char *
> +random_utf8_string (void *ctx, size_t char_count)
> +{
> +
> +    gchar *buf = NULL;
> +    size_t buf_size = 0;
> +
> +    size_t offset = 0;
> +
> +    size_t i;
> +
> +    buf = talloc_realloc (ctx, NULL, gchar, char_count);
> +    buf_size = char_count;
> +
> +    for (i = 0; i < char_count; i++) {
> +        gunichar randomchar;
> +        size_t written;
> +
> +        /* 6 for one glyph, one for null */
> +        if (buf_size - offset < 8) {
> +            buf_size += 16;
> +            buf = talloc_realloc (ctx, buf, gchar, buf_size);

This reallocation will be hit many times, as initially only char_count
bytes are allocated. That limit will probably be reached before the
random string is half done: half of the characters use 1 byte, the other
half 2, 3 or 4 bytes, mostly 4 (even though only half of the 4-byte
range is used...).

Maybe allocate char_count * 2 + 8 initially and, if a realloc is
required, add (char_count - i) * 2 + 8 more... or maybe better, do just
the latter realloc and replace the initial allocation with
buf = NULL; buf_size = 0;

Alternatively you could play with random states: calculate the size,
reset the random state, allocate size + 1 and write the chars.

> +        }
> +
> +        randomchar = random_unichar();
> +
> +        written = g_unichar_to_utf8 (randomchar, buf + offset);
> +
> +        if (written <= 0) {
> +            fprintf (stderr, "error converting to utf8\n");
> +            exit (1);
> +        }
> +
> +        offset += written;
> +
> +    }

Above there is an extra newline.
There are a few others in other files (at least after opening and before
closing braces). Maybe uncrustify your source :)

> +    buf[offset] = 0;
> +    return buf;
> +}
> +

[ ... ]

Tomi
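
P.S. To make the reallocation suggestion concrete, here is a rough,
untested-against-notmuch sketch of the "buf = NULL; buf_size = 0" variant,
reading (char_count - i) * 2 + 8 as the amount to grow by. It is only an
illustration under some assumptions: it uses plain realloc () and rand ()
instead of talloc and random (), encode_utf8 () is a hypothetical stand-in
for g_unichar_to_utf8 (), and the sketch_* names and the two out
parameters are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for g_unichar_to_utf8 (): writes the UTF-8
 * encoding of c to out (room for 4 bytes) and returns the byte count.
 * It does no surrogate filtering; this is only a size illustration. */
static size_t
encode_utf8 (uint32_t c, char *out)
{
    assert (c <= 0x10FFFF);
    if (c < 0x80) {
        out[0] = (char) c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (char) (0xC0 | (c >> 6));
        out[1] = (char) (0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (char) (0xE0 | (c >> 12));
        out[1] = (char) (0x80 | ((c >> 6) & 0x3F));
        out[2] = (char) (0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (char) (0xF0 | (c >> 18));
    out[1] = (char) (0x80 | ((c >> 12) & 0x3F));
    out[2] = (char) (0x80 | ((c >> 6) & 0x3F));
    out[3] = (char) (0x80 | (c & 0x3F));
    return 4;
}

/* Same split as random_unichar () in the patch: about half ASCII
 * (0x01..0x7f), the rest from 0x80..0x10FFFE.  Two rand () calls are
 * combined because RAND_MAX may be as small as 32767. */
static uint32_t
sketch_random_unichar (void)
{
    uint32_t r = ((uint32_t) rand () << 15) ^ (uint32_t) rand ();

    if (rand () % 2 == 0)
        return 0x01 + r % 0x7f;
    return 0x80 + r % (0x10FFFE - 0x80 + 1);
}

/* Builds a NUL-terminated random UTF-8 string of char_count code points.
 * Reports the byte length and the number of realloc () calls made. */
static char *
sketch_random_utf8_string (size_t char_count, size_t *bytes_out,
                           size_t *reallocs_out)
{
    char *buf = NULL;
    size_t buf_size = 0, offset = 0, reallocs = 0, i;

    for (i = 0; i < char_count; i++) {
        /* 4 bytes for the longest encoding, 1 for the final NUL */
        if (buf_size - offset < 5) {
            /* grow in proportion to the characters still to be written */
            buf_size += (char_count - i) * 2 + 8;
            buf = realloc (buf, buf_size);
            if (buf == NULL) {
                fprintf (stderr, "out of memory\n");
                exit (1);
            }
            reallocs++;
        }
        offset += encode_utf8 (sketch_random_unichar (), buf + offset);
    }

    if (buf == NULL) {
        /* char_count == 0: still return an empty string */
        buf = malloc (1);
        if (buf == NULL)
            exit (1);
    }
    buf[offset] = 0;
    *bytes_out = offset;
    *reallocs_out = reallocs;
    return buf;
}
```

With this growth rule each reallocation covers at least half of the
remaining characters even if all of them turn out to need 4 bytes, so the
number of realloc () calls should grow roughly logarithmically with
char_count instead of once per few characters as with "buf_size += 16".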