From: Mark Walters
To: David Bremner, notmuch@notmuchmail.org
Cc: David Bremner
Subject: Re: [PATCH v3 09/10] random-dump.c: new test-binary to generate dump files
In-Reply-To: <1326591624-15493-10-git-send-email-david@tethera.net>
References: <874nwxbkhr.fsf@zancas.localnet> <1326591624-15493-1-git-send-email-david@tethera.net> <1326591624-15493-10-git-send-email-david@tethera.net>
User-Agent: Notmuch/0.11+154~ged6d37e (http://notmuchmail.org) Emacs/23.2.1 (i486-pc-linux-gnu)
Date: Sun, 05 Feb 2012 01:04:13 +0000
Message-ID: <87pqduozua.fsf@qmul.ac.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: "Use and development of the notmuch mail system."

On Sat, 14 Jan 2012 21:40:23 -0400, David Bremner wrote:
> From: David Bremner
>
> This binary creates a "torture test" dump file for the new dump
> format.
> ---
>  test/Makefile.local |    4 ++
>  test/basic          |    2 +-
>  test/random-dump.c  |  144 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 149 insertions(+), 1 deletions(-)
>  create mode 100644 test/random-dump.c
>
> diff --git a/test/Makefile.local b/test/Makefile.local
> index ba697f4..b59f837 100644
> --- a/test/Makefile.local
> +++ b/test/Makefile.local
> @@ -16,6 +16,9 @@ $(dir)/arg-test: $(dir)/arg-test.o command-line-arguments.o util/libutil.a
>  $(dir)/hex-xcode: $(dir)/hex-xcode.o command-line-arguments.o util/libutil.a
>  	$(call quiet,CC) -I. $^ -o $@ -ltalloc
>
> +$(dir)/random-dump: $(dir)/random-dump.o command-line-arguments.o util/libutil.a
> +	$(call quiet,CC) -I. $^ -o $@ -ltalloc -lm
> +
>  $(dir)/smtp-dummy: $(smtp_dummy_modules)
>  	$(call quiet,CC) $^ -o $@
>
> @@ -25,6 +28,7 @@ $(dir)/symbol-test: $(dir)/symbol-test.o
>  .PHONY: test check
>
>  test-binaries: $(dir)/arg-test $(dir)/hex-xcode \
> +	 $(dir)/random-dump \
>  	 $(dir)/smtp-dummy $(dir)/symbol-test
>
>  test: all test-binaries
> diff --git a/test/basic b/test/basic
> index af57026..e3a6cef 100755
> --- a/test/basic
> +++ b/test/basic
> @@ -54,7 +54,7 @@ test_begin_subtest 'Ensure that all available tests will be run by notmuch-test'
>  eval $(sed -n -e '/^TESTS="$/,/^"$/p' $TEST_DIRECTORY/notmuch-test)
>  tests_in_suite=$(for i in $TESTS; do echo $i; done | sort)
>  available=$(find "$TEST_DIRECTORY" -maxdepth 1 -type f -executable -printf '%f\n' | \
> -    sed -r -e "/^(aggregate-results.sh|notmuch-test|smtp-dummy|test-verbose|symbol-test|arg-test|hex-xcode)$/d" | \
> +    sed -r -e "/^(aggregate-results.sh|notmuch-test|smtp-dummy|test-verbose|symbol-test|arg-test|hex-xcode|random-dump)$/d" | \
>      sort)
>  test_expect_equal "$tests_in_suite" "$available"
>
> diff --git a/test/random-dump.c b/test/random-dump.c
> new file mode 100644
> index 0000000..1949425
> --- /dev/null
> +++ b/test/random-dump.c
> @@ -0,0 +1,144 @@
> +/*
> +   Generate a random dump file in 'notmuch' format.
> +   Generated message-id's and tags are intentionally nasty.
> +
> +   We restrict ourselves to 7 bit message-ids, because generating
> +   random valid UTF-8 seems like work. And invalid UTF-8 can't be
> +   round-tripped via Xapian.
> +
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include "math.h"
> +#include "hex-escape.h"
> +#include "command-line-arguments.h"
> +
> +static void
> +hex_out (void *ctx, char *buf)
> +{
> +    static char *encoded_buf = NULL;
> +    static size_t encoded_buf_size = 0;
> +
> +    if (hex_encode (ctx, buf, &encoded_buf, &encoded_buf_size) != HEX_SUCCESS) {
> +        fprintf (stderr, "Hex encoding failed");
> +        exit (1);
> +    }
> +
> +    fputs (encoded_buf, stdout);
> +}
> +
> +static void
> +random_chars (char *buf, int from, int stop, int max_char,
> +              const char *blacklist)
> +{
> +    int i;
> +
> +    for (i = from; i < stop; i++) {
> +        do {
> +            buf[i] = ' ' + (random () % (max_char - ' '));
> +        } while (blacklist && strchr (blacklist, buf[i]));
> +    }
> +}
> +
> +static void
> +random_tag (void *ctx, size_t len)
> +{
> +    static char *buf = NULL;
> +    static size_t buf_len = 0;
> +
> +    int use = (random () % (len - 1)) + 1;
> +
> +    if (len > buf_len) {
> +        buf = talloc_realloc (ctx, buf, char, len);
> +        buf_len = len;
> +    }
> +
> +    random_chars (buf, 0, use, 255, NULL);
> +
> +    buf[use] = '\0';
> +
> +    hex_out (ctx, buf);
> +}
> +
> +static void
> +random_message_id (void *ctx, size_t len)
> +{
> +    static char *buf = NULL;
> +    static size_t buf_len = 0;
> +
> +    int lhs_len = (random () % (len / 2 - 1)) + 1;
> +
> +    int rhs_len = (random () % len / 2) + 1;
> +
> +    const char *blacklist = "\n\r@<>[]()";
> +
> +    if (len > buf_len) {
> +        buf = talloc_realloc (ctx, buf, char, len);
> +        buf_len = len;
> +    }
> +
> +    random_chars (buf, 0, lhs_len, 127, blacklist);
> +
> +    buf[lhs_len] = '@';
> +
> +    random_chars (buf, lhs_len + 1, lhs_len + rhs_len + 1, 127, blacklist);
> +
> +    hex_out (ctx, buf);
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +
> +    void *ctx = talloc_new (NULL);
> +    int num_lines = 500;
> +    int max_tags = 10;
> +    int message_id_len = 100;
> +    int tag_len = 50;
> +    int seed = 734569;
> +
> +    int pad_tag = 0, pad_mid = 0;
> +
> +    notmuch_opt_desc_t options[] = {
> +        { NOTMUCH_OPT_INT, &num_lines, "num-lines", 'n', 0 },
> +        { NOTMUCH_OPT_INT, &max_tags, "max-tags", 'm', 0 },
> +        { NOTMUCH_OPT_INT, &message_id_len, "message-id-len", 'M', 0 },
> +        { NOTMUCH_OPT_INT, &tag_len, "tag-len", 't', 0 },
> +        { NOTMUCH_OPT_INT, &seed, "tag-len", 't', 0 },
> +        { 0, 0, 0, 0, 0 }
> +    };
> +
> +    int opt_index = parse_arguments (argc, argv, options, 1);
> +
> +    if (opt_index < 0)
> +        exit (1);
> +
> +    pad_mid = ((int) log10 (num_lines) + 1);
> +    pad_tag = ((int) log10 (max_tags)) + 1;
> +
> +    srandom (seed);
> +
> +    int line;
> +    for (line = 0; line < num_lines; line++) {
> +
> +        printf ("%0*d-", pad_mid, line);
> +
> +        random_message_id (ctx, message_id_len);
> +
> +        int num_tags = random () % (max_tags + 1);
> +
> +        int j;
> +        for (j = 0; j < num_tags; j++) {
> +            printf (" %0*d-", pad_tag, j);
> +            random_tag (ctx, tag_len);
> +        }
> +        putchar ('\n');
> +    }
> +
> +    talloc_free (ctx);
> +
> +    return 0;
> +}

Hi

Just a thought on this and the next test. Could you add messages with
the random ids and tags from the above code to the Xapian database
directly, by calling whatever notmuch-new calls? Then test by doing a
dump, a restore and a second dump, and checking that the two dumps are
equal.

It might avoid your gmime concern from the next patch, and you could
have arbitrary (non-null) strings, including all sorts of malformed
utf-8. I guess Xapian might do bizarre things with the malformed utf-8
but, if it does, it might mean the correct place to fix it is in
notmuch-new.

Best wishes

Mark
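
P.S. To make the "arbitrary (non-null) strings" idea a bit more concrete, here is a rough sketch of the kind of generator I have in mind. It is only an illustration under a few assumptions: `random_bytes` is a made-up name and not part of the patch, it uses `rand ()` rather than `random ()` purely to stay within standard C, and how the resulting strings get into the Xapian database is deliberately left out, since I have not checked which calls notmuch-new makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not in the patch): fill buf[0..len-1] with
 * arbitrary bytes in the range 1..255 and NUL-terminate the result.
 * The caller must provide room for len + 1 bytes.
 *
 * Nothing here tries to produce valid UTF-8, so the output will
 * usually contain malformed sequences, which is the point of the
 * torture test.
 */
static void
random_bytes (char *buf, size_t len)
{
    unsigned char *p = (unsigned char *) buf;
    size_t i;

    assert (buf != NULL);

    for (i = 0; i < len; i++)
	p[i] = (unsigned char) (1 + rand () % 255);	/* never '\0' */

    p[len] = '\0';
}
```

With something like this the message-id and tag generators would no longer need the 7 bit restriction or a blacklist, and the dump, restore, dump comparison would show whether such strings survive the round trip.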