From: david@tethera.net
To: notmuch@notmuchmail.org
Subject: test infrastructure for new dump/restore
Date: Sun, 5 Aug 2012 15:13:10 -0300
Message-Id: <1344190393-22497-1-git-send-email-david@tethera.net>
This implements an old suggestion of Mark's: get wacky message-ids into the database directly, without relying on underdefined behaviour of the gmime parser. The previous effort relied on gmime passing message-ids through literally even when they were not delimited according to the RFC.

Also, compared to the previous version (id:"1326591624-15493-10-git-send-email-david@tethera.net"), this now uses valid UTF-8 text rather than just ASCII, although it is still a bit biased towards ASCII because most of the characters that cause problems are there.

There is a fair amount of code here, but I hope the generation of random messages may be useful more broadly in the future.

The high-level goal here is to (re-)introduce a hex-encoding based dump/restore that can pass this round-trip test, and probably some batch tagging facility that shares code. If people don't mind things being broken up into mini-series like this (without obvious gain in new features), I'll probably post the hex-encoding infrastructure next.
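To make the round-trip idea concrete, here is a minimal sketch in Python. It is not the code from this series. It assumes a percent-style hex escape and an arbitrary set of "safe" bytes, neither of which is specified in this message, and the names `hex_encode`, `hex_decode` and `random_message_id` are hypothetical. It only illustrates the property the test is after: random, ASCII-biased UTF-8 ids should survive encode followed by decode unchanged.

```python
import random

# Assumption: bytes in SAFE pass through unescaped; everything else
# (including '%' itself) becomes %xx. This set is illustrative only.
SAFE = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-_.@"
)


def hex_encode(text: str) -> str:
    """Encode text as pure ASCII, escaping each unsafe UTF-8 byte as %xx."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in SAFE:
            out.append(chr(byte))
        else:
            out.append("%{:02x}".format(byte))
    return "".join(out)


def hex_decode(encoded: str) -> str:
    """Invert hex_encode: turn %xx escapes back into bytes, then decode UTF-8."""
    raw = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "%":
            raw.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(encoded[i]))
            i += 1
    return raw.decode("utf-8")


def random_message_id(rng: random.Random, length: int = 12) -> str:
    """Generate a hypothetical 'wacky' id: valid UTF-8, biased towards ASCII."""
    ascii_pool = [chr(c) for c in range(0x20, 0x7F)]  # printable ASCII
    wide_pool = ["é", "ß", "λ", "中", "☃"]             # a few non-ASCII samples
    return "".join(
        rng.choice(ascii_pool) if rng.random() < 0.8 else rng.choice(wide_pool)
        for _ in range(length)
    )


# Round-trip check over a batch of random ids (fixed seed for repeatability).
rng = random.Random(0)
for _ in range(100):
    message_id = random_message_id(rng)
    assert hex_decode(hex_encode(message_id)) == message_id
```

Escaping the escape character itself is what keeps the decoding unambiguous. The encoded form is also plain ASCII with no whitespace, so it can be split on spaces in a line-oriented dump file.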