--- /dev/null
+Return-Path: <david@tethera.net>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by arlo.cworth.org (Postfix) with ESMTP id 619BE6DE1868\r
+ for <notmuch@notmuchmail.org>; Mon, 14 Mar 2016 04:49:44 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at cworth.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.031\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.031 tagged_above=-999 required=5\r
+ tests=[AWL=-0.020, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01]\r
+ autolearn=disabled\r
+Received: from arlo.cworth.org ([127.0.0.1])\r
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id x571wJW0eGWI for <notmuch@notmuchmail.org>;\r
+ Mon, 14 Mar 2016 04:49:42 -0700 (PDT)\r
+Received: from fethera.tethera.net (fethera.tethera.net [198.245.60.197])\r
+ by arlo.cworth.org (Postfix) with ESMTPS id 3D1C46DE1862\r
+ for <notmuch@notmuchmail.org>; Mon, 14 Mar 2016 04:49:42 -0700 (PDT)\r
+Received: from remotemail by fethera.tethera.net with local (Exim 4.84)\r
+ (envelope-from <david@tethera.net>)\r
+ id 1afR0p-0005LI-JP; Mon, 14 Mar 2016 07:50:15 -0400\r
+Received: (nullmailer pid 12910 invoked by uid 1000);\r
+ Mon, 14 Mar 2016 11:49:36 -0000\r
+From: David Bremner <david@tethera.net>\r
+To: David Edmondson <dme@dme.org>, Mark Walters <markwalters1009@gmail.com>,\r
+ notmuch@notmuchmail.org\r
+Subject: Re: [PATCH v1 0/3] Improve the acquisition of text parts.\r
+In-Reply-To: <m2pouxqx3e.fsf@dme.org>\r
+References: <1457457179-4707-1-git-send-email-dme@dme.org>\r
+ <87ziu2s8rb.fsf@qmul.ac.uk> <m2pouxqx3e.fsf@dme.org>\r
+User-Agent: Notmuch/0.21+74~g6c60fb1 (http://notmuchmail.org) Emacs/24.5.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Mon, 14 Mar 2016 08:49:36 -0300\r
+Message-ID: <87bn6h5lf3.fsf@zancas.localnet>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.20\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch/>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Mon, 14 Mar 2016 11:49:44 -0000\r
+\r
+David Edmondson <dme@dme.org> writes:\r
+\r
+> On Sun, Mar 13 2016, Mark Walters wrote:\r
+>> However, it would be sensible to get testing in a greater variety of\r
+>> charsets/encodings\r
+>\r
+> Agreed. Does anyone have suggestions on how we might achieve this? A\r
+> corpus of mail that we could use?\r
+\r
+Maybe the notmuch performance corpus, particularly the lkml sample.\r
+\r
+grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u\r
+\r
+gives\r
+\r
+euc-kr\r
+gb2312\r
+iso-2022-jp\r
+iso-2022-jp-2\r
+iso-8859-1\r
+iso-8859-14\r
+iso 8859-15\r
+iso-8859-15\r
+iso-8859-1\r
+iso-8859-2\r
+iso-8859-6\r
+iso-8859-7\r
+iso-8859-9\r
+koi8-r\r
+koi8-u\r
+ks_c_5601-1987\r
+shift_jis\r
+unknown\r
+unknown-8bit\r
+us-ascii\r
+utf8\r
+utf-8\r
+windows-1250\r
+windows-1251\r
+windows-1252\r
+windows-1255\r
+\r
+\r
+To unpack the corpus:\r
+\r
+cd performance-test\r
+make download-corpus\r
+./T00-new.sh --large\r
+\r
+You will probably want to interrupt the test once notmuch-new starts running.\r
+\r