From a9f8c7569b783cc85c9f770f87c86855623ff604 Mon Sep 17 00:00:00 2001
From: David Bremner
Date: Tue, 15 Mar 2016 08:49:36 +2100
Subject: [PATCH] Re: [PATCH v1 0/3] Improve the acquisition of text parts.

---
 1f/6a19e6b5eab6319e0931a63aa4286bc82ba4a0 | 103 ++++++++++++++++++++++
 1 file changed, 103 insertions(+)
 create mode 100644 1f/6a19e6b5eab6319e0931a63aa4286bc82ba4a0

diff --git a/1f/6a19e6b5eab6319e0931a63aa4286bc82ba4a0 b/1f/6a19e6b5eab6319e0931a63aa4286bc82ba4a0
new file mode 100644
index 000000000..fdea3c686
--- /dev/null
+++ b/1f/6a19e6b5eab6319e0931a63aa4286bc82ba4a0
@@ -0,0 +1,103 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by arlo.cworth.org (Postfix) with ESMTP id 619BE6DE1868
+ for ; Mon, 14 Mar 2016 04:49:44 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at cworth.org
+X-Spam-Flag: NO
+X-Spam-Score: -0.031
+X-Spam-Level:
+X-Spam-Status: No, score=-0.031 tagged_above=-999 required=5
+ tests=[AWL=-0.020, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01]
+ autolearn=disabled
+Received: from arlo.cworth.org ([127.0.0.1])
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id x571wJW0eGWI for ;
+ Mon, 14 Mar 2016 04:49:42 -0700 (PDT)
+Received: from fethera.tethera.net (fethera.tethera.net [198.245.60.197])
+ by arlo.cworth.org (Postfix) with ESMTPS id 3D1C46DE1862
+ for ; Mon, 14 Mar 2016 04:49:42 -0700 (PDT)
+Received: from remotemail by fethera.tethera.net with local (Exim 4.84)
+ (envelope-from )
+ id 1afR0p-0005LI-JP; Mon, 14 Mar 2016 07:50:15 -0400
+Received: (nullmailer pid 12910 invoked by uid 1000);
+ Mon, 14 Mar 2016 11:49:36 -0000
+From: David Bremner
+To: David Edmondson , Mark Walters ,
+ notmuch@notmuchmail.org
+Subject: Re: [PATCH v1 0/3] Improve the acquisition of text parts.
+In-Reply-To:
+References: <1457457179-4707-1-git-send-email-dme@dme.org>
+ <87ziu2s8rb.fsf@qmul.ac.uk>
+User-Agent: Notmuch/0.21+74~g6c60fb1 (http://notmuchmail.org) Emacs/24.5.1
+ (x86_64-pc-linux-gnu)
+Date: Mon, 14 Mar 2016 08:49:36 -0300
+Message-ID: <87bn6h5lf3.fsf@zancas.localnet>
+MIME-Version: 1.0
+Content-Type: text/plain
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.20
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Mon, 14 Mar 2016 11:49:44 -0000
+
+David Edmondson writes:
+
+> On Sun, Mar 13 2016, Mark Walters wrote:
+>> However, it would be sensible to get testing in a greater variety of
+>> charsets/encodings
+>
+> Agreed. Does anyone have suggestions on how we might achieve this? A
+> corpus of mail that we could use?
+
+Maybe the notmuch performance corpus, particularly the lkml sample.
+
+grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u
+
+gives
+
+euc-kr
+gb2312
+iso-2022-jp
+iso-2022-jp-2
+iso-8859-1
+iso-8859-14
+iso 8859-15
+iso-8859-15
+iso-8859-1
+iso-8859-2
+iso-8859-6
+iso-8859-7
+iso-8859-9
+koi8-r
+koi8-u
+ks_c_5601-1987
+shift_jis
+unknown
+unknown-8bit
+us-ascii
+utf8
+utf-8
+windows-1250
+windows-1251
+windows-1252
+windows-1255
+
+
+to unpack the corpus
+
+cd performance-test
+make download-corpus
+./T00-new.sh --large
+
+probably interrupt the test once notmuch-new starts running.
+
-- 
2.26.2