From: Mark Walters <markwalters1009@gmail.com>
To: David Bremner <david@tethera.net>, David Edmondson <dme@dme.org>,
 notmuch@notmuchmail.org
Subject: Re: [PATCH v1 0/3] Improve the acquisition of text parts.
In-Reply-To: <87bn6h5lf3.fsf@zancas.localnet>
References: <1457457179-4707-1-git-send-email-dme@dme.org>
 <87ziu2s8rb.fsf@qmul.ac.uk> <m2pouxqx3e.fsf@dme.org>
 <87bn6h5lf3.fsf@zancas.localnet>
User-Agent: Notmuch/0.21+69~gd27d908 (http://notmuchmail.org) Emacs/24.4.1
 (x86_64-pc-linux-gnu)
Date: Sat, 26 Mar 2016 09:18:20 +0000
Message-ID: <87zitlppgj.fsf@qmul.ac.uk>
Content-Type: text/plain
List-Id: "Use and development of the notmuch mail system."
 <notmuch.notmuchmail.org>
\r
Sorry this email ended up rather long:

Summary: I have run a test (see below) on all of the lkml part of the
performance-corpus, and all the changes look expected. So this series
\r
First note how we do the bodypart-insertion: for a mime type of
text/plain we first try the text/plain handler, then a text/* handler,
and finally a */* handler until one succeeds. Before this series, when
the part is application/octet-stream but is detected as text/plain, the
text/plain handler fails with a "bodypart insertion error" because
notmuch-get-bodypart-text can't get the text (because it's not
officially text). Thus we fall back on the */* handler and that inserts

With this series notmuch-get-bodypart-text succeeds and we stop.
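The fallback order described above can be sketched in shell. This is only an illustration of the ordering, not notmuch's Emacs Lisp code: handler_candidates and insert_part are hypothetical names, and a handler "fails" here simply because its key is passed in the list of failing handlers. Treating the text/* step as failing in the first call is also an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the handler fallback order; not notmuch code.

# Print the handler keys tried for a MIME type, most specific first:
# the exact type, then major/*, then */*.
handler_candidates() {
    mime=$1
    printf '%s\n' "$mime" "${mime%%/*}/*" '*/*'
}

# Try each candidate in turn and stop at the first one that succeeds.
# Arguments after the MIME type name the handlers that should fail.
insert_part() {
    mime=$1
    shift
    failing=" $* "
    handler_candidates "$mime" | while IFS= read -r key; do
        case $failing in
            *" $key "*) echo "$key handler: bodypart insertion error" ;;
            *) echo "$key handler: inserted"; break ;;
        esac
    done
}

# Before the series: text/plain fails (and text/* is assumed to fail
# too), so the */* handler ends up inserting the part.
insert_part text/plain text/plain 'text/*'

# With the series: the text/plain handler succeeds and we stop.
insert_part text/plain
```

The first call walks all three candidates before the */* handler inserts the part; the second stops at text/plain, which is the behaviour change the series makes.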
\r
Thus in most cases the only change is that we don't get a "bodypart
insertion error", but all the text looks the same. In a couple of cases
the text/plain handler wraps lines/replaces ^M by unix newlines, whereas
the */* handler does not. This is an improvement.
\r
There is one more "difference" but I think this is actually something
random. Sometimes when the part is application/tar or application/zip I
get "Bodypart insert error: Symbol's function definition is void:
gnus-recursive-directory-files". If I load gnus this goes away. In my
first batch of tests this only occurred when using this series, but
since then I have reproduced it on mainline. I think something else I
did when setting up the test on mainline caused gnus to be loaded, but I
have not worked out what is going on there.
\r
Finally, the test was as follows. I downloaded the performance corpus,
configured a separate notmuch config file to use
performance-test/corpus/mail/lkml as the mailstore, went into
notmuch-emacs and to the inbox (which contained all messages) and ran
the following lisp function:
\r
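(defun my-save-all-show ()
  "Save the notmuch-show rendering of every thread in this search buffer."
  (interactive)
  (goto-char (point-min))
  ;; Assumption: the counter starts at 0, and we return to the search
  ;; buffer after each thread so the loop can move to the next one.
  (let ((count 0)
        (search-buffer (current-buffer)))
    (while (notmuch-search-find-thread-id)
      (let ((thread-id (notmuch-search-find-thread-id)))
        (setq count (1+ count))
        (message "Thread %s: %s" count thread-id)
        (notmuch-show thread-id)
        ;; Write the rendered thread out byte-for-byte.
        (let ((text (buffer-string))
              (coding-system-for-write 'no-conversion))
          (with-temp-file (concat "OUTPUT-" thread-id) (insert text)))
        (kill-buffer)
        (switch-to-buffer search-buffer)
        (notmuch-search-next-thread)))))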
\r
I moved the OUTPUT files elsewhere and repeated with this series applied
and then ran diff on the output. This gave 7 threads with a change (each
an individual message) out of the 16000 threads / 100000 messages, which
I looked at individually as above.
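The comparison step could be scripted roughly as follows. This is a sketch, not the exact commands used: the directory names before/ and after/ and the helper name changed_threads are assumptions, and it uses cmp -s to list the files that differ before running diff -u on each of them.

```shell
#!/bin/sh
# Hypothetical helper: print the names of OUTPUT-* files whose contents
# differ between two directories (or that are missing from the second).
changed_threads() {
    before=$1
    after=$2
    for f in "$before"/OUTPUT-*; do
        [ -e "$f" ] || continue          # no OUTPUT files at all
        name=${f##*/}
        cmp -s "$f" "$after/$name" || printf '%s\n' "$name"
    done
}

# Show a unified diff for every changed thread (assumed directory names).
if [ -d before ] && [ -d after ]; then
    changed_threads before after | while IFS= read -r name; do
        # diff exits non-zero when the files differ; that is expected.
        diff -u "before/$name" "after/$name" || true
    done
fi
```

Listing the changed files first keeps the output short enough to go through each changed thread by hand.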
\r
On Mon, 14 Mar 2016, David Bremner <david@tethera.net> wrote:
> David Edmondson <dme@dme.org> writes:
>
>> On Sun, Mar 13 2016, Mark Walters wrote:
>>> However, it would be sensible to get testing in a greater variety of
>>> charsets/encodings
>>
>> Agreed. Does anyone have suggestions on how we might achieve this? A
>> corpus of mail that we could use?
>
> Maybe the notmuch performance corpus, particularly the lkml sample.
>
> grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u
>
> to unpack the corpus
>
> cd performance-test
> make download-corpus
> ./T00-new.sh --large
>
> probably interrupt the test once notmuch-new starts running.