From: Tomi Ollila <tomi.ollila@iki.fi>
To: John Lenz <lenz@math.uic.edu>, notmuch@notmuchmail.org
Subject: Re: cli: add --include-html option to notmuch show
In-Reply-To: <notmuch-web-1374719771.1588310986@www.wuzzeb.org>
References: <notmuch-web-1372724382.450184839@www.wuzzeb.org>
	<m27ggj1x29.fsf@guru.guru-group.fi>
	<notmuch-web-1374719771.1588310986@www.wuzzeb.org>
User-Agent: Notmuch/0.16+2~g0418bb2 (http://notmuchmail.org) Emacs/24.3.1
	(x86_64-unknown-linux-gnu)
Date: Sun, 04 Aug 2013 22:47:10 +0300
Message-ID: <m2zjsxs0g1.fsf@guru.guru-group.fi>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: list

On Thu, Jul 25 2013, John Lenz <lenz@math.uic.edu> wrote:

> On Sun Jul 21 15:23 -0500 2013, Tomi Ollila <tomi.ollila@iki.fi> wrote:
>> On Tue, Jul 02 2013, John Lenz <lenz@math.uic.edu> wrote:
>> 
>> > For my client, the largest bottleneck for displaying large threads is
>> > exporting each html part individually, since by default notmuch will not
>> > include html parts in the json output.  For large threads there can be
>> > quite a few parts and each must be exported and decoded one by one.  Also,
>> > I then have to deal with all the crazy charsets, which I can do through a
>> > library but is a pain.
>> 
>> This looks like a useful option. I just wonder what effect different
>> charsets have on the output (is text/html content output verbatim, with
>> just json/sexp escaping of '"' characters?).
>> 
>> If you added test(s) showing what happens with different charsets
>> (like one message having 3 text/html parts, one us-ascii, one iso-8859-1
>> and one utf-8) that would make things clearer and (also) protect us from 
>> regressions.
>> 

> Here is a test I wrote.  I tried to follow the other tests in formatting.
> Let me know if you want this as a single patch combined with the code
> to enable the option, I can resend it.

I took your patch, modified it a bit, and put it at the end of the
'multipart' test. The diff is included at the end of this message for viewing.
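
As a quick sanity check on the byte counts the test depends on (my own
sketch, not part of the patch): U+00BD should be one byte in ISO-8859-1 and
two bytes in UTF-8, which is why the two text/html parts report
content-length 20 and 21. The variable names below are just for
illustration; plain printf octal escapes are enough to confirm the numbers:

```shell
# Byte length of U+00BD in each charset (printf octal escapes are POSIX,
# so this does not need $'...' quoting).
latin1_len=$(printf '\275' | wc -c | tr -d ' ')
utf8_len=$(printf '\302\275' | wc -c | tr -d ' ')

# Byte length of each text/html body, including its trailing newline.
html_latin1_len=$(printf '<p>0.5 equals \275</p>\n' | wc -c | tr -d ' ')
html_utf8_len=$(printf '<p>0.5 equals \302\275</p>\n' | wc -c | tr -d ' ')

echo "U+00BD:    latin1=${latin1_len} utf8=${utf8_len}"
echo "html part: latin1=${html_latin1_len} utf8=${html_utf8_len}"
```

The one-byte difference between the two html parts comes entirely from the
encoding of U+00BD.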

The next question is whether the new option should be spelled

--include-html

or as

--include-html=(true|false)

or even

--body=(true|false|text-and-html)

See --exclude option in http://notmuchmail.org/manpages/notmuch-search-1/
and --body option in http://notmuchmail.org/manpages/notmuch-show-1/
for comparison...
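
To make the comparison concrete, here is a rough sketch of how the three
spellings might line up. The html_included helper is hypothetical (it is
not notmuch code), and the meaning given to --body=text-and-html is only an
assumption about what such a value would do:

```shell
# Hypothetical helper, not notmuch code: report whether a given spelling
# of the option would put text/html bodies into the output.
html_included () {
    case "$1" in
        # Assumed: text-and-html means "bodies on, html included".
        --include-html|--include-html=true|--body=text-and-html)
            echo yes ;;
        --include-html=false|--body=true|--body=false)
            echo no ;;
        *)
            echo unknown ;;
    esac
}

for opt in --include-html --include-html=false --body=text-and-html; do
    printf '%-24s html included: %s\n' "$opt" "$(html_included "$opt")"
done
```

With the --body variant, html inclusion becomes one more value of an
existing option rather than a separate switch.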


Tomi

--8<----8<----8<----8<----8<--

diff --git a/test/multipart b/test/multipart
index c974226..11f10bd 100755
--- a/test/multipart
+++ b/test/multipart
@@ -647,4 +647,84 @@ notmuch show --format=raw --part=3 id:base64-part-with-crlf > crlf.out
 echo -n -e "\xEF\x0D\x0A" > crlf.expected
 test_expect_equal_file crlf.out crlf.expected
 
-test_done
\ No newline at end of file
+
+# The ISO-8859-1 encoding of U+00BD is a single byte: octal 275
+# (Portability note: Dollar-Single ($'...', ANSI C-style escape sequences)
+# quoting works on bash, ksh, zsh, *BSD sh but not on dash, ash nor busybox sh)
+readonly u_00bd_latin1=$'\275'
+
+# The Unicode fraction symbol 1/2 is U+00BD and is encoded
+# in UTF-8 as two bytes: octal 302 275
+readonly u_00bd_utf8=$'\302\275'
+
+cat <<EOF > ${MAIL_DIR}/include-html
+From: A <a@example.com>
+To: B <b@example.com>
+Subject: html message
+Date: Sat, 01 January 2000 00:00:00 +0000
+Message-ID: <htmlmessage>
+MIME-Version: 1.0
+Content-Type: multipart/alternative; boundary="==-=="
+
+--==-==
+Content-Type: text/html; charset=UTF-8
+
+<p>0.5 equals ${u_00bd_utf8}</p>
+
+--==-==
+Content-Type: text/html; charset=ISO-8859-1
+
+<p>0.5 equals ${u_00bd_latin1}</p>
+
+--==-==
+Content-Type: text/plain; charset=UTF-8
+
+0.5 equals ${u_00bd_utf8}
+
+--==-==--
+EOF
+
+notmuch new > /dev/null
+
+cat_expected_head ()
+{
+        cat <<EOF
+[[[{"id": "htmlmessage", "match":true, "excluded": false, "date_relative":"2000-01-01",
+   "timestamp": 946684800,
+   "filename": "${MAIL_DIR}/include-html",
+   "tags": ["inbox", "unread"],
+   "headers": { "Date": "Sat, 01 Jan 2000 00:00:00 +0000", "From": "A <a@example.com>",
+                "Subject": "html message", "To": "B <b@example.com>"},
+   "body": [{
+     "content-type": "multipart/alternative", "id": 1,
+EOF
+}
+
+cat_expected_head > EXPECTED.nohtml
+cat <<EOF >> EXPECTED.nohtml
+"content": [
+  { "id": 2, "content-charset": "UTF-8", "content-length": 21, "content-type": "text/html"},
+  { "id": 3, "content-charset": "ISO-8859-1", "content-length": 20, "content-type": "text/html"},
+  { "id": 4, "content-type": "text/plain", "content": "0.5 equals \\u00bd\\n"}
+]}]},[]]]]
+EOF
+
+# Both the UTF-8 and ISO-8859-1 parts should have U+00BD
+cat_expected_head > EXPECTED.withhtml
+cat <<EOF >> EXPECTED.withhtml
+"content": [
+  { "id": 2, "content-type": "text/html", "content": "<p>0.5 equals \\u00bd</p>\\n"},
+  { "id": 3, "content-type": "text/html", "content": "<p>0.5 equals \\u00bd</p>\\n"},
+  { "id": 4, "content-type": "text/plain", "content": "0.5 equals \\u00bd\\n"}
+]}]},[]]]]
+EOF
+
+test_begin_subtest "html parts excluded by default"
+notmuch show --format=json id:htmlmessage > OUTPUT
+test_expect_equal_json "$(cat OUTPUT)" "$(cat EXPECTED.nohtml)"
+
+test_begin_subtest "html parts included"
+notmuch show --format=json --include-html id:htmlmessage > OUTPUT
+test_expect_equal_json "$(cat OUTPUT)" "$(cat EXPECTED.withhtml)"
+
+test_done