From: Tomi Ollila
To: John Lenz, notmuch@notmuchmail.org
Subject: Re: cli: add --include-html option to notmuch show
User-Agent: Notmuch/0.16+2~g0418bb2 (http://notmuchmail.org) Emacs/24.3.1 (x86_64-unknown-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "Use and development of the notmuch mail system."
X-List-Received-Date: Sun, 04 Aug 2013 19:47:29 -0000

On Thu, Jul 25 2013, John Lenz wrote:

> On Sun Jul 21 15:23 -0500 2013, Tomi Ollila wrote:
>> On Tue, Jul 02 2013, John Lenz wrote:
>>
>> > For my client, the largest bottleneck for displaying large threads is
>> > exporting each html part individually since by default notmuch will not
>> > show the json parts. For large threads there can be quite a few parts and
>> > each must be exported and decoded one by one. Also, I then have to deal
>> > with all the crazy charsets which I can do through a library but is a
>> > pain.
>>
>> This looks like a useful option. I just wonder what effect does different
>> charsets do to the output (is text/html content output verbatim (with just
>> json/sexp escaping of '"' -characters).
>>
>> If you added test(s) showing what happens with different charsets
>> (like one message having 3 text/html parts, one us-ascii, one iso-8859-1
>> and one utf-8) that would make things clearer and (also) protect us from
>> regressions.
>>
> Here is a test I wrote. I tried to follow the other tests in formatting.
> Let me know if you want this as a single patch combined with the code
> to enable the option, I can resend it.

I took your patch, modified it a bit and put it at the end of the
'multipart' test. The diff for viewing is attached at the end.

The next question is whether the new option should be spelled
--include-html, --include-html=(true|false), or even
--body=(true|false|text-and-html).

See the --exclude option in http://notmuchmail.org/manpages/notmuch-search-1/
and the --body option in http://notmuchmail.org/manpages/notmuch-show-1/
for comparison...

Tomi


--8<----8<----8<----8<----8<--

diff --git a/test/multipart b/test/multipart
index c974226..11f10bd 100755
--- a/test/multipart
+++ b/test/multipart
@@ -647,4 +647,84 @@ notmuch show --format=raw --part=3 id:base64-part-with-crlf > crlf.out
 echo -n -e "\xEF\x0D\x0A" > crlf.expected
 test_expect_equal_file crlf.out crlf.expected

-test_done
\ No newline at end of file
+
+# The ISO-8859-1 encoding of U+00BD is a single byte: octal 275
+# (Portability note: Dollar-Single ($'...', ANSI C-style escape sequences)
+# quoting works on bash, ksh, zsh, *BSD sh but not on dash, ash nor busybox sh)
+readonly u_00bd_latin1=$'\275'
+
+# The Unicode fraction symbol 1/2 is U+00BD and is encoded
+# in UTF-8 as two bytes: octal 302 275
+readonly u_00bd_utf8=$'\302\275'
+
+cat <<EOF > ${MAIL_DIR}/include-html
+From: A
+To: B
+Subject: html message
+Date: Sat, 01 January 2000 00:00:00 +0000
+Message-ID:
+MIME-Version: 1.0
+Content-Type: multipart/alternative; boundary="==-=="
+
+--==-==
+Content-Type: text/html; charset=UTF-8
+
+<p>0.5 equals ${u_00bd_utf8}</p>
+
+--==-==
+Content-Type: text/html; charset=ISO-8859-1
+
+<p>0.5 equals ${u_00bd_latin1}</p>
+
+--==-==
+Content-Type: text/plain; charset=UTF-8
+
+0.5 equals ${u_00bd_utf8}
+
+--==-==--
+EOF
+
+notmuch new > /dev/null
+
+cat_expected_head ()
+{
+        cat <",
+                "Subject": "html message", "To": "B "},
+   "body": [{
+     "content-type": "multipart/alternative", "id": 1,
+EOF
+}
+
+cat_expected_head > EXPECTED.nohtml
+cat <<EOF >> EXPECTED.nohtml
+"content": [
+  { "id": 2, "content-charset": "UTF-8", "content-length": 21, "content-type": "text/html"},
+  { "id": 3, "content-charset": "ISO-8859-1", "content-length": 20, "content-type": "text/html"},
+  { "id": 4, "content-type": "text/plain", "content": "0.5 equals \\u00bd\\n"}
+]}]},[]]]]
+EOF
+
+# Both the UTF-8 and ISO-8859-1 part should have U+00BD
+cat_expected_head > EXPECTED.withhtml
+cat <<EOF >> EXPECTED.withhtml
+"content": [
+  { "id": 2, "content-type": "text/html", "content": "<p>0.5 equals \\u00bd</p>\\n"},
+  { "id": 3, "content-type": "text/html", "content": "<p>0.5 equals \\u00bd</p>\\n"},
+  { "id": 4, "content-type": "text/plain", "content": "0.5 equals \\u00bd\\n"}
+]}]},[]]]]
+EOF
+
+test_begin_subtest "html parts excluded by default"
+notmuch show --format=json id:htmlmessage > OUTPUT
+test_expect_equal_json "$(cat OUTPUT)" "$(cat EXPECTED.nohtml)"
+
+test_begin_subtest "html parts included"
+notmuch show --format=json --include-html id:htmlmessage > OUTPUT
+test_expect_equal_json "$(cat OUTPUT)" "$(cat EXPECTED.withhtml)"
+
+test_done
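
A side note on the charset expectation in the test: both text/html parts should end up as the same U+00BD in the JSON output, although the raw bytes differ between the two parts. The following is only a minimal sketch of that byte-level difference and is not part of the patch. It assumes a POSIX shell with od, tr, sed and iconv available, and the helper name octal_bytes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): print stdin as
# space-separated octal byte values, e.g. "302 275".
octal_bytes () {
    od -An -to1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# printf with octal escapes is POSIX, so this sidesteps the $'...'
# quoting that the portability note in the test warns about.
latin1=$(printf '\275' | octal_bytes)        # U+00BD as ISO-8859-1
utf8=$(printf '\302\275' | octal_bytes)      # U+00BD as UTF-8

# Re-encoding the single ISO-8859-1 byte as UTF-8 should give
# the same two bytes as the UTF-8 part.
converted=$(printf '\275' | iconv -f ISO-8859-1 -t UTF-8 | octal_bytes)

echo "latin1:    $latin1"
echo "utf8:      $utf8"
echo "converted: $converted"
```

If the converted bytes match the UTF-8 ones, the two parts carry the same character, which is what the "html parts included" subtest expects to see as \u00bd in both text/html parts.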