Re: [PATCH v1 1/2] emacs: Observe the charset of MIME parts when reading them.
author Mark Walters <markwalters1009@gmail.com>
Mon, 2 May 2016 07:37:46 +0000 (08:37 +0100)
committer W. Trevor King <wking@tremily.us>
Sat, 20 Aug 2016 23:21:44 +0000 (16:21 -0700)
9d/2c898db4a08fc9021667dc86e83d931ea6de9c [new file with mode: 0644]

diff --git a/9d/2c898db4a08fc9021667dc86e83d931ea6de9c b/9d/2c898db4a08fc9021667dc86e83d931ea6de9c
new file mode 100644
index 0000000..a60f078
--- /dev/null
@@ -0,0 +1,146 @@
+Return-Path: <markwalters1009@gmail.com>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by arlo.cworth.org (Postfix) with ESMTP id E98366DE01BE\r
+ for <notmuch@notmuchmail.org>; Mon,  2 May 2016 00:38:01 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at cworth.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.306\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.306 tagged_above=-999 required=5 tests=[AWL=0.264,\r
+  DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,\r
+ FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7,\r
+ RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001]\r
+ autolearn=disabled\r
+Received: from arlo.cworth.org ([127.0.0.1])\r
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id ng3Vavm_NEhg for <notmuch@notmuchmail.org>;\r
+ Mon,  2 May 2016 00:37:51 -0700 (PDT)\r
+Received: from mail-wm0-f66.google.com (mail-wm0-f66.google.com\r
+ [74.125.82.66]) by arlo.cworth.org (Postfix) with ESMTPS id B216C6DE00F5 for\r
+ <notmuch@notmuchmail.org>; Mon,  2 May 2016 00:37:50 -0700 (PDT)\r
+Received: by mail-wm0-f66.google.com with SMTP id n129so15966730wmn.1\r
+ for <notmuch@notmuchmail.org>; Mon, 02 May 2016 00:37:50 -0700 (PDT)\r
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;\r
+ h=from:to:subject:in-reply-to:references:user-agent:date:message-id\r
+ :mime-version; bh=jyUvm/jh9U3EesS14XXW3DOsiSjy9K4/efHfPNiw2M0=;\r
+ b=BA2KHN2n3lRFfAUh2KNR0ai44crcfRTavMDuN1cQOYoxRko6SsJPspOVOxb6dhNi6m\r
+ O/EAh4g9PcD3JPvfspxwUPC3s5G0gFZXa3iilHRnQ+9nic6SssCMTxtwi/OUwqIsKGsJ\r
+ iv1HILx+AFtzNXp+ekEevrBHHulx7blCHByg1Jcf6iM1MCRY+RfDiip7AAVLRdPDN2i3\r
+ z1wy0eIHdv66++lsxDjf86kljlqaUaM+jPi9T+bouoJ+NSfdgVcFQmRYmN1oWdeZUowU\r
+ lZ0oDyY0tdvNwi9z99FG5mCog6twKmKPwIxv0lUPSFtNUU0jf1HkhWzSZf4w5WJ66d0u\r
+ JeMw==\r
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
+ d=1e100.net; s=20130820;\r
+ h=x-gm-message-state:from:to:subject:in-reply-to:references\r
+ :user-agent:date:message-id:mime-version;\r
+ bh=jyUvm/jh9U3EesS14XXW3DOsiSjy9K4/efHfPNiw2M0=;\r
+ b=g6aLXZZZhBeLi93cmfrWHTlEwRXwhuut2SoXD4FyTEJ2iMkad7dWEin18NSng7eFrC\r
+ 0jJKzdxns/HJz3PQ0pPcMfQgR0tXW5bXlfNQnpschvBOXsdbp+BXWb6T847lHjrA2VC1\r
+ +05OmMRw9ZLXhK/xWq1lkDnlN9aM7uoqPcekkgtSeb8OVCux0SOJJesE3fit2HTYedC8\r
+ JSN3bVUamXz47FzyxSionprXUfOGBN0R8OXM/pNg0x75zPYiIG9bIItyoOVAHbpbgKP7\r
+ gBQb/YXqi4Av+r2ivMckugb6Lj8QK60Z5lamZlWcX0+FFTsduaD3+a4k69jC0VKp3sYI\r
+ z3FA==\r
+X-Gm-Message-State:\r
+ AOPr4FVSIBLECRL/61jKkHKqyun9mUD6ZWRN1TYtXLS/RkDaF/XBAj95PWDo7oe0BAaDzg==\r
+X-Received: by 10.194.10.162 with SMTP id j2mr34405240wjb.72.1462174668681;\r
+ Mon, 02 May 2016 00:37:48 -0700 (PDT)\r
+Received: from localhost (5751dfa2.skybroadband.com. [87.81.223.162])\r
+ by smtp.gmail.com with ESMTPSA id y70sm17293483wmd.3.2016.05.02.00.37.47\r
+ (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\r
+ Mon, 02 May 2016 00:37:47 -0700 (PDT)\r
+From: Mark Walters <markwalters1009@gmail.com>\r
+To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org\r
+Subject: Re: [PATCH v1 1/2] emacs: Observe the charset of MIME parts when\r
+ reading them.\r
+In-Reply-To: <1461999108-68582-2-git-send-email-dme@dme.org>\r
+References: <1461999108-68582-1-git-send-email-dme@dme.org>\r
+ <1461999108-68582-2-git-send-email-dme@dme.org>\r
+User-Agent: Notmuch/0.22~rc1+2~g56141bf (http://notmuchmail.org) Emacs/24.4.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Mon, 02 May 2016 08:37:46 +0100\r
+Message-ID: <877ffc9agl.fsf@qmul.ac.uk>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.20\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch/>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Mon, 02 May 2016 07:38:02 -0000\r
+\r
+\r
+On Sat, 30 Apr 2016, David Edmondson <dme@dme.org> wrote:\r
+> `notmuch--get-bodypart-raw' previously assumed that all non-binary MIME\r
+> parts could be successfully read by assuming that they were UTF-8\r
+> encoded. This was demonstrated to be wrong, specifically when a part was\r
+> marked as ISO8859-1 and included accented characters (which were\r
+> incorrectly rendered as a result).\r
+>\r
+> Rather than assuming UTF-8, attempt to use the part's declared charset\r
+> when reading it, falling back to US-ASCII if the declared charset is\r
+> unknown, unsupported or invalid.\r
+\r
+As this seemed hard to test (if I understand the bug correctly it didn't\r
+show up on my test of the entire performance corpus -- of\r
+course my testing could have been wrong) would it be possible to add a test\r
+for it?\r
+\r
+Best wishes\r
+\r
+Mark\r
+\r
+\r
+> ---\r
+>  emacs/notmuch-lib.el | 16 +++++++++++++++-\r
+>  1 file changed, 15 insertions(+), 1 deletion(-)\r
+>\r
+> diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el\r
+> index 78978ee..f05ded6 100644\r
+> --- a/emacs/notmuch-lib.el\r
+> +++ b/emacs/notmuch-lib.el\r
+> @@ -23,6 +23,7 @@\r
+>  \r
+>  ;;; Code:\r
+>  \r
+> +(require 'mm-util)\r
+>  (require 'mm-view)\r
+>  (require 'mm-decode)\r
+>  (require 'cl)\r
+> @@ -572,7 +573,20 @@ the given type."\r
+>                                 ,@(when process-crypto '("--decrypt"))\r
+>                                 ,(notmuch-id-to-query (plist-get msg :id))))\r
+>                         (coding-system-for-read\r
+> -                        (if binaryp 'no-conversion 'utf-8)))\r
+> +                        (if binaryp 'no-conversion\r
+> +                          (let ((coding-system (mm-charset-to-coding-system\r
+> +                                                (plist-get part :content-charset))))\r
+> +                            ;; Sadly,\r
+> +                            ;; `mm-charset-to-coding-system' seems\r
+> +                            ;; to return things that are not\r
+> +                            ;; considered acceptable values for\r
+> +                            ;; `coding-system-for-read'.\r
+> +                            (if (coding-system-p coding-system)\r
+> +                                coding-system\r
+> +                              ;; RFC 2047 says that the default\r
+> +                              ;; charset is US-ASCII. RFC6657\r
+> +                              ;; complicates this somewhat.\r
+> +                              'us-ascii)))))\r
+>                     (apply #'call-process notmuch-command nil '(t nil) nil args)\r
+>                     (buffer-string))))))\r
+>      (when (and cache data)\r
+> -- \r
+> 2.7.1\r
+>\r
+> _______________________________________________\r
+> notmuch mailing list\r
+> notmuch@notmuchmail.org\r
+> https://notmuchmail.org/mailman/listinfo/notmuch\r
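
The fallback described in the quoted commit message (use the part's declared charset, otherwise US-ASCII) could be sketched as a standalone helper along the following lines. This is only an illustration, not the code from the patch: the name `example-part-coding-system` is hypothetical, and treating a missing charset (nil) the same way as an unknown one is an assumption of this sketch. `mm-charset-to-coding-system` and `coding-system-p` are the same functions the patch itself calls.

```elisp
(require 'mm-util)

(defun example-part-coding-system (charset)
  "Return a coding system suitable for reading a part declared as CHARSET.
CHARSET is a MIME charset name (a string) or nil.  Hypothetical helper
that mirrors the fallback idea of the quoted patch."
  (let ((coding-system (and charset
                            (mm-charset-to-coding-system charset))))
    ;; `coding-system-p' also returns t for nil, so nil is checked
    ;; explicitly (an assumption of this sketch, not of the patch).
    (if (and coding-system (coding-system-p coding-system))
        coding-system
      ;; Unknown, unsupported or missing charset: fall back to US-ASCII.
      'us-ascii)))

;; Usage: the four bytes of "café" encoded as ISO-8859-1, decoded with
;; the coding system chosen for the declared charset.
(decode-coding-string (unibyte-string #x63 #x61 #x66 #xe9)
                      (example-part-coding-system "iso-8859-1"))
```

Decoding the same bytes as UTF-8 would not produce the accented character, which is the misrendering the patch description reports. A regression test of the kind requested above could feed such bytes through the reading path and compare the result with the expected text; how that would be wired into the notmuch test suite is not shown here.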