From: Michal Sojka
To: Adam Wolfe Gordon, Tomi Ollila
Subject: Re: emacs complains about encoding?
References: <20120515194455.B7AD5100646@guru.guru-group.fi> <878vgsbprq.fsf@nikula.org>
User-Agent: Notmuch/0.12+185~g9826d2c (http://notmuchmail.org) Emacs/23.4.1 (x86_64-pc-linux-gnu)
Date: Tue, 22 May 2012 14:53:26 +0200
Message-ID: <871umc1int.fsf@steelpick.2x.cz>
Cc: notmuch@notmuchmail.org
List-Id: "Use and development of the notmuch mail system."

Hello Adam,

Adam Wolfe Gordon writes:
> It turns out it's actually not the emacs side, but an interaction
> between our JSON reply format and emacs.
>
> The JSON reply (and show) code includes part content for all text/*
> parts except text/html. Because all JSON is required to be UTF-8, it
> handles the encoding itself, puts UTF-8 text in, and omits a
> content-charset field from the output. Emacs passes on the
> content-charset field to mm-display-part-inline if it's available, but
> for text/plain parts it's not, leaving mm-display-part-inline to its
> own devices for figuring out what the charset is. It seems
> mm-display-part-inline correctly figures out that it's UTF-8, and puts
> in the series of ugly \nnn characters because that's what emacs does
> with UTF-8 sometimes.
>
> In the original reply stuff (pre-JSON reply format) emacs used the
> output of notmuch reply verbatim, so all the charset stuff was handled
> in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs was
> using the JSON reply format, but was inserting the text itself instead
> of using mm-display-part-inline, so emacs still wasn't trying to do
> any charset manipulation. Using mm-display-part-inline is desirable
> because it lets us handle non-text/plain (e.g.
> text/html) parts
> correctly in reply, and makes the display more consistent (since we
> use it for show). But, it leads to this problem.
>
> So, there are a couple of solutions I can see:
>
> 1) Have the JSON formats include the original content-charset even
> though they're actually outputting UTF-8. Of the solutions I tried,
> this is the best, even though it doesn't sound like a good thing to
> do.
>
> 2) Have the JSON formats include content only if it's actually UTF-8.
> This means that for non-UTF-8 parts (including ASCII parts), the emacs
> interface has to do more work to display the part content, since it
> must fetch it from outside first. When I tried this, it worked but
> caused the \nnn to show up when viewing messages in emacs. I suspect
> this is because it sets a charset for the whole buffer, and can't
> accommodate messages with different charsets in the same buffer
> properly. Reply works correctly, though.
>
> 3) Have the JSON formats include the charset for all parts, but make
> it UTF-8 for all parts they include content for (since we're actually
> outputting UTF-8). This doesn't seem to fix the problem, even though
> it seems like it should.
>
> If no one has a better idea or a strong reason not to, I'll send a
> patch for solution (1).

Thank you very much for your analysis. It encouraged me to dig into the
problem and I've found another solution, which might be better than
those you suggested.

I traced what Emacs does with the text inside
notmuch-mm-display-part-inline, and the wrong charset conversion happens
deep in the elisp code, in mm-with-part called by mm-get-part, which is
in turn called by mm-inline-text. There is a way to make mm-inline-text
not call mm-get-part, which is to set the charset to 'gnus-decoded. This
sounds like something that applies to our situation, where the part is
already decoded.

The following patch (apply it with git am -c) solves the problem for me.
However, I'm not sure it is a universal solution.
It sets the charset only if it is not defined in the notmuch JSON
output, and I'm not sure that this is correct. text/html parts seem to
have a charset defined, but since, as you wrote, the JSON is always
UTF-8, it might be that we need 'gnus-decoded always, independently of
the JSON output.

What do you think?

-Michal

----8<-------
diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 7fa441a..8070f05 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -244,7 +244,7 @@ the given type."
 current buffer, if possible."
   (let ((display-buffer (current-buffer)))
     (with-temp-buffer
-      (let* ((charset (plist-get part :content-charset))
+      (let* ((charset (or (plist-get part :content-charset) 'gnus-decoded))
              (handle (mm-make-handle (current-buffer) `(,content-type (charset . ,charset)))))
         ;; If the user wants the part inlined, insert the content and
         ;; test whether we are able to inline it (which includes both
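
The charset fallback in the changed line can be tried in isolation. Below
is a minimal sketch, assuming a hypothetical helper my-part-charset (it is
not part of notmuch) and made-up part plists. It only shows how the `or'
form picks 'gnus-decoded when :content-charset is absent; it does not call
any of the mm-* functions.

```elisp
;; Hypothetical helper for illustration only; not part of notmuch.
;; PART is a plist shaped like the parts in the notmuch JSON output.
(defun my-part-charset (part)
  "Return PART's :content-charset, or `gnus-decoded' when it is absent."
  (or (plist-get part :content-charset) 'gnus-decoded))

;; Made-up sample parts: the first has no charset, the second has one.
(my-part-charset '(:content-type "text/plain" :content "hello"))
;; => gnus-decoded

(my-part-charset '(:content-type "text/html" :content-charset "iso-8859-2"))
;; => "iso-8859-2"
```

If 'gnus-decoded turns out to be needed for every part, the helper would
simply return the symbol without looking at the plist.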