From: Tomi Ollila
To: Michal Sojka, Adam Wolfe Gordon
Cc: notmuch@notmuchmail.org
Subject: Re: emacs complains about encoding?
In-Reply-To: <871umc1int.fsf@steelpick.2x.cz>
References: <20120515194455.B7AD5100646@guru.guru-group.fi> <878vgsbprq.fsf@nikula.org> <871umc1int.fsf@steelpick.2x.cz>
User-Agent: Notmuch/0.13~rc1+40~g96c989b (http://notmuchmail.org) Emacs/23.1.1 (x86_64-redhat-linux-gnu)
List-Id: "Use and development of the notmuch mail system."
X-List-Received-Date: Tue, 22 May 2012 13:21:34 -0000

Michal Sojka writes:

> Hello Adam,
>
> Adam Wolfe Gordon writes:
>> It turns out it's actually not the emacs side, but an interaction
>> between our JSON reply format and emacs.
>>
>> The JSON reply (and show) code includes part content for all text/*
>> parts except text/html.
>> Because all JSON is required to be UTF-8, it
>> handles the encoding itself, puts UTF-8 text in, and omits a
>> content-charset field from the output. Emacs passes on the
>> content-charset field to mm-display-part-inline if it's available, but
>> for text/plain parts it's not, leaving mm-display-part-inline to its
>> own devices for figuring out what the charset is. It seems
>> mm-display-part-inline correctly figures out that it's UTF-8, and puts
>> in the series of ugly \nnn characters because that's what emacs does
>> with UTF-8 sometimes.
>>
>> In the original reply stuff (pre-JSON reply format) emacs used the
>> output of notmuch reply verbatim, so all the charset stuff was handled
>> in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs was
>> using the JSON reply format, but was inserting the text itself instead
>> of using mm-display-part-inline, so emacs still wasn't trying to do
>> any charset manipulation. Using mm-display-part-inline is desirable
>> because it lets us handle non-text/plain (e.g. text/html) parts
>> correctly in reply, and makes the display more consistent (since we
>> use it for show). But, it leads to this problem.
>>
>> So, there are a couple of solutions I can see:
>>
>> 1) Have the JSON formats include the original content-charset even
>> though they're actually outputting UTF-8. Of the solutions I tried,
>> this is the best, even though it doesn't sound like a good thing to
>> do.
>>
>> 2) Have the JSON formats include content only if it's actually UTF-8.
>> This means that for non-UTF-8 parts (including ASCII parts), the emacs
>> interface has to do more work to display the part content, since it
>> must fetch it from outside first. When I tried this, it worked but
>> caused the \nnn to show up when viewing messages in emacs. I suspect
>> this is because it sets a charset for the whole buffer, and can't
>> accommodate messages with different charsets in the same buffer
>> properly. Reply works correctly, though.
>>
>> 3) Have the JSON formats include the charset for all parts, but make
>> it UTF-8 for all parts they include content for (since we're actually
>> outputting UTF-8). This doesn't seem to fix the problem, even though
>> it seems like it should.
>>
>> If no one has a better idea or a strong reason not to, I'll send a
>> patch for solution (1).
>
> Thank you very much for your analysis. It encouraged me to dig into the
> problem and I've found another solution, which might be better than
> those you suggested.
>
> I traced what Emacs does with the text inside
> notmuch-mm-display-part-inline and the wrong charset conversion happens
> deeply in elisp code in mm-with-part called by mm-get-part, which is in
> turn called by mm-inline-text. There is a way to make mm-inline-text not
> to call mm-get-part, which is to set the charset to 'gnus-decoded. This
> sounds like something that applies to our situation, where the part is
> already decoded.

You've dug deeper than I did... :)

>
> The following patch (apply it with git am -c) solves the problem for me.
> However, I'm not sure it is a universal solution. It sets the charset
> only if it is not defined in notmuch json output and I'm not sure that
> this is correct. text/html parts seem to have charset defined, but as
> you wrote that json is always utf-8, so it might be that we need
> 'gnus-decoded always, independently of the json output. What do you
> think?

No -- when non-inlined content is fetched by executing the command

  notmuch show --format=raw --part=n --decrypt id:""

the content is received in its original charset -- and then the mm-*
components need to have the correct charset set (well, I think; I have
not tested ;). Also, we cannot rely on the JSON output never containing
content-charset information in the future...
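
To make the distinction above concrete, here is a minimal,
language-neutral sketch in Python rather than elisp. It is not notmuch or
Gnus code: the sample text, the ISO-8859-1 charset, and the decode_part
helper are all assumptions made up for illustration. It only shows why
content that is already UTF-8 (the inlined JSON case) and content fetched
raw in its original charset cannot share one fixed charset setting.

```python
from typing import Optional

# All names below are hypothetical; nothing here is notmuch or Gnus API.
ORIGINAL_CHARSET = "iso-8859-1"  # assumed charset declared by the MIME part
text = "päivää"                  # sample part body

# Inlined JSON content: already converted to UTF-8 by the producer.
json_bytes = text.encode("utf-8")
# Raw fetch: bytes still in the part's original charset.
raw_bytes = text.encode(ORIGINAL_CHARSET)


def decode_part(data: bytes, charset: Optional[str]) -> str:
    """Decode part bytes; charset=None means 'already UTF-8'."""
    return data.decode(charset if charset is not None else "utf-8")


from_json = decode_part(json_bytes, None)
from_raw = decode_part(raw_bytes, ORIGINAL_CHARSET)
# Applying the original charset to content that is already UTF-8 garbles it.
garbled = decode_part(json_bytes, ORIGINAL_CHARSET)
```

Both from_json and from_raw come back as the original text, while garbled
does not; that is the reason the "already decoded" marker would only be
safe for content that really was delivered decoded.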

I'm currently applying this to my build tree whenever I rebuild notmuch
for my own use:

  id:"1337533094-5467-1-git-send-email-tomi.ollila@iki.fi"

I think the current plan is to use the same decoding lookup table that
notmuch-show is using in reply too. That is a good plan from a
consistency point of view. It just requires some code to be moved from
notmuch-show.el to some other file (maybe a new one).

> -Michal

Tomi

>
> ----8<-------
> diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
> index 7fa441a..8070f05 100644
> --- a/emacs/notmuch-lib.el
> +++ b/emacs/notmuch-lib.el
> @@ -244,7 +244,7 @@ the given type."
>  current buffer, if possible."
>    (let ((display-buffer (current-buffer)))
>      (with-temp-buffer
> -      (let* ((charset (plist-get part :content-charset))
> +      (let* ((charset (or (plist-get part :content-charset) 'gnus-decoded))
>               (handle (mm-make-handle (current-buffer) `(,content-type (charset . ,charset)))))
>          ;; If the user wants the part inlined, insert the content and
>          ;; test whether we are able to inline it (which includes both
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch