From: Michal Sojka
To: Tomi Ollila, Adam Wolfe Gordon
Subject: Re: emacs complains about encoding?
References: <20120515194455.B7AD5100646@guru.guru-group.fi> <878vgsbprq.fsf@nikula.org> <871umc1int.fsf@steelpick.2x.cz>
User-Agent: Notmuch/0.13+14~g2d2a5a4 (http://notmuchmail.org) Emacs/23.4.1 (x86_64-pc-linux-gnu)
Date: Wed, 23 May 2012 12:15:18 +0200
Message-ID: <87r4uburt5.fsf@steelpick.2x.cz>
Cc: notmuch@notmuchmail.org
List-Id: "Use and development of the notmuch mail system."

Tomi Ollila writes:

> Michal Sojka writes:
>
>> Hello Adam,
>>
>> Adam Wolfe Gordon writes:
>>> It turns out it's actually not the emacs side, but an interaction
>>> between our JSON reply format and emacs.
>>>
>>> The JSON reply (and show) code includes part content for all text/*
>>> parts except text/html. Because all JSON is required to be UTF-8, it
>>> handles the encoding itself, puts UTF-8 text in, and omits a
>>> content-charset field from the output. Emacs passes on the
>>> content-charset field to mm-display-part-inline if it's available, but
>>> for text/plain parts it's not, leaving mm-display-part-inline to its
>>> own devices for figuring out what the charset is. It seems
>>> mm-display-part-inline correctly figures out that it's UTF-8, and puts
>>> in the series of ugly \nnn characters because that's what emacs does
>>> with UTF-8 sometimes.
>>>
>>> In the original reply stuff (pre-JSON reply format) emacs used the
>>> output of notmuch reply verbatim, so all the charset stuff was handled
>>> in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs was
>>> using the JSON reply format, but was inserting the text itself instead
>>> of using mm-display-part-inline, so emacs still wasn't trying to do
>>> any charset manipulation.
>>> Using mm-display-part-inline is desirable
>>> because it lets us handle non-text/plain (e.g. text/html) parts
>>> correctly in reply, and makes the display more consistent (since we
>>> use it for show). But, it leads to this problem.
>>>
>>> So, there are a couple of solutions I can see:
>>>
>>> 1) Have the JSON formats include the original content-charset even
>>> though they're actually outputting UTF-8. Of the solutions I tried,
>>> this is the best, even though it doesn't sound like a good thing to
>>> do.
>>>
>>> 2) Have the JSON formats include content only if it's actually UTF-8.
>>> This means that for non-UTF-8 parts (including ASCII parts), the emacs
>>> interface has to do more work to display the part content, since it
>>> must fetch it from outside first. When I tried this, it worked but
>>> caused the \nnn to show up when viewing messages in emacs. I suspect
>>> this is because it sets a charset for the whole buffer, and can't
>>> accommodate messages with different charsets in the same buffer
>>> properly. Reply works correctly, though.
>>>
>>> 3) Have the JSON formats include the charset for all parts, but make
>>> it UTF-8 for all parts they include content for (since we're actually
>>> outputting UTF-8). This doesn't seem to fix the problem, even though
>>> it seems like it should.
>>>
>>> If no one has a better idea or a strong reason not to, I'll send a
>>> patch for solution (1).
>>
>> Thank you very much for your analysis. It encouraged me to dig into the
>> problem and I've found another solution, which might be better than
>> those you suggested.
>>
>> I traced what Emacs does with the text inside
>> notmuch-mm-display-part-inline and the wrong charset conversion happens
>> deep in elisp code in mm-with-part called by mm-get-part, which is in
>> turn called by mm-inline-text. There is a way to make mm-inline-text
>> not call mm-get-part, which is to set the charset to 'gnus-decoded.
>> This sounds like something that applies to our situation, where the
>> part is already decoded.
>
> You've dug deeper than I did... :)
>
>>
>> The following patch (apply it with git am -c) solves the problem for me.
>> However, I'm not sure it is a universal solution. It sets the charset
>> only if it is not defined in notmuch json output and I'm not sure that
>> this is correct. text/html parts seem to have charset defined, but as
>> you wrote that json is always utf-8, it might be that we need
>> 'gnus-decoded always, independently of the json output. What do you
>> think?
>
> No -- when non-inlined content is fetched by executing command
> notmuch show --format=raw --part=n --decrypt id:"" the content
> is received with original charset -- and then mm-* components need to
> have the correct charset set (well, I think, I have not tested ;).
>
> Also, we cannot rely on the json output not containing content-charset
> information in the future...
>
> I'm currently applying this to my build tree whenever I rebuild notmuch for
> my own use: id:"1337533094-5467-1-git-send-email-tomi.ollila@iki.fi"

Great, this is more or less the same solution :-)

> I think the current plan is to use the same decoding lookup table that
> notmuch-show is using in reply too.

Which table do you refer to? notmuch-show-handlers-for?

> That is a good plan from a consistency point of view. That just requires
> some code to be moved from notmuch-show.el to some other file (maybe a
> new one).

Sounds good.

Cheers,
-Michal
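
For readers who want to see the double-conversion problem in isolation, here is a rough Python analogy (not the elisp patch, and not how the mm-* functions are implemented). It assumes a part dict with "content" and an optional "content-charset" key, as in the discussion above. The function `render_part`, the sentinel `ALREADY_DECODED` (standing in for 'gnus-decoded) and the fallback charset are all made up for this sketch.

```python
import json

# Hypothetical sentinel playing the role that 'gnus-decoded plays in the thread.
ALREADY_DECODED = "already-decoded"


def render_part(part, default_charset="iso-8859-1"):
    """Return display text for a part taken from JSON output.

    `part` is a dict with a "content" string and an optional
    "content-charset" key. The function and the fallback charset are
    assumptions made for this illustration only.
    """
    text = part["content"]  # JSON strings are already decoded Unicode text
    charset = part.get("content-charset", default_charset)
    if charset == ALREADY_DECODED:
        return text  # nothing left to decode, use the text as-is
    # Simulate a consumer that treats the text as undecoded bytes
    # and converts it a second time with a guessed charset.
    return text.encode("utf-8").decode(charset)


original = "Příliš žluťoučký kůň"
part = json.loads(json.dumps({"content": original}))

garbled = render_part(part)               # no charset given: second conversion
part["content-charset"] = ALREADY_DECODED
clean = render_part(part)                 # sentinel set: conversion skipped
```

With the sentinel set, the text comes back unchanged. Without it, the guessed charset is applied to text that was already decoded, and the result is garbled.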