Return-Path: <sojkam1@fel.cvut.cz>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
        by olra.theworths.org (Postfix) with ESMTP id B6A6E431FC0
        for <notmuch@notmuchmail.org>; Wed, 23 May 2012 03:15:31 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: -2.3
X-Spam-Level: 
X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5
        tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
        by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
        with ESMTP id jxkonYeCNwoI for <notmuch@notmuchmail.org>;
        Wed, 23 May 2012 03:15:30 -0700 (PDT)
Received: from max.feld.cvut.cz (max.feld.cvut.cz [147.32.192.36])
        by olra.theworths.org (Postfix) with ESMTP id 5D85C431FBD
        for <notmuch@notmuchmail.org>; Wed, 23 May 2012 03:15:30 -0700 (PDT)
Received: from localhost (unknown [192.168.200.4])
        by max.feld.cvut.cz (Postfix) with ESMTP id 69D9619F3375;
        Wed, 23 May 2012 12:15:29 +0200 (CEST)
X-Virus-Scanned: IMAP AMAVIS
Received: from max.feld.cvut.cz ([192.168.200.1])
        by localhost (styx.feld.cvut.cz [192.168.200.4]) (amavisd-new,
        port 10044)
        with ESMTP id UCiCyRxsfvJD; Wed, 23 May 2012 12:15:20 +0200 (CEST)
Received: from imap.feld.cvut.cz (imap.feld.cvut.cz [147.32.192.34])
        by max.feld.cvut.cz (Postfix) with ESMTP id B350C19F3353;
        Wed, 23 May 2012 12:15:19 +0200 (CEST)
Received: from steelpick.2x.cz (note-sojka.felk.cvut.cz [147.32.86.30])
        (Authenticated sender: sojkam1)
        by imap.feld.cvut.cz (Postfix) with ESMTPSA id 9329D660968;
        Wed, 23 May 2012 12:15:18 +0200 (CEST)
Received: from wsh by steelpick.2x.cz with local (Exim 4.77)
        (envelope-from <sojkam1@fel.cvut.cz>)
        id 1SX8b8-0004sh-EC; Wed, 23 May 2012 12:15:18 +0200
From: Michal Sojka <sojkam1@fel.cvut.cz>
To: Tomi Ollila <tomi.ollila@iki.fi>, Adam Wolfe Gordon <awg+notmuch@xvx.ca>
Subject: Re: emacs complains about encoding?
In-Reply-To: <m27gw4nyfu.fsf@guru.guru-group.fi>
References: <20120515194455.B7AD5100646@guru.guru-group.fi>
        <878vgsbprq.fsf@nikula.org> <m23970bhre.fsf@guru.guru-group.fi>
        <CAMoJFUungAFPWy0d1Lh+rqmpK--P7MMEwNaewWHR=rbYo+BKsA@mail.gmail.com>
        <871umc1int.fsf@steelpick.2x.cz>
        <m27gw4nyfu.fsf@guru.guru-group.fi>
User-Agent: Notmuch/0.13+14~g2d2a5a4 (http://notmuchmail.org) Emacs/23.4.1
        (x86_64-pc-linux-gnu)
Date: Wed, 23 May 2012 12:15:18 +0200
Message-ID: <87r4uburt5.fsf@steelpick.2x.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: notmuch@notmuchmail.org
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
        <notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
        <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
        <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Wed, 23 May 2012 10:15:31 -0000

Tomi Ollila <tomi.ollila@iki.fi> writes:
> Michal Sojka <sojkam1@fel.cvut.cz> writes:
>
>> Hello Adam,
>>
>> Adam Wolfe Gordon <awg+notmuch@xvx.ca> writes:
>>> It turns out it's actually not the emacs side, but an interaction
>>> between our JSON reply format and emacs.
>>>
>>> The JSON reply (and show) code includes part content for all text/*
>>> parts except text/html. Because all JSON is required to be UTF-8, it
>>> handles the encoding itself, puts UTF-8 text in, and omits a
>>> content-charset field from the output. Emacs passes on the
>>> content-charset field to mm-display-part-inline if it's available, but
>>> for text/plain parts it's not, leaving mm-display-part-inline to its
>>> own devices for figuring out what the charset is. It seems
>>> mm-display-part-inline correctly figures out that it's UTF-8, and puts
>>> in the series of ugly \nnn characters because that's what emacs does
>>> with UTF-8 sometimes.
>>>
>>> In the original reply stuff (pre-JSON reply format) emacs used the
>>> output of notmuch reply verbatim, so all the charset stuff was handled
>>> in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs was
>>> using the JSON reply format, but was inserting the text itself instead
>>> of using mm-display-part-inline, so emacs still wasn't trying to do
>>> any charset manipulation. Using mm-display-part-inline is desirable
>>> because it lets us handle non-text/plain (e.g. text/html) parts
>>> correctly in reply, and makes the display more consistent (since we
>>> use it for show). But it leads to this problem.
>>>
>>> So, there are a couple of solutions I can see:
>>>
>>> 1) Have the JSON formats include the original content-charset even
>>> though they're actually outputting UTF-8. Of the solutions I tried,
>>> this is the best, even though it doesn't sound like a good thing to
>>> do.
>>>
>>> 2) Have the JSON formats include content only if it's actually UTF-8.
>>> This means that for non-UTF-8 parts (including ASCII parts), the emacs
>>> interface has to do more work to display the part content, since it
>>> must fetch it from outside first. When I tried this, it worked but
>>> caused the \nnn to show up when viewing messages in emacs. I suspect
>>> this is because it sets a charset for the whole buffer, and can't
>>> accommodate messages with different charsets in the same buffer
>>> properly. Reply works correctly, though.
>>>
>>> 3) Have the JSON formats include the charset for all parts, but make
>>> it UTF-8 for all parts they include content for (since we're actually
>>> outputting UTF-8). This doesn't seem to fix the problem, even though
>>> it seems like it should.
>>>
>>> If no one has a better idea or a strong reason not to, I'll send a
>>> patch for solution (1).
>>
>> Thank you very much for your analysis. It encouraged me to dig into the
>> problem and I've found another solution, which might be better than
>> those you suggested.
>>
>> I traced what Emacs does with the text inside
>> notmuch-mm-display-part-inline and found that the wrong charset
>> conversion happens deep in the elisp code, in mm-with-part called by
>> mm-get-part, which is in turn called by mm-inline-text. There is a way
>> to make mm-inline-text not call mm-get-part, which is to set the
>> charset to 'gnus-decoded. This sounds like something that applies to
>> our situation, where the part is already decoded.
>
> You've dug deeper than I did... :)
>
>>
>> The following patch (apply it with git am -c) solves the problem for me.
>> However, I'm not sure it is a universal solution. It sets the charset
>> only if it is not defined in the notmuch JSON output, and I'm not sure
>> that this is correct. text/html parts seem to have a charset defined,
>> but as you wrote, JSON is always UTF-8, so it might be that we always
>> need 'gnus-decoded, independently of the JSON output. What do you
>> think?
>
> No -- when non-inlined content is fetched by executing the command
> notmuch show --format=raw --part=n --decrypt id:"<message-id>" the content
> is received in its original charset -- and then the mm-* components need
> to have the correct charset set (well, I think so; I have not tested ;).
>
> Also, we cannot rely on the JSON output not containing content-charset
> information in the future...
>
> I'm currently applying this to my build tree whenever I rebuild notmuch for
> my own use: id:"1337533094-5467-1-git-send-email-tomi.ollila@iki.fi"

Great, this is more or less the same solution :-)

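To illustrate why marking the part as already decoded helps, here is a toy model in Python. It is not Emacs, Gnus or notmuch code: display_part, octal_escape and the GNUS_DECODED constant are hypothetical names, and the re-conversion path is an assumption that only mimics the symptom Adam described (UTF-8 text showing up as \nnn escapes). What it shows is the control flow: with the 'gnus-decoded marker, text that is already decoded is used as-is instead of being converted a second time.

```python
from typing import Optional

# Hypothetical stand-in for the Emacs symbol 'gnus-decoded; this is a toy
# model, not Gnus, mm or notmuch code.
GNUS_DECODED = "gnus-decoded"


def octal_escape(data: bytes) -> str:
    """Show bytes roughly the way Emacs shows undecoded 8-bit bytes:
    ASCII bytes as characters, all other bytes as backslash-octal (\\nnn)."""
    return "".join(chr(b) if b < 0x80 else "\\" + format(b, "o") for b in data)


def display_part(content: str, charset: Optional[str]) -> str:
    """Return the text a viewer would show for an already-decoded part."""
    if charset == GNUS_DECODED:
        # The content is already Unicode (e.g. taken from JSON output),
        # so it is inserted verbatim with no further conversion.
        return content
    # Assumed re-conversion path: the text is turned back into bytes ...
    raw = content.encode("utf-8")
    if charset is None:
        # ... and with no usable charset the bytes are shown undecoded.
        return octal_escape(raw)
    # ... or decoded again with whatever charset was supplied.
    return raw.decode(charset)


if __name__ == "__main__":
    text = "caf\u00e9"  # "cafe" with an e-acute, written as an escape
    print(display_part(text, None))                  # prints caf\303\251
    print(display_part(text, GNUS_DECODED) == text)  # prints True
```

Running it prints caf\303\251 for the unmarked part and True for the marked one. Tomi's objection still applies to this model: content fetched with notmuch show --format=raw arrives in its original charset, so only text that really is already decoded should carry the marker.
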
> I think the current plan is to use the same decoding lookup table that
> notmuch-show is using in reply too.

Which table are you referring to? notmuch-show-handlers-for?

> That is a good plan from a consistency point of view. That just requires
> some code to be moved from notmuch-show.el to some other file (maybe a
> new one).

Sounds good.

Cheers,
-Michal