From: Ethan Glasser-Camp
To: Tomi Ollila, Michael Stapelberg, notmuch@notmuchmail.org
Subject: Re: [BUG] Saving attachments containing UTF-8 chars
Date: Sun, 18 Nov 2012 00:29:31 -0500
Message-ID: <87zk2f1nt0.fsf@betacantrips.com>

Tomi Ollila writes:

> I can verify this bug: I copied 'rawmail' to my mail store and attempted
> to 'w' the attachment and got the same result (after notmuch new).
>
> The saving code first does
> notmuch show --format=raw id:"508953E6.70006@gmail.com"
> which decodes OK on command line, and to the buffer when
> kill-buffer is commented out in (with-current-notmuch-show-message ...)
> macro.

I was able to see this behavior, and Tomi did a good job tracking down
where it was :) I even see the bytes as presented in the file. When
moving point to the problematic character, and doing M-x describe-char,
it says:

  buffer code: #xE2 #x80 #x99
    file code: #xE2 #x80 #x99 (encoded by coding system utf-8)

buffer-file-coding-system is, of course, utf-8. Writing this buffer
using C-x C-w encodes it correctly too.

So I think this is an emacs MIME problem.

We call mm-save-part, which calls mm-save-part-to-file, which calls
mm-with-unibyte-buffer. Hmm..

Indeed, it seems that inserting this character into a buffer that's been
marked "unibyte" using (set-buffer-multibyte nil) turns it into the ^Y
character (ASCII code 0x19 -- the character that comes out in the patch
file). There's probably a technical reason that this should be true, but
I can't think of why that would be.
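One possible explanation for the 0x19, offered as a guess and not as a
reading of the Emacs source: the bytes E2 80 99 reported by describe-char
above are the UTF-8 encoding of U+2019, and the low 8 bits of 0x2019 are
0x19. If a unibyte buffer keeps only the low byte of each multibyte
character inserted into it (an assumption here), ^Y is exactly what would
come out. The arithmetic can be checked outside Emacs with a few lines of
Python; the helper name low_byte is made up for illustration.

```python
# Model of the suspected truncation. This is NOT Emacs code, only the arithmetic.
CHAR = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the character in the attachment


def low_byte(ch: str) -> int:
    """Hypothetical helper: keep only the low 8 bits of a code point."""
    return ord(ch) & 0xFF


# What a byte-preserving save would write for this character.
utf8_bytes = CHAR.encode("utf-8")

# What a low-byte truncation (the assumed unibyte behavior) would write.
truncated = low_byte(CHAR)

print(" ".join(f"#x{b:02X}" for b in utf8_bytes))
print(f"0x{truncated:02X}")
```

If that assumption holds, the fix has to keep the raw message as bytes the
whole way through, so that no multibyte character ever reaches the unibyte
buffer.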

> I attempted a set of trial-&-error tricks to get the attachment
> saved "correctly", and at least this seems to do the trick:
>
> diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
> index f273eb4..a6a85c0 100644
> --- a/emacs/notmuch-show.el
> +++ b/emacs/notmuch-show.el
> @@ -203,9 +203,11 @@ For example, if you wanted to remove an \"unread\" tag and add a
>       (let ((id (notmuch-show-get-message-id)))
>         (let ((buf (generate-new-buffer (concat "*notmuch-msg-" id "*"))))
>           (with-current-buffer buf
> -           (call-process notmuch-command nil t nil "show" "--format=raw" id)
> -           ,@body)
> -        (kill-buffer buf)))))
> +           (let ((coding-system-for-read 'no-conversion)
> +                 (coding-system-for-write 'no-conversion))
> +             (call-process notmuch-command nil t nil "show" "--format=raw" id)
> +             ,@body))))))
> +%%      (kill-buffer buf)))))

[snip]

> (kill-buffer is commented out above for testing purposes)
>
> To test this, this needs to be evaluated and then the functions
> using this macro (notmuch-show-save-attachments in this case)
>
> Smart suggestions for proper fix ?

Well, we could limit it just to saving attachments (putting the let
around the with-current-notmuch-show-message). That feels like it could
be right, because intuitively saving an attachment should be done
without any conversions.

Or even the above doesn't seem so bad. My vague feeling is that messages
should always be ASCII, or at least mm-* will interpret it that way, so
decoding them into any other character set might cause problems.

Anyone understand character sets?

Ethan