From: Austin Clements
To: David Bremner, notmuch@notmuchmail.org
Subject: Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
In-Reply-To: <8738e8p13v.fsf@maritornes.cs.unb.ca>
References: <1398105468-14317-1-git-send-email-amdragon@mit.edu> <1398105468-14317-7-git-send-email-amdragon@mit.edu> <8738e8p13v.fsf@maritornes.cs.unb.ca>
User-Agent: Notmuch/0.18.1+86~gef5e66a (http://notmuchmail.org) Emacs/24.4.1 (x86_64-pc-linux-gnu)
Date: Sat, 24 Jan 2015 12:10:29 -0500
Message-ID: <874mrgumt6.fsf@csail.mit.edu>
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "Use and development of the notmuch mail system."

On Fri, 11 Jul 2014, David Bremner wrote:
> Austin Clements writes:
>
>> +This returns the content of the given part as a multibyte Lisp
>
> What does "multibyte" mean here? utf8? current encoding?

Elisp has two kinds of strings: "unibyte strings" and "multibyte
strings".

https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html

You can think of unibyte strings as binary data; they're just vectors
of bytes without any particular encoding semantics (though when you
use a unibyte string you can endow it with encoding).  Multibyte
strings, however, are text; they're vectors of Unicode code points.

>> +string after performing content transfer decoding and any
>> +necessary charset decoding.  It is an error to use this for
>> +non-text/* parts."
>> +  (let ((content (plist-get part :content)))
>> +    (when (not content)
>> +      ;; Use show --format=sexp to fetch decoded content
>> +      (let* ((args `("show" "--format=sexp" "--include-html"
>> +                     ,(format "--part=%s" (plist-get part :id))
>> +                     ,@(when process-crypto '("--decrypt"))
>> +                     ,(notmuch-id-to-query (plist-get msg :id))))
>> +             (npart (apply #'notmuch-call-notmuch-sexp args)))
>> +        (setq content (plist-get npart :content))
>> +        (when (not content)
>> +          (error "Internal error: No :content from %S" args))))
>> +    content))
>
> I'm a bit curious at the lack of setting "coding-system-for-read" here.
> Are we assuming the user has their environment set up correctly? Not so
> much a criticism as being nervous about everything coding-system
> related.

That is interesting.  coding-system-for-read should really go in
notmuch-call-notmuch-sexp, but I worry that, while *almost* all
strings the CLI outputs are UTF-8, not quite all of them are.
For example, we output filenames exactly as the OS reports the bytes
to us (which is necessary, in a sense, because POSIX enforces no
particular encoding on file names, but still really unfortunate).  We
could set coding-system-for-read, but a full solution needs more
cooperation from the CLI.

Possibly the right answer, at least for the sexp format, is to do our
own conversion of UTF-8 to "\uXXXX" escapes for strings that are known
to be UTF-8 and leave the raw bytes for the few that aren't.  Then we
would set coding-system-for-read to 'no-conversion and I think
everything would Just Work.

That doesn't help for JSON, which is supposed to be all UTF-8 all the
time.  I can think of solutions there, but they're all ugly and
involve things like encoding filenames as base64 when they aren't
valid UTF-8.

So... I don't think I'm going to do anything about this at this
moment.

> I didn't see anything else to object to in this patch or the previous
> one.
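
To make the unibyte/multibyte distinction discussed above concrete,
here is a minimal Emacs Lisp sketch.  It is only an illustration, not
part of the patch: the helper names demo-utf8-bytes and demo-utf8-text
are hypothetical, while encode-coding-string, decode-coding-string and
multibyte-string-p are standard Emacs functions.

```elisp
;; Hypothetical helpers for illustration only; they are not notmuch code.

(defun demo-utf8-bytes (text)
  "Return TEXT encoded as a unibyte string of raw UTF-8 bytes."
  (encode-coding-string text 'utf-8))

(defun demo-utf8-text (bytes)
  "Decode the unibyte string BYTES as UTF-8 into a multibyte string."
  (decode-coding-string bytes 'utf-8))

;; The \u escape forces the literal to be a multibyte (text) string.
(let* ((text (demo-utf8-text (demo-utf8-bytes "na\u00efve")))
       (bytes (demo-utf8-bytes text)))
  (list (multibyte-string-p text)    ; t:   a sequence of characters
        (multibyte-string-p bytes)   ; nil: a sequence of raw bytes
        (length text)                ; 5 characters
        (length bytes)))             ; 6 bytes (the i-diaeresis takes two)
```

Binding coding-system-for-read to 'utf-8 around the process call would
ask Emacs to apply the decoding step to everything the CLI prints,
which is why the filename bytes that are not guaranteed to be UTF-8
are a concern.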