From: Austin Clements Date: Sat, 24 Jan 2015 17:10:29 +0000 (+1900) Subject: Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=4749763e164382676df9b8e11fa2e8175ace1b37;p=notmuch-archives.git Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API --- diff --git a/8d/abd781f5725d9611b359bf5d4b0849619a7a0d b/8d/abd781f5725d9611b359bf5d4b0849619a7a0d new file mode 100644 index 000000000..d173d147f --- /dev/null +++ b/8d/abd781f5725d9611b359bf5d4b0849619a7a0d @@ -0,0 +1,115 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 98AD7431FCB + for ; Sat, 24 Jan 2015 09:10:36 -0800 (PST) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: 0.138 +X-Spam-Level: +X-Spam-Status: No, score=0.138 tagged_above=-999 required=5 + tests=[DNS_FROM_AHBL_RHSBL=2.438, RCVD_IN_DNSWL_MED=-2.3] + autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id AgJvQKKZugG9 for ; + Sat, 24 Jan 2015 09:10:32 -0800 (PST) +Received: from outgoing.csail.mit.edu (outgoing.csail.mit.edu [128.30.2.149]) + by olra.theworths.org (Postfix) with ESMTP id CB218431FAE + for ; Sat, 24 Jan 2015 09:10:32 -0800 (PST) +Received: from [104.131.20.129] (helo=awakeningjr) + by outgoing.csail.mit.edu with esmtpsa (TLS1.0:RSA_AES_128_CBC_SHA1:16) + (Exim 4.72) (envelope-from ) + id 1YF4EA-00045v-PF; Sat, 24 Jan 2015 12:10:30 -0500 +Received: from amthrax by awakeningjr with local (Exim 4.84) + (envelope-from ) + id 1YF4EA-0003V9-0W; Sat, 24 Jan 2015 12:10:30 -0500 +From: Austin Clements +To: David Bremner , notmuch@notmuchmail.org +Subject: Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' + API +In-Reply-To: 
<8738e8p13v.fsf@maritornes.cs.unb.ca> +References: <1398105468-14317-1-git-send-email-amdragon@mit.edu> + <1398105468-14317-7-git-send-email-amdragon@mit.edu> + <8738e8p13v.fsf@maritornes.cs.unb.ca> +User-Agent: Notmuch/0.18.1+86~gef5e66a (http://notmuchmail.org) Emacs/24.4.1 + (x86_64-pc-linux-gnu) +Date: Sat, 24 Jan 2015 12:10:29 -0500 +Message-ID: <874mrgumt6.fsf@csail.mit.edu> +MIME-Version: 1.0 +Content-Type: text/plain +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sat, 24 Jan 2015 17:10:36 -0000 + +On Fri, 11 Jul 2014, David Bremner wrote: +> Austin Clements writes: +> +>> +This returns the content of the given part as a multibyte Lisp +> +> What does "multibyte" mean here? utf8? current encoding? + +Elisp has two kinds of strings: "unibyte strings" and "multibyte +strings". + + https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html + +You can think of unibyte strings as binary data; they're just vectors of +bytes without any particular encoding semantics (though when you use a +unibyte string you can endow it with encoding). Multibyte strings, +however, are text; they're vectors of Unicode code points. + +>> +string after performing content transfer decoding and any +>> +necessary charset decoding. It is an error to use this for +>> +non-text/* parts." 
+>> + (let ((content (plist-get part :content))) +>> + (when (not content) +>> + ;; Use show --format=sexp to fetch decoded content +>> + (let* ((args `("show" "--format=sexp" "--include-html" +>> + ,(format "--part=%s" (plist-get part :id)) +>> + ,@(when process-crypto '("--decrypt")) +>> + ,(notmuch-id-to-query (plist-get msg :id)))) +>> + (npart (apply #'notmuch-call-notmuch-sexp args))) +>> + (setq content (plist-get npart :content)) +>> + (when (not content) +>> + (error "Internal error: No :content from %S" args)))) +>> + content)) +> +> I'm a bit curious at the lack of setting "coding-system-for-read" here. +> Are we assuming the user has their environment set up correctly? Not so +> much a criticism as being nervous about everything coding-system +> related. + +That is interesting. coding-system-for-read should really go in +notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings +the CLI outputs are UTF-8, not quite all of them are. For example, we +output filenames exactly as the OS reports the bytes to us (which is +necessary, in a sense, because POSIX enforces no particular encoding on +file names, but still really unfortunate). + +We could set coding-system-for-read, but a full solution needs more +cooperation from the CLI. Possibly the right answer, at least for the +sexp format, is to do our own UTF-8 to "\uXXXX" escapes for strings that +are known to be UTF-8 and leave the raw bytes for the few that aren't. +Then we would set the coding-system-for-read to 'no-conversion and I +think everything would Just Work. + +That doesn't help for JSON, which is supposed to be all UTF-8 all the +time. I can think of solutions there, but they're all ugly and involve +things like encoding filenames as base64 when they aren't valid UTF-8. + +So... I don't think I'm going to do anything about this at this moment. + +> I didn't see anything else to object to in this patch or the previous +> one.