From 0cc1d85f16a69ad19af8e6e8328a810408c28a1f Mon Sep 17 00:00:00 2001
From: Daniel Kahn Gillmor
Date: Wed, 17 Feb 2016 14:02:02 +1900
Subject: [PATCH] Re: encoding of message-ids

---
 4e/b26e380cc99a315de69fe1134f0d5fdb6f6b3b | 94 +++++++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100644 4e/b26e380cc99a315de69fe1134f0d5fdb6f6b3b

diff --git a/4e/b26e380cc99a315de69fe1134f0d5fdb6f6b3b b/4e/b26e380cc99a315de69fe1134f0d5fdb6f6b3b
new file mode 100644
index 000000000..5bc1366ab
--- /dev/null
+++ b/4e/b26e380cc99a315de69fe1134f0d5fdb6f6b3b
@@ -0,0 +1,94 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by arlo.cworth.org (Postfix) with ESMTP id 4F6B86DE13E7
+ for ; Tue, 16 Feb 2016 11:02:28 -0800 (PST)
+X-Virus-Scanned: Debian amavisd-new at cworth.org
+X-Spam-Flag: NO
+X-Spam-Score: -0.016
+X-Spam-Level:
+X-Spam-Status: No, score=-0.016 tagged_above=-999 required=5
+ tests=[AWL=-0.016] autolearn=disabled
+Received: from arlo.cworth.org ([127.0.0.1])
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id l2PoT-sIX-7c for ;
+ Tue, 16 Feb 2016 11:02:25 -0800 (PST)
+Received: from che.mayfirst.org (che.mayfirst.org [209.234.253.108])
+ by arlo.cworth.org (Postfix) with ESMTP id CE4166DE0244
+ for ; Tue, 16 Feb 2016 11:02:24 -0800 (PST)
+Received: from fifthhorseman.net (unknown [38.109.115.130])
+ by che.mayfirst.org (Postfix) with ESMTPSA id 58295F991;
+ Tue, 16 Feb 2016 14:02:03 -0500 (EST)
+Received: by fifthhorseman.net (Postfix, from userid 1000)
+ id 9E1671FF32; Tue, 16 Feb 2016 14:02:02 -0500 (EST)
+From: Daniel Kahn Gillmor
+To: David Bremner , notmuch@notmuchmail.org
+Subject: Re: encoding of message-ids
+In-Reply-To: <87si0svnim.fsf@zancas.localnet>
+References: <87si0svnim.fsf@zancas.localnet>
+User-Agent: Notmuch/0.21+72~gd8c4f1c (http://notmuchmail.org) Emacs/24.5.1
+ (x86_64-pc-linux-gnu)
+Date: Tue, 16 Feb 2016 14:02:02 -0500
+Message-ID: <87ziv0iimt.fsf@alice.fifthhorseman.net>
+MIME-Version: 1.0
+Content-Type: text/plain
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.20
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+ 
+List-Unsubscribe: ,
+ 
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+ 
+X-List-Received-Date: Tue, 16 Feb 2016 19:02:28 -0000
+
+On Tue 2016-02-16 07:38:09 -0500, David Bremner wrote:
+> I spent a little time this morning staring at the code, and it seems
+> that all of the message-ids are parsed via g_mime_decode_text, which
+> deals with RFC2047 encodings and makes guesses at decoding 8bit
+> characters. In practice this means that in the notmuch database all
+> headers are UTF-8. Since message-id's are supposed to be printable ascii
+> [at least in rfc5322], this seems like not such a terrible decision, but
+> I wonder if we should document this potential conversion somewhere?
+
+i think you mean g_mime_utils_header_decode_text, not gmime_decode_text,
+right?
+
+What do you think are the potential risks here?
+
+ * if all incoming message-ids are standards-compliant (lower-case
+   ascii, with an @ sign in the middle and surrounded by angle-brackets
+   [0]), then it cannot be interpreted as RFC 2047 text because it does
+   not have the leading =? or the trailing ?=, so gmime shouldn't
+   translate it.
+
+ * if some incoming message-ids are not standards-compliant, then it's
+   possible that they will be transformed into other,
+   non-standards-compliant message IDs. Some of them might even be
+   transformed into standards-compliant message-IDs. for example,
+   '=?UTF-8?q??=' will be transformed into
+   ''.
+
+the main risk, i suppose, is that someone could craft a message with a
+different literal Message-ID than an existing message, and could trigger
+an otherwise undetectable message ID collision. This seems not much
+worse than the existing (detectable) message ID collision problems
+notmuch already has.
+
+That said, RFC 2047 suggests that its encodings are only relevant in
+places where a "text" token would be used [1]. Message-ID (and References
+and In-Reply-To) are intended to only contain dot-atom-text tokens. So
+probably it would be more correct to avoid applying it to these specific
+fields.
+
+i dunno that it's a big deal though, given the analysis above.
+
+ --dkg
+
+[0] https://tools.ietf.org/html/rfc5322#section-3.6.4
+[1] https://tools.ietf.org/html/rfc2047#section-5
-- 
2.26.2