--- /dev/null
+Return-Path: <dkg@fifthhorseman.net>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by arlo.cworth.org (Postfix) with ESMTP id 4F6B86DE13E7\r
+ for <notmuch@notmuchmail.org>; Tue, 16 Feb 2016 11:02:28 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at cworth.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.016\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.016 tagged_above=-999 required=5\r
+ tests=[AWL=-0.016] autolearn=disabled\r
+Received: from arlo.cworth.org ([127.0.0.1])\r
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id l2PoT-sIX-7c for <notmuch@notmuchmail.org>;\r
+ Tue, 16 Feb 2016 11:02:25 -0800 (PST)\r
+Received: from che.mayfirst.org (che.mayfirst.org [209.234.253.108])\r
+ by arlo.cworth.org (Postfix) with ESMTP id CE4166DE0244\r
+ for <notmuch@notmuchmail.org>; Tue, 16 Feb 2016 11:02:24 -0800 (PST)\r
+Received: from fifthhorseman.net (unknown [38.109.115.130])\r
+ by che.mayfirst.org (Postfix) with ESMTPSA id 58295F991;\r
+ Tue, 16 Feb 2016 14:02:03 -0500 (EST)\r
+Received: by fifthhorseman.net (Postfix, from userid 1000)\r
+ id 9E1671FF32; Tue, 16 Feb 2016 14:02:02 -0500 (EST)\r
+From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>\r
+To: David Bremner <david@tethera.net>, notmuch@notmuchmail.org\r
+Subject: Re: encoding of message-ids\r
+In-Reply-To: <87si0svnim.fsf@zancas.localnet>\r
+References: <87si0svnim.fsf@zancas.localnet>\r
+User-Agent: Notmuch/0.21+72~gd8c4f1c (http://notmuchmail.org) Emacs/24.5.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Tue, 16 Feb 2016 14:02:02 -0500\r
+Message-ID: <87ziv0iimt.fsf@alice.fifthhorseman.net>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.20\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch/>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Tue, 16 Feb 2016 19:02:28 -0000\r
+\r
+On Tue 2016-02-16 07:38:09 -0500, David Bremner wrote:\r
+> I spent a little time this morning staring at the code, and it seems\r
+> that all of the message-ids are parsed via g_mime_decode_text, which\r
+> deals with RFC2047 encodings and makes guesses at decoding 8bit\r
+> characters. In practice this means that in the notmuch database all\r
+> headers are UTF-8. Since message-id's are supposed to be printable ascii\r
+> [at least in rfc5322], this seems like not such a terrible decision, but\r
+> I wonder if we should document this potential conversion somewhere?\r
+\r
+i think you mean g_mime_utils_header_decode_text, not g_mime_decode_text,\r
+right?\r
+\r
+What do you think are the potential risks here?\r
+\r
+ * if all incoming message-ids are standards-compliant (lower-case\r
+ ascii, with an @ sign in the middle and surrounded by angle-brackets\r
+ [0]), then they cannot be interpreted as RFC 2047 text because they do\r
+ not have the leading =? or the trailing ?=, so gmime shouldn't\r
+ translate them.\r
+\r
+ * if some incoming message-ids are not standards-compliant, then it's\r
+ possible that they will be transformed into other,\r
+ non-standards-compliant message IDs. Some of them might even be\r
+ transformed into standards-compliant message-IDs. for example,\r
+ '=?UTF-8?q?<abc@example.net>?=' will be transformed into\r
+ '<abc@example.net>'.\r
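The two cases above can be sketched with a few lines of Python. This is only an illustration under stated assumptions: Python's standard-library `email.header` module stands in for GMime's decoder (GMime's exact behavior on malformed input may differ), and `rfc2047_decode` and `index` are hypothetical names, not anything from notmuch.

```python
from email.header import decode_header, make_header


def rfc2047_decode(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a header value.

    Stand-in for GMime's header decoding, using only the Python stdlib.
    """
    return str(make_header(decode_header(raw)))


# A compliant Message-ID has no "=?" ... "?=" wrapper, so it passes through.
plain = "<abc@example.net>"
# A non-compliant literal that is a valid RFC 2047 encoded-word.
wrapped = "=?UTF-8?q?<abc@example.net>?="

# Hypothetical index keyed by the decoded Message-ID: two different
# literal header values end up under the same key.
index = {}
for literal in (plain, wrapped):
    index.setdefault(rfc2047_decode(literal), []).append(literal)

for key, literals in index.items():
    print(key, "<-", literals)
```

Both literals decode to the same string, so a store keyed on the decoded value cannot tell them apart.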
+\r
+the main risk, i suppose, is that someone could craft a message with a\r
+different literal Message-ID than an existing message, and could trigger\r
+an otherwise undetectable message ID collision. This seems not much\r
+worse than the existing (detectable) message ID collision problems\r
+notmuch already has.\r
+\r
+That said, RFC 2047 suggests [1] that its encodings are only relevant in\r
+places where a "text" token would be used. Message-ID (and References\r
+and In-Reply-To) are intended to only contain dot-atom-text tokens. So\r
+probably it would be more correct to avoid applying this decoding to\r
+these specific fields.\r
+\r
+i dunno that it's a big deal though, given the analysis above.\r
+\r
+ --dkg\r
+\r
+[0] https://tools.ietf.org/html/rfc5322#section-3.6.4\r
+[1] https://tools.ietf.org/html/rfc2047#section-5\r