--- /dev/null
+Return-Path: <rlb@defaultvalue.org>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by arlo.cworth.org (Postfix) with ESMTP id 9C42D6DE1AEA\r
+ for <notmuch@notmuchmail.org>; Sat, 5 Sep 2015 12:13:05 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at cworth.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: 0.393\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=0.393 tagged_above=-999 required=5 tests=[AWL=0.199, \r
+ RP_MATCHES_RCVD=-0.55, URIBL_SBL=0.644, URIBL_SBL_A=0.1]\r
+ autolearn=disabled\r
+Received: from arlo.cworth.org ([127.0.0.1])\r
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id QAJhby6f67TD for <notmuch@notmuchmail.org>;\r
+ Sat, 5 Sep 2015 12:13:03 -0700 (PDT)\r
+Received: from defaultvalue.org (defaultvalue.org [70.85.129.156])\r
+ by arlo.cworth.org (Postfix) with ESMTP id 607B76DE1AE9\r
+ for <notmuch@notmuchmail.org>; Sat, 5 Sep 2015 12:13:03 -0700 (PDT)\r
+Received: from trouble.defaultvalue.org (localhost [127.0.0.1])\r
+ (Authenticated sender: rlb@defaultvalue.org)\r
+ by defaultvalue.org (Postfix) with ESMTPSA id 2FB7621FD2;\r
+ Sat, 5 Sep 2015 14:13:01 -0500 (CDT)\r
+Received: by trouble.defaultvalue.org (Postfix, from userid 1000)\r
+ id A08B714E070; Sat, 5 Sep 2015 14:13:00 -0500 (CDT)\r
+From: Rob Browning <rlb@defaultvalue.org>\r
+To: David Bremner <david@tethera.net>, notmuch@notmuchmail.org\r
+Subject: Re: [PATCH 1/1] Store and search for canonical Unicode text [WIP]\r
+In-Reply-To: <87io7sw79j.fsf@trouble.defaultvalue.org>\r
+References: <1440951676-17286-1-git-send-email-rlb@defaultvalue.org>\r
+ <87fv2we26p.fsf@maritornes.cs.unb.ca>\r
+ <87io7sw79j.fsf@trouble.defaultvalue.org>\r
+User-Agent: Notmuch/0.20.1 (http://notmuchmail.org) Emacs/24.5.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Sat, 05 Sep 2015 14:13:00 -0500\r
+Message-ID: <877fo4wugz.fsf@trouble.defaultvalue.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=utf-8\r
+Content-Transfer-Encoding: quoted-printable\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.18\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch/>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Sat, 05 Sep 2015 19:13:05 -0000\r
+\r
+Rob Browning <rlb@defaultvalue.org> writes:\r
+\r
+> David Bremner <david@tethera.net> writes:\r
+\r
+>> It seems plausible to specify UTF-8 input for the library, but what\r
+>> about the CLI? It seems like the canonicalization operation increases\r
+>> the chance of mangling user input in non-UTF-8 locales.\r
+>\r
+> Yes, the key question: what does notmuch intend? i.e. given a sequence\r
+> of bytes, how will notmuch interpret them? I think we should decide\r
+> that, and document it clearly somewhere.\r
+>\r
+> The commit message describes my understanding of how things currently\r
+> work, and if/when I get time, I'd like to propose some related\r
+> documentation updates (perhaps to notmuch-search-terms or\r
+> notmuch-insert/new?).\r
+>\r
+> Oh, and if I do understand things correctly, notmuch may already stand a\r
+> chance of mangling any bytes that form a valid UTF-8 byte sequence but\r
+> aren't actually UTF-8 (excepting encodings that are a strict subset of\r
+> UTF-8, like ASCII).\r
+>\r
+> For example (if I did this right), [0xd1 0xa1] is valid UTF-8, producing\r
+> omega "=D1=A1", and also valid Latin-1, producing "=C3=91=C2=A1".\r
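The byte arithmetic in the example can be checked with a small Python sketch. It is an illustration only, not code from notmuch or the WIP patch. It decodes the two bytes both ways and then re-encodes each result as UTF-8:

```python
import unicodedata

# The two bytes from the example above.
raw = bytes([0xD1, 0xA1])

# Read as UTF-8: a single code point.
as_utf8 = raw.decode("utf-8")
# Read as Latin-1: two code points, one per byte.
as_latin1 = raw.decode("latin-1")

print([unicodedata.name(c) for c in as_utf8])
# ['CYRILLIC SMALL LETTER OMEGA']
print([unicodedata.name(c) for c in as_latin1])
# ['LATIN CAPITAL LETTER N WITH TILDE', 'INVERTED EXCLAMATION MARK']

# Storing each reading as UTF-8 yields different bytes.
print(as_utf8.encode("utf-8").hex(" "))    # d1 a1
print(as_latin1.encode("utf-8").hex(" "))  # c3 91 c2 a1
```

Both decodes succeed without error, and that is the problem. Nothing in the bytes themselves says which reading was intended.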
+\r
+So on this particular point, I'm perhaps too used to thinking about the\r
+general encoding problem, and wasn't thinking about our specific\r
+constraints.\r
+\r
+If (1) "normal" message bodies are required to be US-ASCII (which I'd\r
+neglected to remember might be the case), and (2) MIME handles the rest,\r
+then perhaps notmuch will only receive raw bytes via user input\r
+(i.e. query strings, etc.).\r
+\r
+In which case, we could just document that notmuch interprets user input\r
+as UTF-8 (and we might or might not mention the Latin-1 fallback).\r
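One possible reading of "interpret user input as UTF-8, with a Latin-1 fallback" is sketched below in Python. Everything here is an assumption made for illustration:

- the helper name `interpret_user_input` is hypothetical;
- the fallback applies to the whole string whenever strict UTF-8 decoding fails;
- NFC is used as the canonical form only because the patch under discussion stores canonical Unicode text. This message doesn't say which normalization form the patch actually uses.

```python
import unicodedata


def interpret_user_input(raw: bytes) -> str:
    """Hypothetical helper: decode query bytes as UTF-8, else Latin-1.

    Assumptions (not notmuch's actual code): strict UTF-8 is tried first,
    any invalid sequence triggers a whole-string Latin-1 fallback, and the
    result is canonicalized to Unicode NFC.
    """
    try:
        text = raw.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        # Never fails: every byte maps to a code point in U+0000..U+00FF.
        text = raw.decode("latin-1")
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # Valid UTF-8 is taken at face value, even if the user meant Latin-1.
    print(ascii(interpret_user_input(b"\xd1\xa1")))       # '\u0461'
    # A lone 0xE9 is invalid UTF-8, so the Latin-1 fallback applies.
    print(ascii(interpret_user_input(b"caf\xe9")))        # 'caf\xe9'
    # "e" + COMBINING ACUTE ACCENT is canonicalized to U+00E9.
    print(ascii(interpret_user_input(b"cafe\xcc\x81")))   # 'caf\xe9'
```

The first call shows the limit of any such rule. Bytes that happen to be valid UTF-8 are never reinterpreted, so a Latin-1 user who types "Ñ¡" (bytes 0xd1 0xa1) gets U+0461 instead. For this reason, it matters to document the expected input encoding.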
+\r
+Later locale support could be added if desired, and none of this would\r
+involve the quite nasty problem of encoding detection.\r
+\r
+--=20\r
+Rob Browning\r
+rlb @defaultvalue.org and @debian.org\r
+GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A\r
+GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4\r