Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
author W. Trevor King <wking@tremily.us>
Sun, 14 Feb 2016 22:33:51 +0000 (14:33 -0800)
committer W. Trevor King <wking@tremily.us>
Sat, 20 Aug 2016 23:21:08 +0000 (16:21 -0700)
0c/1d32a30923c110cfeef21761757a65522944c2 [new file with mode: 0644]

diff --git a/0c/1d32a30923c110cfeef21761757a65522944c2 b/0c/1d32a30923c110cfeef21761757a65522944c2
new file mode 100644 (file)
index 0000000..c203534
--- /dev/null
@@ -0,0 +1,200 @@
+Return-Path: <wking@tremily.us>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by arlo.cworth.org (Postfix) with ESMTP id E61966DE1AB3\r
+ for <notmuch@notmuchmail.org>; Sun, 14 Feb 2016 14:33:58 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at cworth.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: 0.007\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=0.007 tagged_above=-999 required=5 tests=[AWL=0.108, \r
+ DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,\r
+ RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=disabled\r
+Received: from arlo.cworth.org ([127.0.0.1])\r
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id tfctEmjFlFY3 for <notmuch@notmuchmail.org>;\r
+ Sun, 14 Feb 2016 14:33:55 -0800 (PST)\r
+Received: from resqmta-po-05v.sys.comcast.net (resqmta-po-05v.sys.comcast.net\r
+ [96.114.154.164])\r
+ by arlo.cworth.org (Postfix) with ESMTPS id 9FA766DE1A2F\r
+ for <notmuch@notmuchmail.org>; Sun, 14 Feb 2016 14:33:55 -0800 (PST)\r
+Received: from resomta-po-06v.sys.comcast.net ([96.114.154.230])\r
+ by resqmta-po-05v.sys.comcast.net with comcast\r
+ id JNZj1s0054yXVJQ01NZtMf; Sun, 14 Feb 2016 22:33:53 +0000\r
+Received: from mail.tremily.us ([73.221.72.168])\r
+ by resomta-po-06v.sys.comcast.net with comcast\r
+ id JNZs1s00G3dr3C901NZsTU; Sun, 14 Feb 2016 22:33:53 +0000\r
+Received: by mail.tremily.us (Postfix, from userid 1000)\r
+ id E81831BB253C; Sun, 14 Feb 2016 14:33:51 -0800 (PST)\r
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tremily.us; s=odin;\r
+ t=1455489232; bh=uWzmLoX+dWgJdQCzrLX0ENn+nZLZ4e4b+5NVJPXu3VA=;\r
+ h=Date:From:To:Cc:Subject:References:In-Reply-To;\r
+ b=N7yWOTsH6zgbQZE/vTtwsZdERLq+QqLGSa68m93pijeTSPZKjBORVUUHcCFl1sAot\r
+ 5hU2g0uMwtc1D8AoMB4Dwc54n5GlxOH6X6lwrp8cIFrGID9R3+Oj29j60bsnfDbrI5\r
+ tzDFo0cws1Al6C+l+nbcOCpqpQ2OLyfizWP5NX/4=\r
+Date: Sun, 14 Feb 2016 14:33:51 -0800\r
+From: "W. Trevor King" <wking@tremily.us>\r
+To: David Bremner <david@tethera.net>\r
+Cc: notmuch@notmuchmail.org\r
+Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken\r
+ pipe)\r
+Message-ID: <20160214223351.GE4265@odin.tremily.us>\r
+References: <87oabko293.fsf@zancas.localnet>\r
+ <20160213223357.GC4265@odin.tremily.us>\r
+ <87ziv4813v.fsf@zancas.localnet>\r
+ <20160214063132.GD4265@odin.tremily.us>\r
+ <87twlbv5vj.fsf@zancas.localnet>\r
+MIME-Version: 1.0\r
+Content-Type: multipart/signed; micalg=pgp-sha1;\r
+ protocol="application/pgp-signature"; boundary="PHDeMLmKefytWajp"\r
+Content-Disposition: inline\r
+In-Reply-To: <87twlbv5vj.fsf@zancas.localnet>\r
+OpenPGP: id=39A2F3FA2AB17E5D8764F388FC29BDCDF15F5BE8;\r
+ url=http://tremily.us/pubkey.txt\r
+User-Agent: Mutt/1.5.23 (2014-03-12)\r
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;\r
+ s=q20140121; t=1455489233;\r
+ bh=AmI96bcDE9KQbbJ1Spxj1vrVdkaZ/rAdiaT/BDouZKg=;\r
+ h=Received:Received:Received:Date:From:To:Subject:Message-ID:\r
+ MIME-Version:Content-Type;\r
+ b=qCdo3bB4sHo0UkCWw8A/iMXFuEz7wbLknaL0b2Qz+6sGZfElCsZnbXGR9u60c226J\r
+ ZFuRQMP10G1hFo8lqx8WmKCPEoz+BhbGDT3WNfotg4NiWmNaBx7YHxIWpjJeBnPpBn\r
+ MSQ6Q2C9RwXW+bpIcDveWKX25iGOoLJpfY11A/8WTzyBkJFTwborueQz+C74vtuIiz\r
+ qKhJs7VO3mp75sWtdDIKD9MvthX2jBa/CGUgVx5bTbrr4Jnjn79qERP5CNzw8aS0dm\r
+ E0KpVziITyezKeXL0b6Mj54ADCSN5WyBZDcqlfUYV8X7OyAeh/w4BNQucI7ag/d6Tt\r
+ BAV4gEw8os2ag==\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.20\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch/>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Sun, 14 Feb 2016 22:33:59 -0000\r
+\r
+\r
+--PHDeMLmKefytWajp\r
+Content-Type: text/plain; charset=utf-8\r
+Content-Disposition: inline\r
+Content-Transfer-Encoding: quoted-printable\r
+\r
+On Sun, Feb 14, 2016 at 08:22:24AM -0400, David Bremner wrote:\r
+> W. Trevor King writes:\r
+> >   for tag in tags:\r
+> >       _LOG.debug('building a quoted path for {!r} / {!r}'.format(id, ta=\r
+g))\r
+> >       path =3D 'tags/{id}/{tag}'.format(\r
+> >           id=3D_hex_quote(string=3Did), tag=3D_hex_quote(string=3Dtag))\r
+> >       yield '{mode} {hash}\t{path}\n'.format(mode=3Dmode, hash=3Dhash, =\r
+path=3Dpath)\r
+> >\r
+>=20\r
+> I think the problem is not a bad tag, but a bad message-id. The last\r
+> line of output before the UnicodeWarning and the broken pipe is\r
+>=20\r
+> building a quoted path for u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf=\r
+0\xe3\xe5\xe9-\xcf\xca' / u'unread'\r
+\r
+  $ ln -s nmbug nmbug.py\r
+  $ python2 -W error -c "import nmbug; nmbug._hex_quote(u'D1B4DEBCAFFC4A05A=\r
+4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')"\r
+  Traceback (most recent call last):\r
+    File "<string>", line 1, in <module>\r
+    File "nmbug.py", line 106, in _hex_quote\r
+      uppercase_escapes =3D _quote(string, safe)\r
+    File "/usr/lib64/python2.7/urllib.py", line 1303, in quote\r
+      return ''.join(map(quoter, s))\r
+  UnicodeWarning: Unicode equal comparison failed to convert both arguments=\r
+ to Unicode - interpreting them as being unequal\r
+\r
+The problem seems to be having Unicode characters in either quote argument:\r
+\r
+  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4=\r
+349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')"\r
+  =E2=80=A6\r
+  UnicodeWarning: Unicode equal comparison failed to convert both arguments=\r
+ to Unicode - interpreting them as being unequal\r
+  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4=\r
+349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca', u'+@=3D:,')"\r
+  =E2=80=A6\r
+  UnicodeWarning: Unicode equal comparison failed to convert both arguments=\r
+ to Unicode - interpreting them as being unequal\r
+  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4=\r
+349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@=3D:,'=\r
+)"\r
+  =E2=80=A6\r
+  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 33: =\r
+ordinal not in range(128)\r
+  $ python2 -W error -c "import urllib; print(urllib.quote(u'D1B4DEBCAFFC4A=\r
+05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@=\r
+=3D:,'.encode('utf-8')))"\r
+  D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@%C3%91%C3%A5%C3%B0%C3%A3%C3%A5%C3%A9-%C3=\r
+%8F%C3%8A\r
+\r
+Related Python issues [1,2,3,4,5].  [2] led to the currently working\r
+Python 3 implementation, which encodes to UTF-8 by default and has an\r
+=E2=80=98encoding=E2=80=99 option [6].  There's some useful background in [=\r
+7].  For\r
+compatibility with Python 3, I suggest patching _hex_quote to take an\r
+encoding option, defaulting to UTF-8, and encoding both strings that\r
+are passed to _quote.  We should probably raise a ValueError if the\r
+length of the encoded safe characters doesn't match the length of the\r
+Unicode safe characters, because the caller will probably not expect\r
+the byte-level quoting that would cause.  Python 3 covers that by\r
+restricting the safe characters to ASCII [6], although passing\r
+non-ASCII characters with safe doesn't seem to raise an exception:\r
+\r
+  $ python3 -c "from urllib.parse import quote; print(quote('\u0091', '\u00=\r
+91'))"\r
+  %C2%91\r
+  $ python3 -c "from urllib.parse import quote; print(quote('\u203b', '\u20=\r
+3b'))"\r
+  %E2%80%BB\r
+\r
+Anyhow, I'll file a patch adding UTF-8 encoding so Python 2 works like\r
+Python 3.\r
+\r
+Cheers,\r
+Trevor\r
+\r
+[1]: http://bugs.python.org/issue2637\r
+[2]: http://bugs.python.org/issue3300\r
+[3]: http://bugs.python.org/issue22231\r
+[4]: http://bugs.python.org/issue23885\r
+[5]: http://bugs.python.org/issue1712522\r
+[6]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote\r
+[7]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html\r
+\r
+--=20\r
+This email may be signed or encrypted with GnuPG (http://www.gnupg.org).\r
+For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy\r
+\r
+--PHDeMLmKefytWajp\r
+Content-Type: application/pgp-signature; name="signature.asc"\r
+Content-Description: OpenPGP digital signature\r
+\r
+-----BEGIN PGP SIGNATURE-----\r
+Version: GnuPG v2\r
+\r
+iQIcBAEBAgAGBQJWwQDNAAoJEAPqygegUbGspqsP/ikbYEb5/7+y7uncDLolCWGN\r
+25t1gJdZNdAhqmHaUbSqRHKc8yOUkr/MmmNG2RFcKZQaSwwweg3pELAyFiuYljHl\r
+n+0da6aizwsguDubMwJes288FkOjSvVeVkrAXQgtMmDCYJXB5/FvHgU9cBBJAqvs\r
+FQFZhhrwt4RX9m851ksaomCxJvNW8/zHBQNoVVAEZI6jp2NRYmBriC1xOUlLZ8iF\r
+AAkFMisnFnosFH8xbLEN/7qVXRov8LQFF9w7dHqAxFcZu9ML6Byl44Ha2LTfUC1F\r
+SNQ+uSD0NaGDhpTYSMG1OE/ODdlQKs8ah5erzq6D1E1CdxyMSDRUYvkFUmGOzd3b\r
+v0FfTzLwE9SoEtzu7CP2TvPGyGmqfIaF1y7HwAKCfgl+wDM5ZvO2CtZXcfsqCTOv\r
+QySwNZT1aZse6zX3x0utSEyqRoLtqD5DUXFRPr4IiCnhU80/Jdvy+H1OyJmSW/GV\r
+1JUI7tu4AuAgXVuOXGDhkSvCyklFKiJB9Tau4giXD2/l318wlqoHYDPl/LpRFl7t\r
+jm5GgPhJ9gxYlGdTunWZRVAV97GsRjEGdERYbL86yGBsj5FayM6PG517/b8ZrJdm\r
+TN5onwoRpt2YFT41ORAgJa7yC6khHnPbYKnpEZ9sjyUQLg0AXdUKJoevEi2V9PMF\r
+afi06G05r5RwO2ocMNvI\r
+=a6LU\r
+-----END PGP SIGNATURE-----\r
+\r
+--PHDeMLmKefytWajp--\r
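
The change proposed in the message above (encode both arguments before calling quote, default to UTF-8, and raise ValueError when a safe character needs more than one byte) could look roughly like the sketch below. It is an illustration only, not the patch that was eventually filed: the name hex_quote_sketch is hypothetical, the default safe characters are taken from the shell examples in the message, and anything the real _hex_quote does after calling quote is left out.

```python
try:
    from urllib.parse import quote as _quote  # Python 3
except ImportError:
    from urllib import quote as _quote  # Python 2


def hex_quote_sketch(string, safe=u'+@=:,', encoding='utf-8'):
    """Percent-quote a Unicode string after encoding it to bytes.

    Hypothetical stand-in for the _hex_quote change discussed in the
    message; it only shows the encoding and validation steps.
    """
    encoded_string = string.encode(encoding)
    encoded_safe = safe.encode(encoding)
    # A safe character that encodes to several bytes would be matched
    # byte by byte, which the caller is unlikely to expect.
    if len(encoded_safe) != len(safe):
        raise ValueError(
            'safe characters must each encode to one byte: {!r}'.format(safe))
    # Both arguments are now byte strings, so Python 2 no longer mixes
    # str and unicode inside quote.
    return _quote(encoded_string, encoded_safe)


message_id = u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'
print(hex_quote_sketch(message_id))
```

With the message ID from the thread, this prints the same percent-encoded string as the last python2 command in the message, and a non-ASCII safe character such as u'\xe9' is rejected with ValueError.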