From: W. Trevor King Date: Sun, 14 Feb 2016 22:33:51 +0000 (+1600) Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe) X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=ef6c837d93366d4589b834520ccb658329cd2ada;p=notmuch-archives.git Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe) --- diff --git a/0c/1d32a30923c110cfeef21761757a65522944c2 b/0c/1d32a30923c110cfeef21761757a65522944c2 new file mode 100644 index 000000000..c20353445 --- /dev/null +++ b/0c/1d32a30923c110cfeef21761757a65522944c2 @@ -0,0 +1,200 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by arlo.cworth.org (Postfix) with ESMTP id E61966DE1AB3 + for ; Sun, 14 Feb 2016 14:33:58 -0800 (PST) +X-Virus-Scanned: Debian amavisd-new at cworth.org +X-Spam-Flag: NO +X-Spam-Score: 0.007 +X-Spam-Level: +X-Spam-Status: No, score=0.007 tagged_above=-999 required=5 tests=[AWL=0.108, + DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, + RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=disabled +Received: from arlo.cworth.org ([127.0.0.1]) + by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id tfctEmjFlFY3 for ; + Sun, 14 Feb 2016 14:33:55 -0800 (PST) +Received: from resqmta-po-05v.sys.comcast.net (resqmta-po-05v.sys.comcast.net + [96.114.154.164]) + by arlo.cworth.org (Postfix) with ESMTPS id 9FA766DE1A2F + for ; Sun, 14 Feb 2016 14:33:55 -0800 (PST) +Received: from resomta-po-06v.sys.comcast.net ([96.114.154.230]) + by resqmta-po-05v.sys.comcast.net with comcast + id JNZj1s0054yXVJQ01NZtMf; Sun, 14 Feb 2016 22:33:53 +0000 +Received: from mail.tremily.us ([73.221.72.168]) + by resomta-po-06v.sys.comcast.net with comcast + id JNZs1s00G3dr3C901NZsTU; Sun, 14 Feb 2016 22:33:53 +0000 +Received: by mail.tremily.us (Postfix, from userid 1000) + id E81831BB253C; Sun, 14 Feb 2016 14:33:51 -0800 (PST) +DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/simple; d=tremily.us; s=odin; + t=1455489232; bh=uWzmLoX+dWgJdQCzrLX0ENn+nZLZ4e4b+5NVJPXu3VA=; + h=Date:From:To:Cc:Subject:References:In-Reply-To; + b=N7yWOTsH6zgbQZE/vTtwsZdERLq+QqLGSa68m93pijeTSPZKjBORVUUHcCFl1sAot + 5hU2g0uMwtc1D8AoMB4Dwc54n5GlxOH6X6lwrp8cIFrGID9R3+Oj29j60bsnfDbrI5 + tzDFo0cws1Al6C+l+nbcOCpqpQ2OLyfizWP5NX/4= +Date: Sun, 14 Feb 2016 14:33:51 -0800 +From: "W. Trevor King" +To: David Bremner +Cc: notmuch@notmuchmail.org +Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken + pipe) +Message-ID: <20160214223351.GE4265@odin.tremily.us> +References: <87oabko293.fsf@zancas.localnet> + <20160213223357.GC4265@odin.tremily.us> + <87ziv4813v.fsf@zancas.localnet> + <20160214063132.GD4265@odin.tremily.us> + <87twlbv5vj.fsf@zancas.localnet> +MIME-Version: 1.0 +Content-Type: multipart/signed; micalg=pgp-sha1; + protocol="application/pgp-signature"; boundary="PHDeMLmKefytWajp" +Content-Disposition: inline +In-Reply-To: <87twlbv5vj.fsf@zancas.localnet> +OpenPGP: id=39A2F3FA2AB17E5D8764F388FC29BDCDF15F5BE8; + url=http://tremily.us/pubkey.txt +User-Agent: Mutt/1.5.23 (2014-03-12) +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; + s=q20140121; t=1455489233; + bh=AmI96bcDE9KQbbJ1Spxj1vrVdkaZ/rAdiaT/BDouZKg=; + h=Received:Received:Received:Date:From:To:Subject:Message-ID: + MIME-Version:Content-Type; + b=qCdo3bB4sHo0UkCWw8A/iMXFuEz7wbLknaL0b2Qz+6sGZfElCsZnbXGR9u60c226J + ZFuRQMP10G1hFo8lqx8WmKCPEoz+BhbGDT3WNfotg4NiWmNaBx7YHxIWpjJeBnPpBn + MSQ6Q2C9RwXW+bpIcDveWKX25iGOoLJpfY11A/8WTzyBkJFTwborueQz+C74vtuIiz + qKhJs7VO3mp75sWtdDIKD9MvthX2jBa/CGUgVx5bTbrr4Jnjn79qERP5CNzw8aS0dm + E0KpVziITyezKeXL0b6Mj54ADCSN5WyBZDcqlfUYV8X7OyAeh/w4BNQucI7ag/d6Tt + BAV4gEw8os2ag== +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.20 +Precedence: list +List-Id: "Use and development of the notmuch mail system." 
+ +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sun, 14 Feb 2016 22:33:59 -0000 + + +--PHDeMLmKefytWajp +Content-Type: text/plain; charset=utf-8 +Content-Disposition: inline +Content-Transfer-Encoding: quoted-printable + +On Sun, Feb 14, 2016 at 08:22:24AM -0400, David Bremner wrote: +> W. Trevor King writes: +> > for tag in tags: +> > _LOG.debug('building a quoted path for {!r} / {!r}'.format(id, ta= +g)) +> > path =3D 'tags/{id}/{tag}'.format( +> > id=3D_hex_quote(string=3Did), tag=3D_hex_quote(string=3Dtag)) +> > yield '{mode} {hash}\t{path}\n'.format(mode=3Dmode, hash=3Dhash, = +path=3Dpath) +> > +>=20 +> I think the problem is not a bad tag, but a bad message-id. The last +> line of output before the UnicodeWarning and the broken pipe is +>=20 +> building a quoted path for u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf= +0\xe3\xe5\xe9-\xcf\xca' / u'unread' + + $ ln -s nmbug nmbug.py + $ python2 -W error -c "import nmbug; nmbug._hex_quote(u'D1B4DEBCAFFC4A05A= +4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')" + Traceback (most recent call last): + File "", line 1, in + File "nmbug.py", line 106, in _hex_quote + uppercase_escapes =3D _quote(string, safe) + File "/usr/lib64/python2.7/urllib.py", line 1303, in quote + return ''.join(map(quoter, s)) + UnicodeWarning: Unicode equal comparison failed to convert both arguments= + to Unicode - interpreting them as being unequal + +The problem seems to be having Unicode characters in either quote argument: + + $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4= +349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')" + =E2=80=A6 + UnicodeWarning: Unicode equal comparison failed to convert both arguments= + to Unicode - interpreting them as being unequal + $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4= +349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca', u'+@=3D:,')" + =E2=80=A6 + UnicodeWarning: Unicode equal 
comparison failed to convert both arguments= + to Unicode - interpreting them as being unequal + $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4= +349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@=3D:,'= +)" + =E2=80=A6 + UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 33: = +ordinal not in range(128) + $ python2 -W error -c "import urllib; print(urllib.quote(u'D1B4DEBCAFFC4A= +05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@= +=3D:,'.encode('utf-8')))" + D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@%C3%91%C3%A5%C3%B0%C3%A3%C3%A5%C3%A9-%C3= +%8F%C3%8A + +Related Python issues [1,2,3,4,5]. [2] led to the currently working +Python 3 implementation, which encodes to UTF-8 by default and has an +=E2=80=98encoding=E2=80=99 option [6]. There's some useful background in [= +7]. For +compatibility with Python 3, I suggest patching _hex_quote to take an +encoding option, defaulting to UTF-8, and encoding both strings that +are passed to _quote. We should probably raise a ValueError if the +length of the encoded safe characters doesn't match the length of the +Unicode safe characters, because the caller will probably not expect +the byte-level quoting that would cause. Python 3 covers that by +restricting the safe characters to ASCII [6], although passing +non-ASCII characters with safe doesn't seem to raise an exception: + + $ python3 -c "from urllib.parse import quote; print(quote('\u0091', '\u00= +91'))" + %C2%91 + $ python3 -c "from urllib.parse import quote; print(quote('\u203b', '\u20= +3b'))" + %E2%80%BB + +Anyhow, I'll file a patch adding UTF-8 encoding so Python 2 works like +Python 3.
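A minimal sketch of the kind of change described above, not the actual nmbug patch. The name hex_quote_sketch and its keyword arguments are hypothetical, and it assumes both arguments arrive as Unicode strings. It encodes the string and the safe characters before quoting, and raises ValueError when a safe character would encode to more than one byte. Any post-processing the real _hex_quote does on the quoted result is left out.

```python
try:
    from urllib.parse import quote as _quote  # Python 3
except ImportError:
    from urllib import quote as _quote  # Python 2


def hex_quote_sketch(string, safe=u'+@=:,', encoding='utf-8'):
    """Percent-quote a Unicode string after encoding both arguments.

    Hypothetical stand-in for the proposed _hex_quote change.
    """
    encoded_string = string.encode(encoding)
    encoded_safe = safe.encode(encoding)
    # A multi-byte safe character would be matched byte by byte,
    # which the caller is unlikely to expect, so refuse it.
    if len(encoded_safe) != len(safe):
        raise ValueError(
            'safe characters must encode to single bytes in {0}: {1!r}'
            .format(encoding, safe))
    return _quote(encoded_string, encoded_safe)


# The message-id from the debug output earlier in this thread.
message_id = (u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8'
              u'@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')
quoted = hex_quote_sketch(message_id)
print(quoted)
# D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@%C3%91%C3%A5%C3%B0%C3%A3%C3%A5%C3%A9-%C3%8F%C3%8A
```

On Python 3 the standard quote already encodes to UTF-8 by default, so the explicit encoding step mainly matters for Python 2. The length check is the part that adds behaviour on both versions.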
+ +Cheers, +Trevor + +[1]: http://bugs.python.org/issue2637 +[2]: http://bugs.python.org/issue3300 +[3]: http://bugs.python.org/issue22231 +[4]: http://bugs.python.org/issue23885 +[5]: http://bugs.python.org/issue1712522 +[6]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote +[7]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html + +--=20 +This email may be signed or encrypted with GnuPG (http://www.gnupg.org). +For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy + +--PHDeMLmKefytWajp +Content-Type: application/pgp-signature; name="signature.asc" +Content-Description: OpenPGP digital signature + +-----BEGIN PGP SIGNATURE----- +Version: GnuPG v2 + +iQIcBAEBAgAGBQJWwQDNAAoJEAPqygegUbGspqsP/ikbYEb5/7+y7uncDLolCWGN +25t1gJdZNdAhqmHaUbSqRHKc8yOUkr/MmmNG2RFcKZQaSwwweg3pELAyFiuYljHl +n+0da6aizwsguDubMwJes288FkOjSvVeVkrAXQgtMmDCYJXB5/FvHgU9cBBJAqvs +FQFZhhrwt4RX9m851ksaomCxJvNW8/zHBQNoVVAEZI6jp2NRYmBriC1xOUlLZ8iF +AAkFMisnFnosFH8xbLEN/7qVXRov8LQFF9w7dHqAxFcZu9ML6Byl44Ha2LTfUC1F +SNQ+uSD0NaGDhpTYSMG1OE/ODdlQKs8ah5erzq6D1E1CdxyMSDRUYvkFUmGOzd3b +v0FfTzLwE9SoEtzu7CP2TvPGyGmqfIaF1y7HwAKCfgl+wDM5ZvO2CtZXcfsqCTOv +QySwNZT1aZse6zX3x0utSEyqRoLtqD5DUXFRPr4IiCnhU80/Jdvy+H1OyJmSW/GV +1JUI7tu4AuAgXVuOXGDhkSvCyklFKiJB9Tau4giXD2/l318wlqoHYDPl/LpRFl7t +jm5GgPhJ9gxYlGdTunWZRVAV97GsRjEGdERYbL86yGBsj5FayM6PG517/b8ZrJdm +TN5onwoRpt2YFT41ORAgJa7yC6khHnPbYKnpEZ9sjyUQLg0AXdUKJoevEi2V9PMF +afi06G05r5RwO2ocMNvI +=a6LU +-----END PGP SIGNATURE----- + +--PHDeMLmKefytWajp--