Return-Path:
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
 by arlo.cworth.org (Postfix) with ESMTP id E61966DE1AB3
 for ; Sun, 14 Feb 2016 14:33:58 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at cworth.org
X-Spam-Flag: NO
X-Spam-Score: 0.007
X-Spam-Level:
X-Spam-Status: No, score=0.007 tagged_above=-999 required=5
 tests=[AWL=0.108, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=disabled
Received: from arlo.cworth.org ([127.0.0.1])
 by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id tfctEmjFlFY3 for ;
 Sun, 14 Feb 2016 14:33:55 -0800 (PST)
Received: from resqmta-po-05v.sys.comcast.net
 (resqmta-po-05v.sys.comcast.net [96.114.154.164])
 by arlo.cworth.org (Postfix) with ESMTPS id 9FA766DE1A2F
 for ; Sun, 14 Feb 2016 14:33:55 -0800 (PST)
Received: from resomta-po-06v.sys.comcast.net ([96.114.154.230])
 by resqmta-po-05v.sys.comcast.net with comcast
 id JNZj1s0054yXVJQ01NZtMf; Sun, 14 Feb 2016 22:33:53 +0000
Received: from mail.tremily.us ([73.221.72.168])
 by resomta-po-06v.sys.comcast.net with comcast
 id JNZs1s00G3dr3C901NZsTU; Sun, 14 Feb 2016 22:33:53 +0000
Received: by mail.tremily.us (Postfix, from userid 1000)
 id E81831BB253C; Sun, 14 Feb 2016 14:33:51 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tremily.us; s=odin;
 t=1455489232; bh=uWzmLoX+dWgJdQCzrLX0ENn+nZLZ4e4b+5NVJPXu3VA=;
 h=Date:From:To:Cc:Subject:References:In-Reply-To;
 b=N7yWOTsH6zgbQZE/vTtwsZdERLq+QqLGSa68m93pijeTSPZKjBORVUUHcCFl1sAot
 5hU2g0uMwtc1D8AoMB4Dwc54n5GlxOH6X6lwrp8cIFrGID9R3+Oj29j60bsnfDbrI5
 tzDFo0cws1Al6C+l+nbcOCpqpQ2OLyfizWP5NX/4=
Date: Sun, 14 Feb 2016 14:33:51 -0800
From: "W. Trevor King"
To: David Bremner
Cc: notmuch@notmuchmail.org
Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
Message-ID: <20160214223351.GE4265@odin.tremily.us>
References: <87oabko293.fsf@zancas.localnet>
 <20160213223357.GC4265@odin.tremily.us>
 <87ziv4813v.fsf@zancas.localnet>
 <20160214063132.GD4265@odin.tremily.us>
 <87twlbv5vj.fsf@zancas.localnet>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="PHDeMLmKefytWajp"
Content-Disposition: inline
In-Reply-To: <87twlbv5vj.fsf@zancas.localnet>
OpenPGP: id=39A2F3FA2AB17E5D8764F388FC29BDCDF15F5BE8;
 url=http://tremily.us/pubkey.txt
User-Agent: Mutt/1.5.23 (2014-03-12)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
 s=q20140121; t=1455489233;
 bh=AmI96bcDE9KQbbJ1Spxj1vrVdkaZ/rAdiaT/BDouZKg=;
 h=Received:Received:Received:Date:From:To:Subject:Message-ID:
 MIME-Version:Content-Type;
 b=qCdo3bB4sHo0UkCWw8A/iMXFuEz7wbLknaL0b2Qz+6sGZfElCsZnbXGR9u60c226J
 ZFuRQMP10G1hFo8lqx8WmKCPEoz+BhbGDT3WNfotg4NiWmNaBx7YHxIWpjJeBnPpBn
 MSQ6Q2C9RwXW+bpIcDveWKX25iGOoLJpfY11A/8WTzyBkJFTwborueQz+C74vtuIiz
 qKhJs7VO3mp75sWtdDIKD9MvthX2jBa/CGUgVx5bTbrr4Jnjn79qERP5CNzw8aS0dm
 E0KpVziITyezKeXL0b6Mj54ADCSN5WyBZDcqlfUYV8X7OyAeh/w4BNQucI7ag/d6Tt
 BAV4gEw8os2ag==
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 14 Feb 2016 22:33:59 -0000

--PHDeMLmKefytWajp
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Sun, Feb 14, 2016 at 08:22:24AM -0400, David Bremner wrote:
> W. Trevor King writes:
> >     for tag in tags:
> >         _LOG.debug('building a quoted path for {!r} / {!r}'.format(id, tag))
> >         path = 'tags/{id}/{tag}'.format(
> >             id=_hex_quote(string=id), tag=_hex_quote(string=tag))
> >         yield '{mode} {hash}\t{path}\n'.format(mode=mode, hash=hash, path=path)
> >
>
> I think the problem is not a bad tag, but a bad message-id. The last
> line of output before the UnicodeWarning and the broken pipe is
>
>   building a quoted path for u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca' / u'unread'

  $ ln -s nmbug nmbug.py
  $ python2 -W error -c "import nmbug; nmbug._hex_quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')"
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "nmbug.py", line 106, in _hex_quote
      uppercase_escapes = _quote(string, safe)
    File "/usr/lib64/python2.7/urllib.py", line 1303, in quote
      return ''.join(map(quoter, s))
  UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

The problem seems to be having Unicode characters in either quote
argument:

  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')"
  …
  UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca', u'+@=:,')"
  …
  UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@=:,')"
  …
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 33: ordinal not in range(128)
  $ python2 -W error -c "import urllib; print(urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@=:,'.encode('utf-8')))"
  D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@%C3%91%C3%A5%C3%B0%C3%A3%C3%A5%C3%A9-%C3%8F%C3%8A

Related Python issues [1,2,3,4,5].  [2] led to the currently working
Python 3 implementation, which encodes to UTF-8 by default and has an
‘encoding’ option [6].  There's some useful background in [7].

For compatibility with Python 3, I suggest patching _hex_quote to take
an encoding option, defaulting to UTF-8, and encoding both strings
that are passed to _quote.  We should probably raise a ValueError if
the length of the encoded safe characters doesn't match the length of
the Unicode safe characters, because the caller will probably not
expect the byte-level quoting that would cause.  Python 3 covers that
by restricting the safe characters to ASCII [6], although passing
non-ASCII characters with safe doesn't seem to raise an exception:

  $ python3 -c "from urllib.parse import quote; print(quote('\u0091', '\u0091'))"
  %C2%91
  $ python3 -c "from urllib.parse import quote; print(quote('\u203b', '\u203b'))"
  %E2%80%BB

Anyhow, I'll file a patch adding UTF-8 encoding so Python 2 works like
Python 3.

Cheers,
Trevor

[1]: http://bugs.python.org/issue2637
[2]: http://bugs.python.org/issue3300
[3]: http://bugs.python.org/issue22231
[4]: http://bugs.python.org/issue23885
[5]: http://bugs.python.org/issue1712522
[6]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
[7]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

--PHDeMLmKefytWajp
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJWwQDNAAoJEAPqygegUbGspqsP/ikbYEb5/7+y7uncDLolCWGN
25t1gJdZNdAhqmHaUbSqRHKc8yOUkr/MmmNG2RFcKZQaSwwweg3pELAyFiuYljHl
n+0da6aizwsguDubMwJes288FkOjSvVeVkrAXQgtMmDCYJXB5/FvHgU9cBBJAqvs
FQFZhhrwt4RX9m851ksaomCxJvNW8/zHBQNoVVAEZI6jp2NRYmBriC1xOUlLZ8iF
AAkFMisnFnosFH8xbLEN/7qVXRov8LQFF9w7dHqAxFcZu9ML6Byl44Ha2LTfUC1F
SNQ+uSD0NaGDhpTYSMG1OE/ODdlQKs8ah5erzq6D1E1CdxyMSDRUYvkFUmGOzd3b
v0FfTzLwE9SoEtzu7CP2TvPGyGmqfIaF1y7HwAKCfgl+wDM5ZvO2CtZXcfsqCTOv
QySwNZT1aZse6zX3x0utSEyqRoLtqD5DUXFRPr4IiCnhU80/Jdvy+H1OyJmSW/GV
1JUI7tu4AuAgXVuOXGDhkSvCyklFKiJB9Tau4giXD2/l318wlqoHYDPl/LpRFl7t
jm5GgPhJ9gxYlGdTunWZRVAV97GsRjEGdERYbL86yGBsj5FayM6PG517/b8ZrJdm
TN5onwoRpt2YFT41ORAgJa7yC6khHnPbYKnpEZ9sjyUQLg0AXdUKJoevEi2V9PMF
afi06G05r5RwO2ocMNvI
=a6LU
-----END PGP SIGNATURE-----

--PHDeMLmKefytWajp--