From: W. Trevor King
Date: Mon, 15 Feb 2016 05:30:11 +0000 (+1600)
Subject: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=4192f8a792fbbbc1ad12228b311efad612508aef;p=notmuch-archives.git

[PATCH] nmbug: Allow Unicode tags and IDs in Python 2
---

diff --git a/9e/9bf54a32fc2a08428efb9444cbecc44da30711 b/9e/9bf54a32fc2a08428efb9444cbecc44da30711
new file mode 100644
index 000000000..b1b2bd8a6
--- /dev/null
+++ b/9e/9bf54a32fc2a08428efb9444cbecc44da30711
@@ -0,0 +1,143 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by arlo.cworth.org (Postfix) with ESMTP id 181856DE0A87
+ for ; Sun, 14 Feb 2016 21:29:11 -0800 (PST)
+X-Virus-Scanned: Debian amavisd-new at cworth.org
+X-Spam-Flag: NO
+X-Spam-Score: 0.057
+X-Spam-Level:
+X-Spam-Status: No, score=0.057 tagged_above=-999 required=5 tests=[AWL=0.058,
+ DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001,
+ SPF_PASS=-0.001] autolearn=disabled
+Received: from arlo.cworth.org ([127.0.0.1])
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id YwGviKfM47oN for ;
+ Sun, 14 Feb 2016 21:29:08 -0800 (PST)
+Received: from resqmta-po-08v.sys.comcast.net (resqmta-po-08v.sys.comcast.net
+ [96.114.154.167])
+ by arlo.cworth.org (Postfix) with ESMTPS id F3AA26DE0943
+ for ; Sun, 14 Feb 2016 21:29:07 -0800 (PST)
+Received: from resomta-po-09v.sys.comcast.net ([96.114.154.233])
+ by resqmta-po-08v.sys.comcast.net with comcast
+ id JVUu1s00552QWKC01VV5xF; Mon, 15 Feb 2016 05:29:05 +0000
+Received: from mail.tremily.us ([73.221.72.168])
+ by resomta-po-09v.sys.comcast.net with comcast
+ id JVV31s00K3dr3C901VV4g9; Mon, 15 Feb 2016 05:29:05 +0000
+Received: from ullr.tremily.us (unknown [192.168.10.7])
+ by mail.tremily.us (Postfix) with ESMTPS id 13BA61BB2A0D;
+ Sun, 14 Feb 2016 21:29:03 -0800 (PST)
+Received: (nullmailer pid 22232 invoked by
uid 1000);
+ Mon, 15 Feb 2016 05:30:14 -0000
+From: "W. Trevor King"
+To: notmuch@notmuchmail.org
+Subject: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
+Date: Sun, 14 Feb 2016 21:30:11 -0800
+Message-Id:
+
+X-Mailer: git-send-email 2.1.0.60.g85f0837
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
+ s=q20140121; t=1455514145;
+ bh=JSA6UzUGX/hI8zjlpzXe0gXS06y41p1jTP778uTzK/M=;
+ h=Received:Received:Received:Received:From:To:Subject:Date:
+ Message-Id;
+ b=p/x4GLjW56OLBfGD4zuYDpy+VrFGH6arhZpb4jqVbHWS/O3bUYtwg6uoe0oX2nkDg
+ 1iTNLUrAR6WJWCiTevS4evair0kNyvidoqEwc6kbLn1+hc2qaCB6khcDbDixrIt2UC
+ ihgTMKaxqvR/HppE0jLc79gLN2HR7fKHy3RbSGDyAlJx5+wRiUi9hq+Lof9Qf5adkT
+ ZJNqiUnXtF8HXGjbLSTkX0VemiBYLGHh2meLN+hCcKPXCbFh47XkZXuL/uIzrmoJ2y
+ T9BrsWTry9Pmk4v6VMv/SgtG9z40IhLhwNIgfMI9OQlubaXg5oTY77xqHYbqVZpX0I
+ gwwPCDf0XO5Bg==
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.20
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Mon, 15 Feb 2016 05:29:11 -0000
+
+Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
+when a tag or message ID contains non-ASCII characters [1].
+
+There are a number of Python bugs associated with this behavior
+[2,3,4,5,6]. There's also some useful background in [8]. [3] led to
+the currently working Python 3 implementation, which encodes to UTF-8
+by default and has 'encoding' and 'errors' arguments [7]. This commit
+follows that approach in a way that's compatible with both Python 2
+and Python 3. Coercing to UTF-8 (regardless of locale) gives us
+consistent tag IDs for sharing between users.
+
+The 'isnumeric' check identifies Unicode instances in both Python 2
+[9] and Python 3 [10].
+
+[1]: id:87twlbv5vj.fsf@zancas.localnet
+     http://thread.gmane.org/gmane.mail.notmuch.general/21855/focus=21862
+     Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
+     Date: Sun, 14 Feb 2016 08:22:24 -0400
+[2]: http://bugs.python.org/issue2637
+[3]: http://bugs.python.org/issue3300
+[4]: http://bugs.python.org/issue22231
+[5]: http://bugs.python.org/issue23885
+[6]: http://bugs.python.org/issue1712522
+[7]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
+[8]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html
+[9]: https://docs.python.org/2/library/stdtypes.html#unicode.isnumeric
+[10]: https://docs.python.org/3/library/stdtypes.html#str.isnumeric
+---
+I haven't checked the other commands for issues with Unicode IDs or
+tags. It's possible that in addition to this explicit encoding to
+UTF-8, we'll also want explicit decoding from UTF-8 when reading from
+Git trees (for 'nmbug checkout' and 'nmbug status').
+
+Cheers,
+Trevor
+
+ devel/nmbug/nmbug | 13 +++++++++++--
+ 1 file changed, 11 insertions(+), 2 deletions(-)
+
+diff --git a/devel/nmbug/nmbug b/devel/nmbug/nmbug
+index 81f582c..284d374 100755
+--- a/devel/nmbug/nmbug
++++ b/devel/nmbug/nmbug
+@@ -1,6 +1,6 @@
+ #!/usr/bin/env python
+ #
+-# Copyright (c) 2011-2014 David Bremner
++# Copyright (c) 2011-2016 David Bremner
+ #                         W. Trevor King
+ #
+ # This program is free software: you can redistribute it and/or modify
+@@ -95,7 +95,7 @@ except AttributeError: # Python < 3.2
+     _tempfile.TemporaryDirectory = _TemporaryDirectory
+
+
+-def _hex_quote(string, safe='+@=:,'):
++def _hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
+     """
+     quote('abc def') -> 'abc%20def'.
+
+@@ -103,6 +103,15 @@ def _hex_quote(string, safe='+@=:,'):
+     addition to letters, digits, and '_.-') and lowercase hex digits
+     (e.g. '%3a' instead of '%3A').
+     """
++    if hasattr(string, 'isnumeric'):
++        string = string.encode(encoding, errors)
++    if hasattr(safe, 'isnumeric'):
++        safe_bytes = safe.encode(encoding, errors)
++        if len(safe_bytes) != len(safe):
++            raise ValueError(
++                'some safe characters are encoded as multiple bytes '
++                '({!r} -> {!r})'.format(safe, safe_bytes))
++        safe = safe_bytes
+     uppercase_escapes = _quote(string, safe)
+     return _HEX_ESCAPE_REGEX.sub(
+         lambda match: match.group(0).lower(),
+-- 
+2.1.0.60.g85f0837
+
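
For anyone who wants to try the patched quoting outside of nmbug, here
is a minimal standalone sketch. It is not the nmbug source: hex_quote is
a hypothetical name for the copy, and the definitions of _quote and
_HEX_ESCAPE_REGEX do not appear in the hunks above, so they are
reconstructed here on the assumption that _quote is urllib's quote and
that the regex matches uppercase %XX escapes.

```python
import re

try:  # Python 3
    from urllib.parse import quote as _quote
except ImportError:  # Python 2
    from urllib import quote as _quote

# Assumption: matches the uppercase escapes that quote() emits.
_HEX_ESCAPE_REGEX = re.compile('%[0-9A-F]{2}')


def hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
    """Percent-quote `string` with lowercase hex digits.

    Hypothetical standalone copy of the patched _hex_quote.
    """
    # Only Unicode strings have 'isnumeric' (unicode in Python 2,
    # str in Python 3), so byte strings pass through untouched.
    if hasattr(string, 'isnumeric'):
        string = string.encode(encoding, errors)
    if hasattr(safe, 'isnumeric'):
        safe_bytes = safe.encode(encoding, errors)
        # quote() works on bytes, so a safe character that needs
        # several bytes cannot be honoured; refuse it explicitly.
        if len(safe_bytes) != len(safe):
            raise ValueError(
                'some safe characters are encoded as multiple bytes '
                '({!r} -> {!r})'.format(safe, safe_bytes))
        safe = safe_bytes
    uppercase_escapes = _quote(string, safe)
    return _HEX_ESCAPE_REGEX.sub(
        lambda match: match.group(0).lower(),
        uppercase_escapes)


if __name__ == '__main__':
    print(hex_quote(u'tag \u00e9'))  # tag%20%c3%a9
    print(hex_quote(u'a/b:c'))       # a%2fb:c
```

Encoding before quoting means the result depends only on the tag or ID
itself and not on the caller's locale, which is the property the commit
message relies on for sharing tag IDs between users.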