From: David Bremner
Date: Tue, 16 Feb 2016 13:04:07 +0000 (+2000)
Subject: Re: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=8268a5b90ed1055b4b7a0f7cd78e229e32c612d5;p=notmuch-archives.git

Re: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
---

diff --git a/f4/e2b9332a8addd14432d3357039d6fe80711adc b/f4/e2b9332a8addd14432d3357039d6fe80711adc
new file mode 100644
index 000000000..e1fd87d56
--- /dev/null
+++ b/f4/e2b9332a8addd14432d3357039d6fe80711adc
@@ -0,0 +1,106 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by arlo.cworth.org (Postfix) with ESMTP id 440C26DE141B
+ for ; Tue, 16 Feb 2016 05:04:12 -0800 (PST)
+X-Virus-Scanned: Debian amavisd-new at cworth.org
+X-Spam-Flag: NO
+X-Spam-Score: -0.307
+X-Spam-Level:
+X-Spam-Status: No, score=-0.307 tagged_above=-999 required=5 tests=[AWL=0.244,
+ RP_MATCHES_RCVD=-0.55, SPF_PASS=-0.001] autolearn=disabled
+Received: from arlo.cworth.org ([127.0.0.1])
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id s3vrzFIj2p0b for ;
+ Tue, 16 Feb 2016 05:04:10 -0800 (PST)
+Received: from fethera.tethera.net (fethera.tethera.net [198.245.60.197])
+ by arlo.cworth.org (Postfix) with ESMTPS id 1669C6DE02C9
+ for ; Tue, 16 Feb 2016 05:04:09 -0800 (PST)
+Received: from remotemail by fethera.tethera.net with local (Exim 4.84)
+ (envelope-from )
+ id 1aVfHq-0002cb-Ri; Tue, 16 Feb 2016 08:03:26 -0500
+Received: (nullmailer pid 25980 invoked by uid 1000);
+ Tue, 16 Feb 2016 13:04:07 -0000
+From: David Bremner
+To: "W. Trevor King" , notmuch@notmuchmail.org
+Subject: Re: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
+In-Reply-To:
+
+References:
+
+User-Agent: Notmuch/0.21+26~g9404723 (http://notmuchmail.org) Emacs/24.5.1
+ (x86_64-pc-linux-gnu)
+Date: Tue, 16 Feb 2016 09:04:07 -0400
+Message-ID: <87lh6kvmbc.fsf@zancas.localnet>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=utf-8
+Content-Transfer-Encoding: quoted-printable
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.20
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Tue, 16 Feb 2016 13:04:12 -0000
+
+"W. Trevor King" writes:
+
+> Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
+> when a tag or message ID contains non-ASCII characters [1].
+>
+> There are a number of Python bugs associated with this behavior
+> [2,3,4,5,6]. There's also some useful background in [8]. [3] lead to
+> the currently working Python 3 implementation, which encodes to UTF-8
+> by default and has 'encoding' and 'errors' arguments [7]. This commit
+> follows that approach in a way that's compatible with both Python 2
+> and Python 3. Coercing to UTF-8 (regardless of locale) gives us
+> consistent tag IDs for sharing between users.
+
+I'm not sure what "tag IDs" are. Do you mean message-ids here? or "tags
+and IDs"?
+
+At first I thought there might be problems with non-utf8 message-ids,
+but that turns out not to be the case [1]. It seems like it would take
+a fairly heroic effort to get non-UTF8 tags into the database (perhaps
+by calling the library interface with bad strings?) so we can probably
+ignore this case. It might be good to document the limitation though,
+since AFAIK, dump and restore can roundtrip any old crap.
+
+
+>
+> The 'isnumeric' check identifies Unicode instances in both Python 2
+> [9] and Python 3 [10].
+>
+
+I still haven't really tried to understand this part, but probably it
+deserves inline documentation.
+
+> ---
+> I haven't checked the other commands for issues with Unicode IDs or
+> tags. It's possible that in addition to this explicit encoding to
+> UTF-8, we'll also want explicit decoding from UTF-8 when reading from
+> Git trees (for 'nmbug checkout' and 'nmbug status').
+
+Yes, this seems to be a problem, with the patch applied I can commit,
+but the same utf-8 message-id causes problems.
+
+bremner@zancas:~/software/upstream/notmuch$ ./devel/nmbug/nmbug status
+U D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@=C3=83=C3=82=C3=83=C3=82=C2=A5=C3=83=C3=
+=82=C2=B0=C3=83=C3=82=C2=A3=C3=83=C3=82=C2=A5=C3=83=C3=82=C2=A9-=C3=83=C3=
+=82=C3=83=C3=82 unread
+A D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@=C3=83=C3=83=C2=A5=C3=83=C2=B0=C3=83=C2=
+=A3=C3=83=C2=A5=C3=83=C2=A9-=C3=83=C3=83 unread
+
+bremner@zancas:~/software/upstream/notmuch$ delve -a -1 ~/Maildir/.notmuch/=
+xapian | grep D1B4DEBCAFFC4A05A4D4349A6EC5C9D8
+QD1B4DEBCAFFC4A05A4D4349A6EC5C9D8@=C3=91=C3=A5=C3=B0=C3=A3=C3=A5=C3=A9-=C3=
+=8F=C3=8A
+
+[1]: id:87si0svnim.fsf@zancas.localnet
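
The 'isnumeric' check discussed in the message can be illustrated with a short sketch. This is not the code from the patch under review: `to_utf8` and `from_utf8` are hypothetical helper names chosen for illustration, and the sample tag is made up. The sketch relies only on the fact that text types (`unicode` in Python 2, `str` in Python 3) have an `isnumeric` method while byte strings do not.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def to_utf8(value, encoding='utf-8', errors='strict'):
    """Return `value` as bytes (hypothetical helper, not from the patch).

    Only text types have `isnumeric`: `unicode` in Python 2 and `str`
    in Python 3. Byte strings lack it and are passed through unchanged.
    """
    if hasattr(value, 'isnumeric'):
        return value.encode(encoding, errors)
    return value


def from_utf8(value, encoding='utf-8', errors='strict'):
    """Inverse sketch (also hypothetical): decode bytes, pass text through."""
    if hasattr(value, 'isnumeric'):
        return value
    return value.decode(encoding, errors)


tag = 'caf\u00e9'                   # made-up tag with one non-ASCII character
encoded = to_utf8(tag)              # explicit encoding, independent of locale
same = to_utf8(encoded)             # bytes are not encoded a second time
decoded = from_utf8(encoded)        # explicit decoding restores the text

# Decoding with the wrong codec and encoding again makes the data grow.
mangled = to_utf8(encoded.decode('latin-1'))
```

In this sketch, `mangled` is two bytes longer than `encoded` because each non-ASCII byte was re-encoded as a two-byte sequence. The longer `=C3=83=C3=82...` runs in the 'nmbug status' output above, compared with the shorter term that delve reports, look consistent with that kind of repeated mis-decoding, though the message itself does not identify the cause.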