From: W. Trevor King
Date: Thu, 13 Feb 2014 16:47:23 +0000 (-0800)
Subject: [PATCH v3 8/8] nmbug-status: Hardcode UTF-8 instead of using the user's locale
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=1bc9e0b901ce7a01360342ca8068a415838728f7;p=notmuch-archives.git

[PATCH v3 8/8] nmbug-status: Hardcode UTF-8 instead of using the user's locale
---

diff --git a/14/5b00b41a1813274717e500e279c7305ece7df3 b/14/5b00b41a1813274717e500e279c7305ece7df3
new file mode 100644
index 000000000..59dc39cdf
--- /dev/null
+++ b/14/5b00b41a1813274717e500e279c7305ece7df3
@@ -0,0 +1,127 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id E33A8431FCB
+ for ; Thu, 13 Feb 2014 08:48:58 -0800 (PST)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Amavis-Alert: BAD HEADER SECTION, Duplicate header field: "References"
+X-Spam-Flag: NO
+X-Spam-Score: 0
+X-Spam-Level:
+X-Spam-Status: No, score=0 tagged_above=-999 required=5
+ tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001]
+ autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id DRAApRm8OSXR for ;
+ Thu, 13 Feb 2014 08:48:52 -0800 (PST)
+Received: from qmta09.westchester.pa.mail.comcast.net
+ (qmta09.westchester.pa.mail.comcast.net [76.96.62.96])
+ by olra.theworths.org (Postfix) with ESMTP id 8A68F431FBD
+ for ; Thu, 13 Feb 2014 08:48:49 -0800 (PST)
+Received: from omta21.westchester.pa.mail.comcast.net ([76.96.62.72])
+ by qmta09.westchester.pa.mail.comcast.net with comcast
+ id RpLE1n0011ZXKqc59sooWL; Thu, 13 Feb 2014 16:48:48 +0000
+Received: from odin.tremily.us ([24.18.63.50])
+ by omta21.westchester.pa.mail.comcast.net with comcast
+ id Rson1n00Q152l3L3hsooaA; Thu, 13 Feb 2014 16:48:48 +0000
+Received: from mjolnir.tremily.us (unknown [192.168.0.140])
+ by odin.tremily.us (Postfix) with ESMTPS id 3C538102DA0E;
+ Thu, 13 Feb 2014 08:48:47 -0800 (PST)
+Received: (nullmailer pid 17997 invoked by uid 1000);
+ Thu, 13 Feb 2014 16:47:29 -0000
+From: "W. Trevor King"
+To: notmuch@notmuchmail.org
+Subject: [PATCH v3 8/8] nmbug-status: Hardcode UTF-8 instead of using the
+ user's locale
+Date: Thu, 13 Feb 2014 08:47:23 -0800
+Message-Id:
+ <4f771a7537f9dd345b7c26331beb5cfa5908d3cb.1392309570.git.wking@tremily.us>
+X-Mailer: git-send-email 1.8.5.2.8.g0f6c0d1
+In-Reply-To:
+References:
+In-Reply-To:
+References:
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
+ s=q20121106; t=1392310128;
+ bh=pLGkmQOHxrFMu738rOKiRDIxfTTcEzB7h5yFkPcHlyk=;
+ h=Received:Received:Received:Received:From:To:Subject:Date:
+ Message-Id;
+ b=d9LrfX9PmVKuHifOGzDliNawJs/Z/asaHdEBZHBG5xYGeK0RTh4p4kjkvw589/Vfn
+ eIh/+sxXOOsA3yZSV3DzVjgsWDBeDt1encB8VLzvDuxbq/Z9Jfp13dI8aYbCthmbTH
+ hCfMvcKKI/lKVSPEkgWUK9VULeOq5rvqcgsWedGOqszN3XMy5UUIwz8kvYBWnL0myZ
+ AXurkOMVf/MEsNy7hsfCn5b/gRijwPCX8lqwzbjRyYdADcJ4ITQ5GrupO68MoH2PEZ
+ ujn8fN2Z4tErRhESq7gxuRQAcSLXtT3DDoyaH1A3h64TG6/1Uf247kk1BAstsycH0/
+ vywSwq6McYVBQ==
+Cc: Tomi Ollila
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+ 
+List-Unsubscribe: ,
+ 
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+ 
+X-List-Received-Date: Thu, 13 Feb 2014 16:48:59 -0000
+
+David [1] and Tomi [2] both feel that the user's choice of LANG is not
+explicit enough to have such a strong effect on nmbug-status. For
+example, cron jobs usually default to LANG=C, and that is going to
+give you ASCII output:
+
+  $ LANG=C python -c 'import locale; print(locale.getpreferredencoding())'
+  ANSI_X3.4-1968
+
+Trying to print Unicode author names (and other strings) in that
+encoding would crash nmbug-status with a UnicodeEncodeError. To avoid
+that, this patch hardcodes UTF-8, which can handle generic Unicode,
+and is the preferred encoding (regardless of LANG settings) for
+everyone who has chimed in on the list so far. I'd prefer trusting
+LANG, but in the absence of any users that prefer non-UTF-8 encodings
+I'm fine with this approach.
+
+While we could achieve the same effect on the output content by
+dropping the previous patch (nmbug-status: Encode output using the
+user's locale), Tomi also wanted UTF-8 hardcoded as the config-file
+encoding [2]. Keeping the output encoding patch and then adding this
+to hardcode both the config-file and output encodings at once seems
+the easiest route, now that fd29d3f (nmbug-status: Decode Popen output
+using the user's locale, 2014-02-10) has landed in master.
+
+[1]: id="877g8z4v4x.fsf@zancas.localnet"
+ http://article.gmane.org/gmane.mail.notmuch.general/17202
+[2]: id="m2vbwj79lu.fsf@guru.guru-group.fi"
+ http://article.gmane.org/gmane.mail.notmuch.general/17209
+---
+ devel/nmbug/nmbug-status | 3 +--
+ 1 file changed, 1 insertion(+), 2 deletions(-)
+
+diff --git a/devel/nmbug/nmbug-status b/devel/nmbug/nmbug-status
+index 8cc097a..beb2af5 100755
+--- a/devel/nmbug/nmbug-status
++++ b/devel/nmbug/nmbug-status
+@@ -13,7 +13,6 @@ import codecs
+ import collections
+ import datetime
+ import email.utils
+-import locale
+ try: # Python 3
+     from urllib.parse import quote
+ except ImportError: # Python 2
+@@ -27,7 +26,7 @@ import subprocess
+ import xml.sax.saxutils
+ 
+ 
+-_ENCODING = locale.getpreferredencoding() or sys.getdefaultencoding()
++_ENCODING = 'UTF-8'
+ _PAGES = {}
+ 
+ 
+-- 
+1.8.5.2.8.g0f6c0d1
+
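
The failure mode described in the commit message can be reproduced in
isolation. The following is a minimal sketch and not code from
nmbug-status: the author name and the can_encode() helper are
hypothetical, and 'ascii' stands in for the ANSI_X3.4-1968 encoding
that LANG=C reports. Only the _ENCODING value is taken from the patch.

```python
# Sketch only: can_encode() and the author name are hypothetical and
# are not part of nmbug-status.

_ENCODING = 'UTF-8'  # the value this patch hardcodes


def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical author name containing one non-ASCII character (u-umlaut).
author = u'J\u00fcrgen Example'

# Under LANG=C the preferred encoding is ASCII, so encoding the name fails.
ascii_ok = can_encode(author, 'ascii')

# With the hardcoded UTF-8 value, the same name encodes cleanly.
utf8_ok = can_encode(author, _ENCODING)
utf8_bytes = author.encode(_ENCODING)
```

Because UTF-8 can represent every Unicode code point, the hardcoded
value removes the dependency on LANG for this class of error, at the
cost of ignoring a user who genuinely prefers another encoding.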