From 2d458668a87bd3cb09fb7a920a1dfb9b6b55b921 Mon Sep 17 00:00:00 2001 From: "W. Trevor King" Date: Mon, 30 Mar 2015 22:26:51 +1700 Subject: [PATCH] Re: UnicodeDecodeError with python API --- 82/57d37d103d50ed05a17c9276e986b85cd64070 | 149 ++++++++++++++++++++++ 1 file changed, 149 insertions(+) create mode 100644 82/57d37d103d50ed05a17c9276e986b85cd64070 diff --git a/82/57d37d103d50ed05a17c9276e986b85cd64070 b/82/57d37d103d50ed05a17c9276e986b85cd64070 new file mode 100644 index 000000000..1c46dffde --- /dev/null +++ b/82/57d37d103d50ed05a17c9276e986b85cd64070 @@ -0,0 +1,149 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 79437431FBF + for ; Sun, 29 Mar 2015 22:29:02 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: 2.338 +X-Spam-Level: ** +X-Spam-Status: No, score=2.338 tagged_above=-999 required=5 + tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, + DNS_FROM_AHBL_RHSBL=2.438, RCVD_IN_DNSWL_NONE=-0.0001] + autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id tmuG3yIFMfl1 for ; + Sun, 29 Mar 2015 22:28:56 -0700 (PDT) +Received: from resqmta-po-05v.sys.comcast.net (resqmta-po-05v.sys.comcast.net + [96.114.154.164]) + (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id AF391431FAE + for ; Sun, 29 Mar 2015 22:28:55 -0700 (PDT) +Received: from resomta-po-01v.sys.comcast.net ([96.114.154.225]) + by resqmta-po-05v.sys.comcast.net with comcast + id 9hUu1q0014s37d401hUuw4; Mon, 30 Mar 2015 05:28:54 +0000 +Received: from odin.tremily.us ([67.168.81.176]) + by resomta-po-01v.sys.comcast.net with comcast + id 9hSs1q00A3oF5yT01hSsVG; Mon, 30 Mar 2015 05:26:54 +0000 +Received: 
by odin.tremily.us (Postfix, from userid 1000) + id EC07E16F8071; Sun, 29 Mar 2015 22:26:51 -0700 (PDT) +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tremily.us; s=odin; + t=1427693211; bh=Jp/untkSO9O2zjt3gTGzIm69gcEKhGrC/KVtFAygSqY=; + h=Date:From:To:Cc:Subject:References:In-Reply-To; + b=V1wiJ5Vm0J1kkqDscyfp5v34UciutSH9FMW1lKguUTF8z3c2oRZPyJIpYGVUAas18 + 68xXcQrBrQ24ZPFSa1LndQiyQ1/Zj+/kg8AyTfwuOUdlmy6Qa3nT7/qwhRZNi8AdP5 + XMoKP092vX0tOqtBR7sFhfljD0fZW3OIR7no3EQA= +Date: Sun, 29 Mar 2015 22:26:51 -0700 +From: "W. Trevor King" +To: Sebastian Fischmeister +Subject: Re: UnicodeDecodeError with python API +Message-ID: <20150330052651.GS22036@odin.tremily.us> +References: <874mp4q7e7.fsf@uwaterloo.ca> + <20150329163658.GK22036@odin.tremily.us> + <87ego7pfia.fsf@uwaterloo.ca> +MIME-Version: 1.0 +Content-Type: multipart/signed; micalg=pgp-sha1; + protocol="application/pgp-signature"; boundary="vSzmLLrdioqyxIBH" +Content-Disposition: inline +In-Reply-To: <87ego7pfia.fsf@uwaterloo.ca> +OpenPGP: id=39A2F3FA2AB17E5D8764F388FC29BDCDF15F5BE8; + url=http://tremily.us/pubkey.txt +User-Agent: Mutt/1.5.23 (2014-03-12) +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; + s=q20140121; t=1427693334; + bh=ZLlalfBAUD2HcSb1rb6a9XSYwlKCk342wNtpu3Oh9jM=; + h=Received:Received:Received:Date:From:To:Subject:Message-ID: + MIME-Version:Content-Type; + b=Vk0Zx+blGwRXlFoLN9FG+kw6m3YT+GG7Xdq0R9yefDvp3g8QZtHiz+m/RqrSrbq5P + FT56oT+KJnVgi0xG5pKTdl7DkvVwbP0PiKFD0Kan+PawR4Y2ABNZx/K+I6e/jUKGn0 + EF6dZmgKEg185XPo8Gm2mXVQr2+Qmy+tdJUrSU+h9nchQHgMGwTIJ1c7t1QSjesr/H + twdKzu079jt1RAhEz18TLd8ErSU3l2IK92ix3zQXEbRcWzjdD99Id6TiTumM2O0znZ + 6jUxeKYUCmCIwiDWwdzx6xvyIzGPxOiPzJVC9Tq648tyTBgr44G0N0GsvHq3v2SHwW + sF5vhcBMxmIlA== +Cc: notmuch +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." 
+ +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Mon, 30 Mar 2015 05:29:04 -0000 + + +--vSzmLLrdioqyxIBH +Content-Type: text/plain; charset=utf-8 +Content-Disposition: inline +Content-Transfer-Encoding: quoted-printable + +On Sun, Mar 29, 2015 at 07:10:53PM -0400, Sebastian Fischmeister wrote: +> > My first guess is that the file's encoding doesn't match your +> > locale. Do you have a non-ASCII locale set? You can check with: +>=20 +> It seems to be more tricky than I thought. I didn't have a locale set. +>=20 +> When I set one, I can parse some emails with this: +>=20 +> export LANG=3Den_US.latin-1 +>=20 +> Others with this: +>=20 +> export LANG=3Den_US.UTF-8 +>=20 +> Others fail with either of the two. + +Hmm, that's surprising. In hindsight, the locale should only be +affecting the *output* (e.g., a non-Unicode locale might cause a +UnicodeEncodeError). However, you're getting your errors on input. +I'd expect the files to be loaded and parsed as byte-streams, but +maybe there's a bug in Python's email parser. It wouldn't be the +first time it's had trouble with bytes-vs-Unicode (see these old bugs +with similar tracebacks from the initial transition to 3.0 [1,2], or +search =E2=80=9Cunicode email=E2=80=9D on http://bugs.python.org/). I'd tr= +y to +reproduce this failure by calling email.message_from_file(=E2=80=A6) direct= +ly +(getting notmuch out of the loop), and then file a bug against Python +once you have a pure-Python reproduction. + +Cheers, +Trevor + +[1]: http://bugs.python.org/issue1086 +[2]: http://bugs.python.org/issue1258#msg56470 + +--=20 +This email may be signed or encrypted with GnuPG (http://www.gnupg.org). 
+For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy + +--vSzmLLrdioqyxIBH +Content-Type: application/pgp-signature; name="signature.asc" +Content-Description: OpenPGP digital signature + +-----BEGIN PGP SIGNATURE----- +Version: GnuPG v2 + +iQIcBAEBAgAGBQJVGN6ZAAoJEG8/JgBt8ol81h0P/2Gcb33fy+HnXdaZTa4fWAg7 +ylyHLfa0UUv3MasF92a4MB6jnEMcQhgKfnAr0Tp3z/N0R2UXk6n6upetTwr5pLk1 +zyJX5MSk2aEedyjh3n9lid0z75sGjqygEsTppUCTkidLeFQzWFB4j2tdH+xKVcOU +/HXmgcmDlh+HtOu1DRSIEETQxio1LiPBqUXFKZ/FGDlornAlFbBViX1XZaudj+9A +/LeyD92SYkRiDaUvKpogFBpM0pClxSUzezXfIVRho18nwft5tmmatGgIWLMrntsD +iaBLAqJcjUPgYuhcxtVAC4JAk+L2IWaJr4HuufN1UfLu5Uj5IwUvaIjFevkblMkl +0RQ2Hf8IhueN59d+QtbGbRiWHJBbf6PfBDXHukkeQSvrYjkwiJi2hoiLnJ4OhhDn +3wkVKyIp0fGZwsq1xTySFqlqd8rTGOG9vhnHYYDEurr5+AXYARH6/33MoprVjrhc +gdtSZJfPhvj1mv2ilTBWOsVOV9/ar3qOMD3dhjqhQvxprghknUf63y7L7e1FwFKx +Uj9LA1tbI2wxiX4enWSUeYxkjQU8bDwHaOmxBl7OUPiOtq7zNnGnfLBFSqJ/NVUM +f7fZlBajQkoMuwuYgcPnwGp0C//RUNpIreCqSbzB9hi6D/hY5Wqas8ax3Pm4jrx4 +a8akA1PFycrAaIj79HTV +=lybv +-----END PGP SIGNATURE----- + +--vSzmLLrdioqyxIBH-- -- 2.26.2
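The pure-Python reproduction suggested in the message above (calling email.message_from_file(…) directly, with notmuch out of the loop) might look roughly like the sketch below. It assumes Python 3. The sample message, the example.com address, and the helper names (try_text_mode, parse_binary, demo) are hypothetical and do not come from the thread. The sketch only shows one way a decode error can arise on input, namely when a file is opened in text mode with an encoding that does not match its bytes; it is not a confirmed diagnosis of the failure reported here.

```python
import email
import os
import tempfile

# Hypothetical sample message (not from the thread): the body holds a
# Latin-1 byte (0xE9) that is not valid UTF-8.
RAW = b"From: a@example.com\nSubject: test\n\ncaf\xe9\n"


def try_text_mode(path, encoding):
    """Parse with email.message_from_file on a text-mode file.

    Returns the exception class name if decoding fails, else None.
    """
    try:
        with open(path, encoding=encoding) as stream:
            email.message_from_file(stream)
    except UnicodeDecodeError as error:
        return type(error).__name__
    return None


def parse_binary(path):
    """Parse the same file as a byte stream, so no text decoding happens on read."""
    with open(path, "rb") as stream:
        return email.message_from_binary_file(stream)


def demo():
    """Write the sample message to a temporary file and try each approach."""
    fd, path = tempfile.mkstemp(suffix=".eml")
    try:
        with os.fdopen(fd, "wb") as stream:
            stream.write(RAW)
        return (
            try_text_mode(path, "utf-8"),    # encoding does not match the bytes
            try_text_mode(path, "latin-1"),  # every byte maps to a code point
            parse_binary(path)["Subject"],   # byte-stream parsing
        )
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If a real message from the failing set still raises an error when read through parse_binary, that would be a stronger candidate for a bug report against Python's email package than a text-mode failure, which can be explained by the encoding passed to open() alone.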