Date: Tue, 12 Jul 2011 21:27:31 +0100
From: Patrick Totzke
To: Carl Worth
Cc: Notmuch developer list
Subject: Re: Encodings
Message-ID: <20110712202731.GA28929@brick.lan>
References: <87zkkkx6am.fsf@SSpaeth.de> <87box0lv05.fsf@yoom.home.cworth.org>
In-Reply-To: <87box0lv05.fsf@yoom.home.cworth.org>
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi!
As discussed on IRC, if notmuch stores header values in utf8, it's safe
to decode them to unicode instances here.
best,
/p

On Mon, Jul 11, 2011 at 08:03:38AM -0700, Carl Worth wrote:
> On Mon, 11 Jul 2011 16:04:17 +0200, Sebastian Spaeth wrote:
> > The answer is that things are very implicit. notmuch.h speaks of
> > strings but never mentions encodings
>
> Much of this was intentional on my part.
>
> For example, I intentionally avoided restrictions on what could be
> stored as a tag in the database, (other than the terminating character
> implied by "string" of course).
>
> > So, can we document what encoding we are expected to pass in the various
> > APIs
>
> Yes, let's clarify documentation wherever we need to.
>
> > For some of the stuff we read directly from the files, eg
> > arbitrary headers, we can probably be least sure
>
> The headers should be decoded to utf-8, (via
> g_mime_utils_header_decode_text), before being stored in the database.
>
> > but are e.g. the returned tags always utf-8?
>
> No. The tag data is returned exactly as the user presented it.
>
> > I would love to make the python bindings use unicode() instances in
> > cases where we can be sure to actually receive utf-8 encoded strings.
> >
> > Encodings make my brain hurt. Unfortunately one cannot simply ignore
> > them.
>
> I think a lot of the pain here is due to some bad design decisions in
> python itself. Of course, my saying that doesn't make things any easier
> for you.
>
> But do tell me what more we can do to clarify behavior or documentation.
>
> -Carl

[Attachment: 0001-unicode-return-value-for-Message.get_header.patch]

From 988a9832d714dfa0f91b2b1185a50acb4a6ca4b5 Mon Sep 17 00:00:00 2001
From: pazz
Date: Tue, 12 Jul 2011 19:47:39 +0100
Subject: [PATCH 1/8] unicode return value for Message.get_header()

As discussed in IRC, notmuch recodes mailheaders to utf-8, so we can
safely decode them into unicode instances.
---
 bindings/python/notmuch/message.py |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py
index 763d2c6..4a43a88 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -379,14 +379,16 @@ class Message(object):
 
         :param header: The name of the header to be retrieved.
                        It is not case-sensitive (TODO: confirm).
-        :type header: str
-        :returns: The header value as string
+        :type header: str or unicode instance
+        :returns: The header value as a unicode string
         :exception: :exc:`NotmuchError`
 
                     * STATUS.NOT_INITIALIZED if the message 
                       is not initialized.
                     * STATUS.NULL_POINTER, if no header was found
         """
+        if isinstance(header, unicode):
+            header = header.encode('utf-8')
         if self._msg is None:
             raise NotmuchError(STATUS.NOT_INITIALIZED)
 
@@ -394,7 +396,7 @@ class Message(object):
         header = Message._get_header (self._msg, header)
         if header == None:
             raise NotmuchError(STATUS.NULL_POINTER)
-        return header
+        return header.decode('utf-8')
 
     def get_filename(self):
         """Returns the file path of the message file
-- 
1.7.4.1
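
For readers following the patch above, here is a minimal sketch of the same encode-on-the-way-in, decode-on-the-way-out pattern. It is not the notmuch bindings code. The patch targets Python 2, where `unicode` is a built-in type; this sketch uses the Python 3 names `str` and `bytes` so that it runs on a current interpreter. `_fake_get_header`, `_HEADERS`, and the use of `KeyError` are hypothetical stand-ins for the C-level lookup and for `NotmuchError`, and the assumption that stored header values are UTF-8 is taken from the thread.

```python
# Hypothetical stand-in for the byte-oriented C lookup that the patch calls
# as Message._get_header. It accepts and returns UTF-8 encoded bytes, or
# None when the header does not exist.
_HEADERS = {b"subject": "Gr\u00fc\u00dfe".encode("utf-8")}


def _fake_get_header(name_bytes):
    return _HEADERS.get(name_bytes.lower())


def get_header(name):
    """Return the header value as text; raise KeyError if it is missing."""
    # Text names are encoded to UTF-8 before crossing into the byte layer.
    if isinstance(name, str):
        name = name.encode("utf-8")
    value = _fake_get_header(name)
    if value is None:
        # The real bindings raise NotmuchError(STATUS.NULL_POINTER) here.
        raise KeyError(name)
    # Assumption from the thread: header values are stored as UTF-8.
    return value.decode("utf-8")


print(get_header("Subject"))
```

Tags are a different case: according to the quoted reply they are returned exactly as the user presented them, so decoding them unconditionally as UTF-8 could fail.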