From: Sebastian Spaeth
To: Notmuch developer list
Subject: Encodings
Date: Mon, 11 Jul 2011 16:04:17 +0200
Message-ID: <87zkkkx6am.fsf@SSpaeth.de>
User-Agent: Notmuch/0.5-233-gb404931 (http://notmuchmail.org) Emacs/23.2.1 (x86_64-pc-linux-gnu)

Hi all,

after I was notified that notmuch's Python bindings perform differently depending on whether we hand them (byte-based) ASCII strings or unicode, I tried to disentangle which encodings the library expects and which ones it returns.

The answer is that things are very implicit. notmuch.h speaks of strings but never mentions encodings. The Xapian docs don't mention encodings either, but ojwb confirmed that Xapian expects UTF-8.

So, can we document what encoding we are expected to pass in to the various APIs, and where we can guarantee that we actually return UTF-8 encoded strings? For some of the stuff we read directly from the files, e.g. arbitrary headers, we can probably be least sure, but are the returned tags, for example, always UTF-8?

I would love to make the Python bindings use unicode() instances in cases where we can be sure to actually receive UTF-8 encoded strings.

Encodings make my brain hurt. Unfortunately one cannot simply ignore them.
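To make the idea concrete, here is a minimal sketch of the kind of boundary conversion I have in mind. It assumes the library really does hand back UTF-8 for things like tags, which is exactly the guarantee that still needs documenting. The helper names (to_bytes, to_text, header_to_text) are hypothetical and not part of the existing bindings, and the sketch uses Python 3 spelling, where str plays the role of unicode().

```python
def to_bytes(text):
    """Encode text as UTF-8 before passing it to the C library.

    Assumption: the library (and Xapian underneath) expects UTF-8 input.
    Byte strings are passed through untouched.
    """
    if isinstance(text, bytes):
        return text
    return text.encode("utf-8")


def to_text(raw):
    """Decode a value that is assumed to be guaranteed UTF-8 (e.g. a tag).

    Decoding is strict, so a broken guarantee raises UnicodeDecodeError
    instead of silently producing garbage.
    """
    if isinstance(raw, str):
        return raw
    return raw.decode("utf-8")


def header_to_text(raw):
    """Decode data read straight from a mail file (e.g. an arbitrary header).

    No encoding guarantee is assumed here: try UTF-8 first and fall back to
    substituting U+FFFD for bytes that cannot be decoded.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("utf-8", "replace")
```

The strict variant would only be appropriate for APIs where UTF-8 is documented as guaranteed. Anything read directly from message files would go through the lenient variant, or stay as a byte string.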
Sebastian