From: Sebastian Spaeth
To: Austin Clements, Martin Owens
Cc: Notmuch developer list
Subject: Re: Unicode Paths
Date: Fri, 16 Sep 2011 12:58:49 +0200
Message-ID: <8762ksbvo6.fsf@SSpaeth.de>
References: <1315972539.2201.11.camel@delen>

On Thu, 15 Sep 2011 13:52:12 -0400, Austin Clements wrote:
> On Tue, Sep 13, 2011 at 11:55 PM, Martin Owens wrote:
> > Hello Again,
> >
> > I notice in the lib code notmuch_database_open(),
> > notmuch_database_create() these functions use const char *path for the
> > directory path input. Is this unicode safe?
> >
> > The python bindings (and ctype docs) seem to suggest using something
> > called 'wchar_t *' for accepting unicode but that's for C not C++.
> >
> > Is this something that should be patched?
>
> char* is the correct type for paths on POSIX systems. The *meaning*
> of those bytes is a more complicated matter and depends on your locale
> settings. On old systems it was generally ASCII, on modern systems
> it's generally UTF-8, and it can be many other things. However, as a
> consequence of UNIX's C heritage, it is *always* terminated with a
> NULL byte and cannot contain embedded NULL's.
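To make Austin's point concrete from the Python side: a Python string has to be turned into a sequence of bytes in some encoding, terminated by a NUL, before it can be handed to a C `const char *path` parameter. The sketch below assumes Python 3 and `ctypes`; `path_to_c_bytes` and `as_char_p` are hypothetical helper names, not part of the notmuch bindings. It encodes `str` paths as UTF-8 by default (the `encoding` parameter is an assumption; `os.fsencode()` would use the locale's filesystem encoding instead), passes `bytes` through unchanged, and rejects embedded NULs, because C would silently stop reading at the first one.

```python
import ctypes


def path_to_c_bytes(path, encoding="utf-8"):
    """Hypothetical helper: produce the bytes a C `const char *` expects.

    str paths are encoded (UTF-8 by default, an assumption that matches
    most modern POSIX systems); bytes paths are passed through as-is.
    """
    raw = path if isinstance(path, bytes) else path.encode(encoding)
    # C strings end at the first NUL byte, so an embedded NUL would
    # truncate the path on the C side. Refuse it instead.
    if b"\x00" in raw:
        raise ValueError("path contains an embedded NUL byte")
    return raw


def as_char_p(path, encoding="utf-8"):
    # ctypes.c_char_p wraps a bytes object as a NUL-terminated char*,
    # which is what a `const char *path` parameter receives.
    return ctypes.c_char_p(path_to_c_bytes(path, encoding))


p = "Mail/Grüße"
print(len(p), len(path_to_c_bytes(p)))  # 10 characters, 12 bytes
print(as_char_p(p).value)
```

The C side never sees characters, only the 12 bytes; whether they mean "Grüße" depends on the encoding the rest of the system agrees on, which is exactly the locale question Austin raises.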
Right, that's what we are doing: passing UTF-8 encoded unicode strings to char*, which should be just fine if that is what the underlying OS uses.

> wchar_t is another matter entirely. wchar_t is the type used by C to
> represent wide strings internally, which generally (but not
> necessarily!) means it stores a Unicode code point. However, this
> isn't an encoding, and different compilers can give wchar_t different
> meanings, so wchar_t strings aren't generally appropriate for storing
> or sharing between processes or with the kernel.

Mmh, I remember I attempted to use wchar_t to pass in unicode objects directly, and it failed miserably.

Sebastian
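Austin's remark that wchar_t means different things on different platforms can be observed directly through `ctypes`. The sketch below assumes Python 3; the sample string and the helper names `describe` and `wide_bytes_as_char_p` are illustrative, not from the thread. The UTF-8 `char*` buffer is the same 8 bytes everywhere, while the size of the `wchar_t` buffer depends on `sizeof(wchar_t)` (typically 4 on Linux/glibc, 2 on Windows). The last helper reinterprets the wide buffer's memory as a `char*`, showing one plausible failure mode when a wide string reaches a `const char *` parameter: the zero bytes inside each `wchar_t` end the C string early. This is an illustration only, not a diagnosis of the earlier failed attempt.

```python
import ctypes

TEXT = "Grüße"

# char* view: explicit UTF-8 bytes plus one terminating NUL byte.
narrow = ctypes.create_string_buffer(TEXT.encode("utf-8"))

# wchar_t view: one wchar_t per code point here (all BMP characters),
# plus a terminating wide NUL. The width of wchar_t is platform-dependent.
wide = ctypes.create_unicode_buffer(TEXT)

WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)


def describe():
    # Byte sizes of the two representations of the same text.
    return {
        "wchar_t size": WCHAR_SIZE,
        "char* buffer bytes": ctypes.sizeof(narrow),
        "wchar_t* buffer bytes": ctypes.sizeof(wide),
    }


def wide_bytes_as_char_p(buf):
    # Reinterpret the wchar_t buffer's raw memory as a char*, which is
    # what a C function declared with `const char *` would read if it
    # were handed a wide string by mistake.
    return ctypes.cast(buf, ctypes.c_char_p).value


print(describe())
print(narrow.value)                # same UTF-8 bytes on every platform
print(wide_bytes_as_char_p(wide))  # truncated at the first zero byte
```

On a little-endian machine the last line yields only `b'G'`, because the second byte of the first `wchar_t` is already zero; on a big-endian machine the very first byte is zero, so the result is empty.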