Return-Path: <amdragon@gmail.com>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 96018431FD0
	for <notmuch@notmuchmail.org>; Thu, 15 Sep 2011 10:52:13 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=-0.699 tagged_above=-999 required=5
	tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, FREEMAIL_FROM=0.001,
	RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id GIbpU4CVgaz2 for <notmuch@notmuchmail.org>;
	Thu, 15 Sep 2011 10:52:13 -0700 (PDT)
Received: from mail-qw0-f46.google.com (mail-qw0-f46.google.com
	[209.85.216.46]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
	(No client certificate requested)
	by olra.theworths.org (Postfix) with ESMTPS id 19AD3431FB6
	for <notmuch@notmuchmail.org>; Thu, 15 Sep 2011 10:52:13 -0700 (PDT)
Received: by qwj8 with SMTP id 8so1130479qwj.5
	for <notmuch@notmuchmail.org>; Thu, 15 Sep 2011 10:52:12 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=9uJgo4R9NEYEKKJOJ+iVYNX5ZOPJimDcGFpC/8fglJ0=;
	b=B5FTNgwfk2kwkaf2/34JA9R6TnZCybbKrvXFkYqnlTiqMqYzz/4TxrI5cVes6pOzRe
	UIiJRBFAldiLx+Mz6ZNJQJ1tHFkrgYXaQf2gSz6M1aBT98rUU853RMS0qAa7UYKhxtzX
	MIm0OT2l4liJyDj23UYQbC9NDd3gYwQu4nMnk=
Received: by 10.229.73.25 with SMTP id o25mr1003542qcj.26.1316109132545; Thu,
	15 Sep 2011 10:52:12 -0700 (PDT)
Sender: amdragon@gmail.com
Received: by 10.229.2.201 with HTTP; Thu, 15 Sep 2011 10:52:12 -0700 (PDT)
In-Reply-To: <1315972539.2201.11.camel@delen>
References: <1315972539.2201.11.camel@delen>
Date: Thu, 15 Sep 2011 13:52:12 -0400
X-Google-Sender-Auth: mDAFz-C1lMrSYO4Mc494mUL0g_Q
<CAH-f9WtL4Lwrf2qSzpgeLL5nA_2_mFxUm6cFLmfO9UK_aKmCkg@mail.gmail.com>
Subject: Re: Unicode Paths
From: Austin Clements <amdragon@mit.edu>
To: Martin Owens <doctormo@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Cc: Notmuch developer list <notmuch@notmuchmail.org>
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
	<notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Thu, 15 Sep 2011 17:52:13 -0000

On Tue, Sep 13, 2011 at 11:55 PM, Martin Owens <doctormo@gmail.com> wrote:
> I notice in the lib code notmuch_database_open(),
> notmuch_database_create() these functions use const char *path for the
> directory path input. Is this unicode safe?
>
> The python bindings (and ctypes docs) seem to suggest using something
> called 'wchar_t *' for accepting unicode but that's for C not C++.
>
> Is this something that should be patched?

char* is the correct type for paths on POSIX systems. The *meaning*
of those bytes is a more complicated matter and depends on your locale
settings. On old systems it was generally ASCII; on modern systems
it's generally UTF-8; but it can be many other things. However, as a
consequence of UNIX's C heritage, a path is *always* terminated with a
NUL byte and cannot contain embedded NULs. Any encoding that doesn't
satisfy this would not be a valid encoding for file names; UTF-16 is
one example, since every ASCII character in it includes a zero byte.
You couldn't even pass such a file name to the open() system call,
because it expects a NUL-terminated byte string.

wchar_t is another matter entirely. wchar_t is the type C uses to
represent wide characters internally; a wchar_t value generally (but
not necessarily!) holds a Unicode code point. However, this isn't an
encoding, and different compilers and platforms give wchar_t different
sizes and meanings (for example, 16 bits holding UTF-16 code units on
Windows, 32 bits holding code points with glibc on Linux), so wchar_t
strings aren't generally appropriate for storing or sharing between
processes or with the kernel.