Subject: Re: Unicode Paths
From: Martin Owens
To: Kan-Ru Chen
Cc: Notmuch developer list
Date: Thu, 15 Sep 2011 12:52:30 -0400
Message-ID: <1316105550.2201.21.camel@delen>
In-Reply-To: <8739fzwxfv.fsf@isil.kanru.info>
References: <1315972539.2201.11.camel@delen> <8739fzwxfv.fsf@isil.kanru.info>

It looks like the Python variables do include nulls; my investigation
shows that the problem also affects tag names.

The symptoms can be seen when using the Python interface with unicode
tag names or paths. Instead of 'mytag1' we see 'm', and instead of
'/my/path/to/mail' we see '/', which caused problems where the db,
amusingly, was trying to write to root.

I'll see if there is a way to remove the nulls from the strings in the
Python bindings.

Martin,

On Wed, 2011-09-14 at 12:38 +0800, Kan-Ru Chen wrote:
> I think as long as the path does not contain embedded null character
> then it is safe. Most posix filesystem does not allow null character in
> the filename so you cannot use UTF-16 or UTF-32 to encode the unicode
> path.
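
For anyone who wants to reproduce the truncation without notmuch installed, here is a minimal sketch using only ctypes. It assumes (this is a guess, not something confirmed in the bindings) that a unicode string reaches the C library as a UTF-16 or UTF-32 buffer and is then read as a NUL-terminated char*. The helper name as_c_string is made up for the illustration.

```python
import ctypes


def as_c_string(data: bytes) -> bytes:
    """Return what C code reading a NUL-terminated char* would see.

    Hypothetical helper for illustration only; it is not part of notmuch.
    """
    buf = ctypes.create_string_buffer(data)
    # .value stops at the first NUL byte, like strlen()/strcpy() would.
    return buf.value


for text in ("mytag1", "/my/path/to/mail"):
    # ASCII characters encode with zero padding bytes in UTF-16 and UTF-32,
    # so the C side sees the string end right after the first character.
    seen_utf16 = as_c_string(text.encode("utf-16-le"))
    seen_utf32 = as_c_string(text.encode("utf-32-le"))
    # UTF-8 never produces a NUL byte unless the text itself contains U+0000.
    seen_utf8 = as_c_string(text.encode("utf-8"))
    print(repr(text), seen_utf16, seen_utf32, seen_utf8)
```

If that assumption holds, encoding to UTF-8 before the string crosses into C should avoid the problem, rather than stripping nulls afterwards.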