From: Austin T. Clements
Date: Tue, 12 Aug 2014 14:33:00 +0000 (+2000)
Subject: Re: `notmuch-escape-boolean-term': Broken for non-ascii characters
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=7fdaef38f52b0cd8b8a347b519e40d9150e358ef;p=notmuch-archives.git

Re: `notmuch-escape-boolean-term': Broken for non-ascii characters
---

diff --git a/e8/a21f1fbd5f78c4e4f22796671b2b6927e28931 b/e8/a21f1fbd5f78c4e4f22796671b2b6927e28931
new file mode 100644
index 000000000..89fd925da
--- /dev/null
+++ b/e8/a21f1fbd5f78c4e4f22796671b2b6927e28931
@@ -0,0 +1,106 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id 17787431FBC
+ for ; Tue, 12 Aug 2014 07:33:11 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: -2.3
+X-Spam-Level:
+X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5
+ tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id dJLrDEuSwQPP for ;
+ Tue, 12 Aug 2014 07:33:03 -0700 (PDT)
+Received: from outgoing.csail.mit.edu (outgoing.csail.mit.edu [128.30.2.149])
+ by olra.theworths.org (Postfix) with ESMTP id B7346431FAF
+ for ; Tue, 12 Aug 2014 07:33:03 -0700 (PDT)
+Received: from webmail.csail.mit.edu ([128.30.2.164] helo=webmail)
+ by outgoing.csail.mit.edu with esmtpsa
+ (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72)
+ (envelope-from )
+ id 1XHD8G-0007qi-Rh; Tue, 12 Aug 2014 10:33:00 -0400
+Received: from 216-15-114-40.c3-0.arl-ubr1.sbo-arl.ma.cable.rcn.com
+ (216-15-114-40.c3-0.arl-ubr1.sbo-arl.ma.cable.rcn.com [216.15.114.40])
+ by webmail.csail.mit.edu (Horde Framework) with HTTP;
+ Tue, 12 Aug 2014 10:33:00 -0400
+Date: Tue, 12 Aug 2014 10:33:00 -0400
+Message-ID:
+ <20140812103300.Horde.O1lIjfCL-Lh8XGn65RO2Cg1@webmail.csail.mit.edu>
+From: "Austin T. Clements"
+To: Moritz Ulrich
+Subject: Re: `notmuch-escape-boolean-term': Broken for non-ascii characters
+References: <874mxiu5hj.fsf@tarn-vedra.de>
+In-Reply-To: <874mxiu5hj.fsf@tarn-vedra.de>
+User-Agent: Internet Messaging Program (IMP) H5 (6.1.4)
+Content-Type: text/plain; charset=UTF-8; format=flowed; DelSp=Yes
+MIME-Version: 1.0
+Content-Disposition: inline
+Content-Transfer-Encoding: 8bit
+Cc: notmuch@notmuchmail.org
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Tue, 12 Aug 2014 14:33:11 -0000
+
+Quoting Moritz Ulrich :
+> Hello,
+>
+> I recently adopted notmuch as my primary way to read mail, so thank you
+> for this great tool!
+>
+> Unfortunately, I ran into a problem of the Emacs side of the project
+> when used in a non-ascii environment:
+>
+> Having a tag named 'uni-köln', the tag:-completion doesn't work.
+>
+> This is caused by `notmuch-escape-boolean-term' errornously escaping the
+> above string:
+>
+> (notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""
+>
+> This is caused by `string-match' with the following errornously matching
+> my tag:
+>
+> (string-match "[^!#-'*-~]" "uni-köln") => 5
+> (string-match "[^!#-'*-~]" "uni-koln") => nil
+>
+> I'm not exactly sure how to tackle this - the Regexp was crafted to match
+> (, ), " if I understand it correct. A simple way would be just adding
+> more characters as a sort-of whitelist. A nicer solution would be
+> converting it from [^...] to [...] to explicitly mark letters that needs
+> to be escaped.
+
+notmuch-escape-boolean-term used to use a blacklist, but we switched
+to a whitelist because Xapian's own parser has changed over the years
+in its handling of non-ASCII characters and invalidated our blacklist.
+Ultimately it seemed much safer to go with a whitelist. Quoting
+"uni-köln" isn't erroneous, it's just conservative.
+
+Could you explain in more detail what's broken? I tried adding the
+tag uni-köln to a message in Emacs, then hitting "s" to start a search
+then "tag:" and that tag (surrounded by quotes) was one of the
+completion options. Upon completing to that tag, the search worked
+fine.
+
+Are you objecting to the unnecessary (but legal) quotes in the
+completion? We might be able to include Unicode word characters in
+the quoting whitelist, though that seems like a spot fix (probably a
+fairly broad one, so maybe that's fine) and might be tricky because of
+Emacs' somewhat weird Unicode regexp support (using [[:alpha:]] might
+Just Work, but we'd have to be careful of the active syntax table).
+Or tab completion could recognize that, say, tag:uni doesn't require
+quoting, but still expand it to tag:"uni-köln".
+
+
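For readers following the thread, the whitelist behaviour discussed above can be sketched outside Emacs. This is only an illustration in Python, not the notmuch source. The character class is the one quoted in the report. The function names, the doubling of embedded quotes, and the `\w`-based variant (standing in for the "include Unicode word characters" idea) are assumptions made for this sketch.

```python
import re

# Whitelist from the report: "!", "#" through "'", and "*" through "~".
# Anything outside it (space, '"', "(", ")", all non-ASCII) forces quoting.
ASCII_WHITELIST = re.compile(r"[^!#-'*-~]")

# Hypothetical broader whitelist: also pass Unicode word characters.
# In Python 3, \w matches Unicode letters and digits for str patterns.
UNICODE_WHITELIST = re.compile(r"[^\w!#-'*-~]")


def escape_boolean_term(term, needs_quoting=ASCII_WHITELIST):
    """Return term unchanged if every character is whitelisted, else quote it."""
    if term == "" or needs_quoting.search(term):
        # Assumption: embedded double quotes are escaped by doubling them.
        return '"' + term.replace('"', '""') + '"'
    return term


if __name__ == "__main__":
    for tag in ("uni-koln", "uni-köln", "two words"):
        print(tag, "->", escape_boolean_term(tag),
              "|", escape_boolean_term(tag, UNICODE_WHITELIST))
```

With the ASCII-only class, "uni-köln" is quoted because "ö" falls outside the whitelist, which is conservative but still a valid term. With the broader class it passes through unquoted, while terms containing spaces or parentheses are still quoted.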