From: Moritz Ulrich
Date: Tue, 12 Aug 2014 21:47:42 +0000 (+0200)
Subject: Re: `notmuch-escape-boolean-term': Broken for non-ascii characters
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=c8b877c3199e10e335bb8b270e683dc4f3f991e1;p=notmuch-archives.git

Re: `notmuch-escape-boolean-term': Broken for non-ascii characters
---

diff --git a/e6/b5707d91561b8c6a058e1ec0cbdf342c978741 b/e6/b5707d91561b8c6a058e1ec0cbdf342c978741
new file mode 100644
index 000000000..3dc417a95
--- /dev/null
+++ b/e6/b5707d91561b8c6a058e1ec0cbdf342c978741
@@ -0,0 +1,165 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id 3EBA6431FBC
+ for ; Tue, 12 Aug 2014 14:48:04 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: -0.7
+X-Spam-Level:
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5
+ tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id oejNArLt4CJB for ;
+ Tue, 12 Aug 2014 14:47:53 -0700 (PDT)
+Received: from mail-we0-f169.google.com (mail-we0-f169.google.com
+ [74.125.82.169]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
+ (No client certificate requested)
+ by olra.theworths.org (Postfix) with ESMTPS id A9C44431FAF
+ for ; Tue, 12 Aug 2014 14:47:53 -0700 (PDT)
+Received: by mail-we0-f169.google.com with SMTP id u56so10625935wes.0
+ for ; Tue, 12 Aug 2014 14:47:52 -0700 (PDT)
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
+ d=1e100.net; s=20130820;
+ h=x-gm-message-state:from:to:cc:subject:in-reply-to:references
+ :user-agent:date:message-id:mime-version:content-type;
+ bh=aGEFSJVm6ula0Y0B1JNnKKT/y7aCOaHZvA2Fmfu1mJI=;
+ b=Y+ujb5GV/0ZaE3J3B+XbtTzEazTK3KmwEEkFNeVQt9P1TTtWt05RW7Dy1KQeWTRyvq
+ m/AGuA3hfeTahPZI6kXPcxF+qTrZv+1oyLCL91BwpcRf8sPCMsj0kI2P6zlM7ytBfnSo
+ RUwaHXK7IWtcfZAehmr+ipWjgEBY0Uk8DVP4jrAHuCb9QYZoVtM626luTFXw0w+I9yO3
+ ylPfuZDSQawCS4sIf4kkjz45JRHLjvYtnMuEtjdqD/kSSFCosl+BQKCCrlrtPG4zoyo9
+ h71ihwArxGbjU+8WNGtaXdWK1xNbTHTZvikme1bj8qwN/0Fv8q5vfdmi8R+NNypKW5n0
+ Ye2A==
+X-Gm-Message-State:
+ ALoCoQnD8z+qfz4QVERVjL6bWY7qwQz+T3OsFAkDry1O4u2eyxkcSfBvoNhoqvO+9G9ue6raFIzv
+X-Received: by 10.180.72.146 with SMTP id d18mr102161wiv.53.1407880071158;
+ Tue, 12 Aug 2014 14:47:51 -0700 (PDT)
+Received: from moritz-x230 (p3E9BBDA6.dip0.t-ipconnect.de. [62.155.189.166])
+ by mx.google.com with ESMTPSA id w1sm60559662wiz.14.2014.08.12.14.47.49
+ for
+ (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
+ Tue, 12 Aug 2014 14:47:49 -0700 (PDT)
+From: Moritz Ulrich
+To: "Austin T. Clements"
+Subject: Re: `notmuch-escape-boolean-term': Broken for non-ascii characters
+In-Reply-To:
+ <20140812103300.Horde.O1lIjfCL-Lh8XGn65RO2Cg1@webmail.csail.mit.edu>
+References: <874mxiu5hj.fsf@tarn-vedra.de>
+ <20140812103300.Horde.O1lIjfCL-Lh8XGn65RO2Cg1@webmail.csail.mit.edu>
+User-Agent: Notmuch/0.18.1 (http://notmuchmail.org) Emacs/24.3.1
+ (x86_64-unknown-linux-gnu)
+Date: Tue, 12 Aug 2014 23:47:42 +0200
+Message-ID: <874mxhbcsh.fsf@tarn-vedra.de>
+MIME-Version: 1.0
+Content-Type: multipart/signed; boundary="=-=-=";
+ micalg=pgp-sha256; protocol="application/pgp-signature"
+X-Mailman-Approved-At: Tue, 12 Aug 2014 22:26:53 -0700
+Cc: notmuch@notmuchmail.org
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Tue, 12 Aug 2014 21:48:04 -0000
+
+--=-=-=
+Content-Type: text/plain; charset=utf-8
+Content-Transfer-Encoding: quoted-printable
+
+"Austin T. Clements" writes:
+
+> Quoting Moritz Ulrich :
+>> Hello,
+>>
+>> I recently adopted notmuch as my primary way to read mail, so thank you
+>> for this great tool!
+>>
+>> Unfortunately, I ran into a problem on the Emacs side of the project
+>> when used in a non-ascii environment:
+>>
+>> Having a tag named 'uni-köln', the tag:-completion doesn't work.
+>>
+>> This is caused by `notmuch-escape-boolean-term' erroneously escaping the
+>> above string:
+>>
+>>     (notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""
+>>
+>> This is caused by `string-match' with the following erroneously matching
+>> my tag:
+>>
+>>     (string-match "[^!#-'*-~]" "uni-köln") => 5
+>>     (string-match "[^!#-'*-~]" "uni-koln") => nil
+>>
+>> I'm not exactly sure how to tackle this - the Regexp was crafted to match
+>> (, ), " if I understand it correctly. A simple way would be just adding
+>> more characters as a sort-of whitelist. A nicer solution would be
+>> converting it from [^...] to [...] to explicitly mark letters that need
+>> to be escaped.
+>
+> notmuch-escape-boolean-term used to use a blacklist, but we switched
+> to a whitelist because Xapian's own parser has changed over the years
+> in its handling of non-ASCII characters and invalidated our blacklist.
+> Ultimately it seemed much safer to go with a whitelist. Quoting
+> "uni-köln" isn't erroneous, it's just conservative.
+>
+> Could you explain in more detail what's broken? I tried adding the
+> tag uni-köln to a message in Emacs, then hitting "s" to start a search
+> then "tag:" and that tag (surrounded by quotes) was one of the
+> completion options. Upon completing to that tag, the search worked
+> fine.
+>
+> Are you objecting to the unnecessary (but legal) quotes in the
+> completion? We might be able to include Unicode word characters in
+> the quoting whitelist, though that seems like a spot fix (probably a
+> fairly broad one, so maybe that's fine) and might be tricky because of
+> Emacs' somewhat weird Unicode regexp support (using [[:alpha:]] might
+> Just Work, but we'd have to be careful of the active syntax table).
+> Or tab completion could recognize that, say, tag:uni doesn't require
+> quoting, but still expand it to tag:"uni-köln".
+
+Thanks for explaining the reason for the whitelist approach. Knowing
+this is quite helpful.
+
+I can't really explain why, but I just didn't notice tag:"uni-köln" in
+the tag-completion - I think my expectation of finding it as
+tag:uni-köln must have blinded me.
+
+While it isn't erroneous, it's highly unintuitive to quote tags like
+this. I can understand that a much more permissive whitelist could cause
+other problems which are harder to track down, so maybe it's possible to
+make the behavior configurable (e.g. by using a `defvar' for the regex).
+
+-- 
+Moritz Ulrich
+
+--=-=-=
+Content-Type: application/pgp-signature
+
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v2
+
+iQIcBAEBCAAGBQJT6ouDAAoJEKnhzHnsv6QyYJkP/Rdf5grt5sz/hxDS6QehollQ
+kzAWNlmPulxNWPPTGbfBqUOKSynNJipaMtiout1x8rMEnFpw+lgWGtTy8Zxz4s1U
+5xBIp3v3IH98Imm/bLS7P8rDU7ExI6RITI9829nyLVZTMftyN0EmE36qKAwA+nDv
+z+71wD7tRODxy2bgvKoZJfyisIfemfb3UthhlS71fzjqlo44hqkZg1GKFRtMpDCm
+vNAArH5VqxY5ooQ7Omtgv57PGNQReg7uFwbnC65t40b1QAbUpDF6h639BJjIM36o
+zItU0d6OsmBwKb7IhIX2npev/yDq4hDHJAHeYxqK+/WCRNIQUK1kmsUTB++xzFUP
+ECP8fr1N1yUR2mo7DniY/FP/T9GvKGVUTiCWg5xiID25LLAfVyIFfWS+M6jusOrR
+G54NypRJR9hWuCgoZFz2qbRZu4sP6S2umTe9Efji7Lha4YDZgf9m6MPtXbEKGLPU
+YdlIdnPg12RQvMOHLlpfhSK9w1ZGUty+7xxbdKT04NsQ3N4VmSyHC6J079zmaSMz
+FnjFLAEiyqkWa0op4FJQHopb/R6rRPw97055ULDpB1Bwa5Bssa3nq74JwfsFIdGt
+j00jIaQMp0aABvCXjHUPDXakFzvq2ID2RBrlzybkHgTt9FA29MMlB1NJA/XyKEBz
+yxbvd9c8DMHUDWNAksUj
+=B6Pb
+-----END PGP SIGNATURE-----
+--=-=-=--