Re: `notmuch-escape-boolean-term': Broken for non-ascii characters
author Moritz Ulrich <moritz@tarn-vedra.de>
Tue, 12 Aug 2014 21:47:42 +0000 (23:47 +0200)
committer W. Trevor King <wking@tremily.us>
Fri, 7 Nov 2014 18:04:12 +0000 (10:04 -0800)
e6/b5707d91561b8c6a058e1ec0cbdf342c978741 [new file with mode: 0644]

diff --git a/e6/b5707d91561b8c6a058e1ec0cbdf342c978741 b/e6/b5707d91561b8c6a058e1ec0cbdf342c978741
new file mode 100644
index 0000000..3dc417a
--- /dev/null
@@ -0,0 +1,165 @@
+Return-Path: <moritz@tarn-vedra.de>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+       by olra.theworths.org (Postfix) with ESMTP id 3EBA6431FBC\r
+       for <notmuch@notmuchmail.org>; Tue, 12 Aug 2014 14:48:04 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+       tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+       by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+       with ESMTP id oejNArLt4CJB for <notmuch@notmuchmail.org>;\r
+       Tue, 12 Aug 2014 14:47:53 -0700 (PDT)\r
+Received: from mail-we0-f169.google.com (mail-we0-f169.google.com\r
+       [74.125.82.169]) (using TLSv1 with cipher RC4-SHA (128/128 bits))\r
+       (No client certificate requested)\r
+       by olra.theworths.org (Postfix) with ESMTPS id A9C44431FAF\r
+       for <notmuch@notmuchmail.org>; Tue, 12 Aug 2014 14:47:53 -0700 (PDT)\r
+Received: by mail-we0-f169.google.com with SMTP id u56so10625935wes.0\r
+       for <notmuch@notmuchmail.org>; Tue, 12 Aug 2014 14:47:52 -0700 (PDT)\r
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
+       d=1e100.net; s=20130820;\r
+       h=x-gm-message-state:from:to:cc:subject:in-reply-to:references\r
+       :user-agent:date:message-id:mime-version:content-type;\r
+       bh=aGEFSJVm6ula0Y0B1JNnKKT/y7aCOaHZvA2Fmfu1mJI=;\r
+       b=Y+ujb5GV/0ZaE3J3B+XbtTzEazTK3KmwEEkFNeVQt9P1TTtWt05RW7Dy1KQeWTRyvq\r
+       m/AGuA3hfeTahPZI6kXPcxF+qTrZv+1oyLCL91BwpcRf8sPCMsj0kI2P6zlM7ytBfnSo\r
+       RUwaHXK7IWtcfZAehmr+ipWjgEBY0Uk8DVP4jrAHuCb9QYZoVtM626luTFXw0w+I9yO3\r
+       ylPfuZDSQawCS4sIf4kkjz45JRHLjvYtnMuEtjdqD/kSSFCosl+BQKCCrlrtPG4zoyo9\r
+       h71ihwArxGbjU+8WNGtaXdWK1xNbTHTZvikme1bj8qwN/0Fv8q5vfdmi8R+NNypKW5n0\r
+       Ye2A==\r
+X-Gm-Message-State:\r
+ ALoCoQnD8z+qfz4QVERVjL6bWY7qwQz+T3OsFAkDry1O4u2eyxkcSfBvoNhoqvO+9G9ue6raFIzv\r
+X-Received: by 10.180.72.146 with SMTP id d18mr102161wiv.53.1407880071158;\r
+       Tue, 12 Aug 2014 14:47:51 -0700 (PDT)\r
+Received: from moritz-x230 (p3E9BBDA6.dip0.t-ipconnect.de. [62.155.189.166])\r
+       by mx.google.com with ESMTPSA id w1sm60559662wiz.14.2014.08.12.14.47.49\r
+       for <multiple recipients>\r
+       (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\r
+       Tue, 12 Aug 2014 14:47:49 -0700 (PDT)\r
+From: Moritz Ulrich <moritz@tarn-vedra.de>\r
+To: "Austin T. Clements" <aclements@csail.mit.edu>\r
+Subject: Re: `notmuch-escape-boolean-term': Broken for non-ascii characters\r
+In-Reply-To:\r
+ <20140812103300.Horde.O1lIjfCL-Lh8XGn65RO2Cg1@webmail.csail.mit.edu>\r
+References: <874mxiu5hj.fsf@tarn-vedra.de>\r
+       <20140812103300.Horde.O1lIjfCL-Lh8XGn65RO2Cg1@webmail.csail.mit.edu>\r
+User-Agent: Notmuch/0.18.1 (http://notmuchmail.org) Emacs/24.3.1\r
+       (x86_64-unknown-linux-gnu)\r
+Date: Tue, 12 Aug 2014 23:47:42 +0200\r
+Message-ID: <874mxhbcsh.fsf@tarn-vedra.de>\r
+MIME-Version: 1.0\r
+Content-Type: multipart/signed; boundary="=-=-=";\r
+       micalg=pgp-sha256; protocol="application/pgp-signature"\r
+X-Mailman-Approved-At: Tue, 12 Aug 2014 22:26:53 -0700\r
+Cc: notmuch@notmuchmail.org\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+       <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Tue, 12 Aug 2014 21:48:04 -0000\r
+\r
+--=-=-=\r
+Content-Type: text/plain; charset=utf-8\r
+Content-Transfer-Encoding: quoted-printable\r
+\r
+"Austin T. Clements" <aclements@csail.mit.edu> writes:\r
+\r
+> Quoting Moritz Ulrich <moritz@tarn-vedra.de>:\r
+>> Hello,\r
+>>\r
+>> I recently adopted notmuch as my primary way to read mail, so thank you\r
+>> for this great tool!\r
+>>\r
+>> Unfortunately, I ran into a problem on the Emacs side of the project\r
+>> when using it in a non-ASCII environment:\r
+>>\r
+>> Having a tag named 'uni-köln', the tag:-completion doesn't work.\r
+>>\r
+>> This is caused by `notmuch-escape-boolean-term' erroneously escaping the\r
+>> above string:\r
+>>\r
+>> (notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""\r
+>>\r
+>> The culprit is the following `string-match', which erroneously matches\r
+>> my tag:\r
+>>\r
+>> (string-match "[^!#-'*-~]" "uni-köln") => 5\r
+>> (string-match "[^!#-'*-~]" "uni-koln") => nil\r
+>>\r
+>> I'm not exactly sure how to tackle this; if I understand it correctly,\r
+>> the regexp was crafted to match (, ) and ". A simple way would be to just\r
+>> add more characters to a sort-of whitelist. A nicer solution would be to\r
+>> convert it from [^...] to [...] to explicitly mark the characters that\r
+>> need to be escaped.\r
+>\r
+> notmuch-escape-boolean-term used to use a blacklist, but we switched\r
+> to a whitelist because Xapian's own parser has changed over the years\r
+> in its handling of non-ASCII characters and invalidated our blacklist.\r
+> Ultimately it seemed much safer to go with a whitelist.  Quoting\r
+> "uni-köln" isn't erroneous, it's just conservative.\r
+>\r
+> Could you explain in more detail what's broken?  I tried adding the\r
+> tag uni-köln to a message in Emacs, then hitting "s" to start a search\r
+> then "tag:<TAB>" and that tag (surrounded by quotes) was one of the\r
+> completion options.  Upon completing to that tag, the search worked\r
+> fine.\r
+>\r
+> Are you objecting to the unnecessary (but legal) quotes in the\r
+> completion?  We might be able to include Unicode word characters in\r
+> the quoting whitelist, though that seems like a spot fix (probably a\r
+> fairly broad one, so maybe that's fine) and might be tricky because of\r
+> Emacs' somewhat weird Unicode regexp support (using [[:alpha:]] might\r
+> Just Work, but we'd have to be careful of the active syntax table).\r
+> Or tab completion could recognize that, say, tag:uni doesn't require\r
+> quoting, but still expand it to tag:"uni-köln".\r
+\r
+Thanks for explaining the reason for the whitelist approach. Knowing\r
+this is quite helpful.\r
+\r
+I can't really explain why, but I just didn't notice tag:"uni-köln" in\r
+the tag completion; I think my expectation of finding it as\r
+tag:uni-köln must have blinded me.\r
+\r
+While it isn't erroneous, it's highly unintuitive to quote tags like\r
+this. I understand that a much more permissive whitelist could cause\r
+other problems that are harder to track down, so maybe it would be possible\r
+to make the behavior configurable (e.g. by using a `defvar' for the regexp).\r
+\r
+-- \r
+Moritz Ulrich\r
+\r
+--=-=-=\r
+Content-Type: application/pgp-signature\r
+\r
+-----BEGIN PGP SIGNATURE-----\r
+Version: GnuPG v2\r
+\r
+iQIcBAEBCAAGBQJT6ouDAAoJEKnhzHnsv6QyYJkP/Rdf5grt5sz/hxDS6QehollQ\r
+kzAWNlmPulxNWPPTGbfBqUOKSynNJipaMtiout1x8rMEnFpw+lgWGtTy8Zxz4s1U\r
+5xBIp3v3IH98Imm/bLS7P8rDU7ExI6RITI9829nyLVZTMftyN0EmE36qKAwA+nDv\r
+z+71wD7tRODxy2bgvKoZJfyisIfemfb3UthhlS71fzjqlo44hqkZg1GKFRtMpDCm\r
+vNAArH5VqxY5ooQ7Omtgv57PGNQReg7uFwbnC65t40b1QAbUpDF6h639BJjIM36o\r
+zItU0d6OsmBwKb7IhIX2npev/yDq4hDHJAHeYxqK+/WCRNIQUK1kmsUTB++xzFUP\r
+ECP8fr1N1yUR2mo7DniY/FP/T9GvKGVUTiCWg5xiID25LLAfVyIFfWS+M6jusOrR\r
+G54NypRJR9hWuCgoZFz2qbRZu4sP6S2umTe9Efji7Lha4YDZgf9m6MPtXbEKGLPU\r
+YdlIdnPg12RQvMOHLlpfhSK9w1ZGUty+7xxbdKT04NsQ3N4VmSyHC6J079zmaSMz\r
+FnjFLAEiyqkWa0op4FJQHopb/R6rRPw97055ULDpB1Bwa5Bssa3nq74JwfsFIdGt\r
+j00jIaQMp0aABvCXjHUPDXakFzvq2ID2RBrlzybkHgTt9FA29MMlB1NJA/XyKEBz\r
+yxbvd9c8DMHUDWNAksUj\r
+=B6Pb\r
+-----END PGP SIGNATURE-----\r
+--=-=-=--\r
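The quoting rule debated in this thread can be reproduced outside Emacs to see exactly which characters trigger quoting. The following is a minimal Python sketch, not notmuch's Emacs Lisp implementation. It assumes Python's `re` reads the character class `[^!#-'*-~]` the same way Emacs does (both treat `#-'` and `*-~` as ASCII ranges), and it assumes embedded double quotes are escaped by doubling them. The names `escape_boolean_term`, `DEFAULT_NEEDS_QUOTING` and `RELAXED_NEEDS_QUOTING` are hypothetical: passing the pattern as a parameter stands in for the `defvar` suggested in the last reply, and the relaxed pattern only approximates the `[[:alpha:]]` idea.

```python
import re

# Whitelist from the thread, in Python `re` syntax.  The negated class
# matches the first character that is NOT an ASCII printing character
# other than '"', '(' and ')'; space and every non-ASCII character match.
DEFAULT_NEEDS_QUOTING = re.compile(r"[^!#-'*-~]")

# Hypothetical relaxed pattern approximating the [[:alpha:]] idea.
# [^\W\d_] means "word character that is not a digit or underscore",
# i.e. roughly any Unicode letter.  The negative lookahead skips such
# letters, so only non-letters outside the ASCII whitelist still match.
RELAXED_NEEDS_QUOTING = re.compile(r"(?![^\W\d_])[^!#-'*-~]")


def escape_boolean_term(term, needs_quoting=DEFAULT_NEEDS_QUOTING):
    """Return TERM unchanged if it is safe, otherwise wrap it in quotes.

    Empty terms and terms containing any character matched by
    NEEDS_QUOTING are quoted; embedded double quotes are doubled
    (an assumed escaping rule for this sketch).
    """
    if term == "" or needs_quoting.search(term):
        return '"' + term.replace('"', '""') + '"'
    return term
```

With the default pattern, the first match in "uni-köln" is at index 5 (the "ö"), mirroring the `string-match` result quoted above, so the tag comes back wrapped in quotes; "uni-koln" has no match and stays bare. With `RELAXED_NEEDS_QUOTING`, "uni-köln" is returned unquoted, while terms containing a space or a parenthesis are still quoted, which is the trade-off between a broader whitelist and staying conservative.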