Re: [PATCH] WIP: regexp matching in 'subject' and 'from'
author Gaute Hope <eg@gaute.vetsj.com>
Sat, 11 Jun 2016 16:32:14 +0000 (16:32 +0000)
committer W. Trevor King <wking@tremily.us>
Sat, 20 Aug 2016 23:22:04 +0000 (16:22 -0700)
1c/a85f43199aa83f376cb747f8c80d6f348d5c8f [new file with mode: 0644]

diff --git a/1c/a85f43199aa83f376cb747f8c80d6f348d5c8f b/1c/a85f43199aa83f376cb747f8c80d6f348d5c8f
new file mode 100644
index 0000000..103bc22
--- /dev/null
@@ -0,0 +1,122 @@
+Return-Path: <eg@gaute.vetsj.com>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by arlo.cworth.org (Postfix) with ESMTP id E6AB86DE02DA\r
+ for <notmuch@notmuchmail.org>; Sat, 11 Jun 2016 09:32:27 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at cworth.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.534\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.534 tagged_above=-999 required=5 tests=[AWL=0.186,\r
+  DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_LOW=-0.7,\r
+ RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01] autolearn=disabled\r
+Received: from arlo.cworth.org ([127.0.0.1])\r
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id 5htUAc3ln1DR for <notmuch@notmuchmail.org>;\r
+ Sat, 11 Jun 2016 09:32:19 -0700 (PDT)\r
+Received: from mail-wm0-f41.google.com (mail-wm0-f41.google.com\r
+ [74.125.82.41]) by arlo.cworth.org (Postfix) with ESMTPS id E83A76DE01BE for\r
+ <notmuch@notmuchmail.org>; Sat, 11 Jun 2016 09:32:18 -0700 (PDT)\r
+Received: by mail-wm0-f41.google.com with SMTP id v199so27145257wmv.0\r
+ for <notmuch@notmuchmail.org>; Sat, 11 Jun 2016 09:32:18 -0700 (PDT)\r
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
+ d=gaute-vetsj-com.20150623.gappssmtp.com; s=20150623;\r
+ h=date:from:subject:to:cc:references:in-reply-to:user-agent\r
+ :message-id:mime-version:content-transfer-encoding;\r
+ bh=cF9F4QF68lBd4ZlYSGL6Rg8R/az90WWQWTLJ4cN4Lag=;\r
+ b=tglDyXQtAKPsu4ez3d+tcn6LHngTvMCqFAQi6SsSCHdfOh2m7zndYHpF3PqjYcXp+v\r
+ RNGTGTAWFeomoZ5zKZYykurUWmALDQlemv99+4OcfzoMm5nF57aP7JEEkVuPuJloHJco\r
+ XAX62UxPPB7VR0N6MCC7FAAZrH0Ogb5i9jUBtYCH5srT0QzAa8dAhrFR4QaunU2ugkq1\r
+ Wy5TtFextQaT93EyiFfhFMy00vUT9i16vyJbJuXkqUy5/9GjBHSb1TBSByv2LZn6oKyR\r
+ +qFzWnG1JQJH9iDvGD0wankwTb9AxI3ZEkqwbDHIwEjx2EVlQWMFKFWvGXhnJsCQBadL\r
+ RLwA==\r
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
+ d=1e100.net; s=20130820;\r
+ h=x-gm-message-state:date:from:subject:to:cc:references:in-reply-to\r
+ :user-agent:message-id:mime-version:content-transfer-encoding;\r
+ bh=cF9F4QF68lBd4ZlYSGL6Rg8R/az90WWQWTLJ4cN4Lag=;\r
+ b=VfPWWYdEcru/XOcrLbVHBR5j/jLFl4kaoNl6m/WQNwyJpY2DB4VpxlB+93Ukkimq9/\r
+ AA+zsjD+JOHPYPwTfZj7Ih8UBDqrPvVmqEBGlDbz9sl9LXB9KburKTX0nm3bU2gJNjJ7\r
+ 8sbaBkamw8BH3DQsBAD33d7IGVFCnhKE8e5PwOgIj3oMr8wnxaTzmdGCvqZMmKbaZUfk\r
+ oJBJMe/F30eANUdLYlVK3aInGLQx4EEH/mmM7yIuu2s4D10KyOU83pxF9jrvUBQH3eFh\r
+ 9nsVjOjFVJaoaOQo9Hx6HK2415GbcR4d6mnC8I/30JvbjJvteD24JrQO4FAA10YD2gw1\r
+ XZVg==\r
+X-Gm-Message-State:\r
+ ALyK8tLRW5wR17fJSdFNgwlQ7MNRTHRhhFhsbMRW5dV1LseIAURRjCfXRQpKl1nkjumreQ==\r
+X-Received: by 10.28.154.144 with SMTP id c138mr2525025wme.63.1465662737016;\r
+ Sat, 11 Jun 2016 09:32:17 -0700 (PDT)\r
+Received: from localhost (241.89-20-241.enivest.net. [89.20.241.241])\r
+ by smtp.gmail.com with ESMTPSA id q71sm4879619wme.17.2016.06.11.09.32.15\r
+ (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\r
+ Sat, 11 Jun 2016 09:32:15 -0700 (PDT)\r
+Date: Sat, 11 Jun 2016 16:32:14 +0000\r
+From: Gaute Hope <eg@gaute.vetsj.com>\r
+Subject: Re: [PATCH] WIP: regexp matching in 'subject' and 'from'\r
+To: David Bremner <david@tethera.net>, Austin Clements\r
+ <aclements@csail.mit.edu>\r
+Cc: sfischme@uwaterloo.ca, notmuch <notmuch@notmuchmail.org>\r
+References: <1465265149-7174-1-git-send-email-david@tethera.net>\r
+ <1465525688-30913-1-git-send-email-david@tethera.net>\r
+ <1465547660-astroid-0-nudmv20lbk-1296@strange>\r
+ <87a8itxpu7.fsf@zancas.localnet>\r
+In-Reply-To: <87a8itxpu7.fsf@zancas.localnet>\r
+User-Agent: astroid/v0.5-221-g4c2c7173 (https://github.com/gauteh/astroid)\r
+Message-Id: <1465662533-astroid-3-6vuqm3zu54-1296@strange>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=utf-8; format=flowed\r
+Content-Transfer-Encoding: quoted-printable\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.20\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch/>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Sat, 11 Jun 2016 16:32:28 -0000\r
+\r
+David Bremner writes on June 10, 2016 13:09:\r
+> Gaute Hope <eg@gaute.vetsj.com> writes:\r
+>=20\r
+>>\r
+>> Cool!\r
+>>\r
+>> Would it break a lot of things if you just replace the original prefix?\r
+>=20\r
+> It would change the matching behaviour. I guess there are people that\r
+> like the current "sloppy" matching of from: and subject:.  In my\r
+> not-very-scientific tests, it is a factor of 5 to 10 times slower to do\r
+> regexp search, which makes sense because it is effectively post\r
+> processing the results from Xapian. At least on my system it seems fast\r
+> enough to be usable interactively, but that is a pretty shocking\r
+> performance regression. And I know there are people with more mail on\r
+> slower systems.\r
+\r
+Maybe we could check whether the search string contains a regexp and\r
+decide based on that whether to apply regexp matching? I think that would\r
+make the interface more user-friendly. You would always use the same\r
+search, whether or not you decide to put in a regexp.\r
+\r
+>=20\r
+>> Could it be made to work on the message body?\r
+>=20\r
+> See Austin's previous reply for the details, but basically no; these\r
+> "values" index in terms of whole strings, while the body is indexed by\r
+> terms (roughly, words). In principle we could add a value slot for the\r
+> body, but I think that would at least double the size of the database\r
+> (maybe more).\r
+>=20\r
+\r
+I would rather have double the db and be able to wildcard the beginning\r
+of terms. If it is not too much maintenance overhead, might it be made\r
+optional?\r
+\r
+\r
+Regards, Gaute\r
+\r
+=\r