--- /dev/null
+Return-Path: <bremner@tesseract.cs.unb.ca>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by arlo.cworth.org (Postfix) with ESMTP id BDD866DE014D\r
+ for <notmuch@notmuchmail.org>; Mon, 27 Jun 2016 06:48:42 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at cworth.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.005\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.005 tagged_above=-999 required=5\r
+ tests=[AWL=-0.006, HEADER_FROM_DIFFERENT_DOMAINS=0.001]\r
+ autolearn=disabled\r
+Received: from arlo.cworth.org ([127.0.0.1])\r
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id aPDJIOOf4Vrt for <notmuch@notmuchmail.org>;\r
+ Mon, 27 Jun 2016 06:48:34 -0700 (PDT)\r
+Received: from fethera.tethera.net (fethera.tethera.net [198.245.60.197])\r
+ by arlo.cworth.org (Postfix) with ESMTPS id F20FC6DE00CC\r
+ for <notmuch@notmuchmail.org>; Mon, 27 Jun 2016 06:48:33 -0700 (PDT)\r
+Received: from remotemail by fethera.tethera.net with local (Exim 4.84)\r
+ (envelope-from <bremner@tesseract.cs.unb.ca>)\r
+ id 1bHWtZ-0001Xa-RZ; Mon, 27 Jun 2016 09:48:13 -0400\r
+Received: (nullmailer pid 17561 invoked by uid 1000);\r
+ Mon, 27 Jun 2016 13:33:20 -0000\r
+From: David Bremner <david@tethera.net>\r
+To: notmuch@notmuchmail.org\r
+Subject: [PATCH] lib: regexp matching in 'subject' and 'from'\r
+Date: Mon, 27 Jun 2016 15:33:07 +0200\r
+Message-Id: <1467034387-16885-1-git-send-email-david@tethera.net>\r
+X-Mailer: git-send-email 2.8.1\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=UTF-8\r
+Content-Transfer-Encoding: 8bit\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.20\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch/>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Mon, 27 Jun 2016 13:48:42 -0000\r
+\r
+The idea is that you can run\r
+\r
+% notmuch search re:subject:<your-favourite-regexp>\r
+% notmuch search re:from:<your-favourite-regexp>\r
+\r
+or\r
+\r
+% notmuch search subject:"your usual phrase search"\r
+% notmuch search from:"usual phrase search"\r
+\r
+This should also work with bindings, since it extends the query parser.\r
+\r
+This is trivial to extend for other value slots, but currently the only\r
+value slots are date, message_id, from, subject, and last_mod. Date is\r
+already searchable, and regex matching on message_id is not obviously\r
+useful.\r
+\r
+This was originally written by Austin Clements, and ported to Xapian\r
+field processors (from Austin's custom query parser) by yours truly.\r
+---\r
+\r
+This is the zeroth non-WIP version. Since the last version [1], I\r
+have added some better error reporting for regexp syntax errors, tests\r
+for two kinds of query syntax error, and some documentation for the\r
+query syntax.\r
+\r
+ doc/man7/notmuch-search-terms.rst | 17 +++++-\r
+ lib/Makefile.local | 1 +\r
+ lib/database-private.h | 1 +\r
+ lib/database.cc | 5 ++\r
+ lib/regexp-fields.cc | 125 ++++++++++++++++++++++++++++++++++++++\r
+ lib/regexp-fields.h | 77 +++++++++++++++++++++++\r
+ test/T630-regexp-query.sh | 91 +++++++++++++++++++++++++++\r
+ 7 files changed, 316 insertions(+), 1 deletion(-)\r
+ create mode 100644 lib/regexp-fields.cc\r
+ create mode 100644 lib/regexp-fields.h\r
+ create mode 100755 test/T630-regexp-query.sh\r
+\r
+diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst\r
+index 075f88c..6155406 100644\r
+--- a/doc/man7/notmuch-search-terms.rst\r
++++ b/doc/man7/notmuch-search-terms.rst\r
+@@ -58,6 +58,8 @@ indicate user-supplied values):\r
+ \r
+ - query:<name>\r
+ \r
++- re:{subject,from}:<regex>\r
++\r
+ The **from:** prefix is used to match the name or address of the sender\r
+ of an email message.\r
+ \r
+@@ -139,6 +141,12 @@ queries added with **notmuch-config(1)**. Named queries are only\r
+ available if notmuch is built with **Xapian Field Processors** (see\r
+ below).\r
+ \r
++The **re:<field>:** prefix can be used to restrict the results to\r
++those whose <field> matches the given regular expression (see\r
++**regex(7)**). Regular expression searches are only available if\r
++notmuch is built with **Xapian Field Processors** (see below), and\r
++currently only for the Subject and From fields.\r
++\r
+ Operators\r
+ ---------\r
+ \r
+@@ -213,13 +221,19 @@ Boolean and Probabilistic Prefixes\r
+ ----------------------------------\r
+ \r
+ Xapian (and hence notmuch) prefixes are either **boolean**, supporting\r
+-exact matches like "tag:inbox" or **probabilistic**, supporting a more flexible **term** based searching. The prefixes currently supported by notmuch are as follows.\r
++exact matches like "tag:inbox" or **probabilistic**, supporting a more\r
++flexible **term** based searching. Certain **special** prefixes are\r
++processed by notmuch in a way not strictly fitting either of Xapian's\r
++built-in styles. The prefixes currently supported by notmuch are as\r
++follows.\r
+ \r
+ \r
+ Boolean\r
+ **tag:**, **id:**, **thread:**, **folder:**, **path:**\r
+ Probabilistic\r
+ **from:**, **to:**, **subject:**, **attachment:**, **mimetype:**\r
++Special\r
++ **query:**, **re:<field>**\r
+ \r
+ Terms and phrases\r
+ -----------------\r
+@@ -389,6 +403,7 @@ Currently the following features require field processor support:\r
+ \r
+ - non-range date queries, e.g. "date:today"\r
+ - named queries e.g. "query:my_special_query"\r
++- regular expression searches, e.g. "re:subject:^\\[SPAM\\]"\r
+ \r
+ SEE ALSO\r
+ ========\r
+diff --git a/lib/Makefile.local b/lib/Makefile.local\r
+index beb9635..68771e6 100644\r
+--- a/lib/Makefile.local\r
++++ b/lib/Makefile.local\r
+@@ -51,6 +51,7 @@ libnotmuch_cxx_srcs = \\r
+ $(dir)/query.cc \\r
+ $(dir)/query-fp.cc \\r
+ $(dir)/config.cc \\r
++ $(dir)/regexp-fields.cc \\r
+ $(dir)/thread.cc\r
+ \r
+ libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)\r
+diff --git a/lib/database-private.h b/lib/database-private.h\r
+index ca71a92..900a989 100644\r
+--- a/lib/database-private.h\r
++++ b/lib/database-private.h\r
+@@ -186,6 +186,7 @@ struct _notmuch_database {\r
+ #if HAVE_XAPIAN_FIELD_PROCESSOR\r
+ Xapian::FieldProcessor *date_field_processor;\r
+ Xapian::FieldProcessor *query_field_processor;\r
++ Xapian::FieldProcessor *re_field_processor;\r
+ #endif\r
+ Xapian::ValueRangeProcessor *last_mod_range_processor;\r
+ };\r
+diff --git a/lib/database.cc b/lib/database.cc\r
+index afafe88..b52b62d 100644\r
+--- a/lib/database.cc\r
++++ b/lib/database.cc\r
+@@ -21,6 +21,7 @@\r
+ #include "database-private.h"\r
+ #include "parse-time-vrp.h"\r
+ #include "query-fp.h"\r
++#include "regexp-fields.h"\r
+ #include "string-util.h"\r
+ \r
+ #include <iostream>\r
+@@ -1016,6 +1017,8 @@ notmuch_database_open_verbose (const char *path,\r
+ notmuch->query_parser->add_boolean_prefix("date", notmuch->date_field_processor);\r
+ notmuch->query_field_processor = new QueryFieldProcessor (*notmuch->query_parser, notmuch);\r
+ notmuch->query_parser->add_boolean_prefix("query", notmuch->query_field_processor);\r
++ notmuch->re_field_processor = new RegexpFieldProcessor (*notmuch->query_parser, notmuch);\r
++ notmuch->query_parser->add_boolean_prefix("re", notmuch->re_field_processor);\r
+ #endif\r
+ notmuch->last_mod_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_LAST_MOD, "lastmod:");\r
+ \r
+@@ -1112,6 +1115,8 @@ notmuch_database_close (notmuch_database_t *notmuch)\r
+ notmuch->date_field_processor = NULL;\r
+ delete notmuch->query_field_processor;\r
+ notmuch->query_field_processor = NULL;\r
++ delete notmuch->re_field_processor;\r
++ notmuch->re_field_processor = NULL;\r
+ #endif\r
+ \r
+ return status;\r
+diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc\r
+new file mode 100644\r
+index 0000000..4d3d972\r
+--- /dev/null\r
++++ b/lib/regexp-fields.cc\r
+@@ -0,0 +1,125 @@\r
++/* regexp-fields.cc - "re:" field processor glue\r
++ *\r
++ * This file is part of notmuch.\r
++ *\r
++ * Copyright © 2015 Austin Clements\r
++ * Copyright © 2016 David Bremner\r
++ *\r
++ * This program is free software: you can redistribute it and/or modify\r
++ * it under the terms of the GNU General Public License as published by\r
++ * the Free Software Foundation, either version 3 of the License, or\r
++ * (at your option) any later version.\r
++ *\r
++ * This program is distributed in the hope that it will be useful,\r
++ * but WITHOUT ANY WARRANTY; without even the implied warranty of\r
++ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\r
++ * GNU General Public License for more details.\r
++ *\r
++ * You should have received a copy of the GNU General Public License\r
++ * along with this program. If not, see https://www.gnu.org/licenses/ .\r
++ *\r
++ * Author: Austin Clements <aclements@csail.mit.edu>\r
++ * David Bremner <david@tethera.net>\r
++ */\r
++\r
++#include "regexp-fields.h"\r
++#include "notmuch-private.h"\r
++\r
++#if HAVE_XAPIAN_FIELD_PROCESSOR\r
++RegexpPostingSource::RegexpPostingSource (Xapian::valueno slot, const std::string &regexp)\r
++ : slot_ (slot)\r
++{\r
++ int err = regcomp (&regexp_, regexp.c_str (), REG_EXTENDED | REG_NOSUB);\r
++\r
++ if (err != 0) {\r
++ size_t len = regerror (err, &regexp_, NULL, 0);\r
++ char *buffer = new char[len];\r
++ std::string msg;\r
++ (void) regerror (err, &regexp_, buffer, len);\r
++ msg.assign (buffer, len - 1);\r
++ delete[] buffer;\r
++\r
++ throw Xapian::QueryParserError (msg);\r
++ }\r
++}\r
++\r
++RegexpPostingSource::~RegexpPostingSource ()\r
++{\r
++ regfree (&regexp_);\r
++}\r
++\r
++void\r
++RegexpPostingSource::init (const Xapian::Database &db)\r
++{\r
++ db_ = db;\r
++ it_ = db_.valuestream_begin (slot_);\r
++ end_ = db.valuestream_end (slot_);\r
++ started_ = false;\r
++}\r
++\r
++Xapian::doccount\r
++RegexpPostingSource::get_termfreq_min () const\r
++{\r
++ return 0;\r
++}\r
++\r
++Xapian::doccount\r
++RegexpPostingSource::get_termfreq_est () const\r
++{\r
++ return get_termfreq_max () / 2;\r
++}\r
++\r
++Xapian::doccount\r
++RegexpPostingSource::get_termfreq_max () const\r
++{\r
++ return db_.get_value_freq (slot_);\r
++}\r
++\r
++Xapian::docid\r
++RegexpPostingSource::get_docid () const\r
++{\r
++ return it_.get_docid ();\r
++}\r
++\r
++bool\r
++RegexpPostingSource::at_end () const\r
++{\r
++ return it_ == end_;\r
++}\r
++\r
++void\r
++RegexpPostingSource::next (unused (double min_wt))\r
++{\r
++ if (started_ && ! at_end ())\r
++ ++it_;\r
++ started_ = true;\r
++\r
++ for (; ! at_end (); ++it_) {\r
++ std::string value = *it_;\r
++ if (regexec (&regexp_, value.c_str (), 0, NULL, 0) == 0)\r
++ break;\r
++ }\r
++}\r
++\r
++static Xapian::valueno\r
++_find_slot (std::string prefix)\r
++{\r
++ if (prefix == "from")\r
++ return NOTMUCH_VALUE_FROM;\r
++ else if (prefix == "subject")\r
++ return NOTMUCH_VALUE_SUBJECT;\r
++ else\r
++ throw Xapian::QueryParserError ("unsupported regexp field '" + prefix + "'");\r
++}\r
++\r
++Xapian::Query\r
++RegexpFieldProcessor::operator() (const std::string & str)\r
++{\r
++ size_t pos = str.find_first_of (':');\r
++ std::string prefix = str.substr (0, pos);\r
++ std::string regexp = str.substr (pos + 1);\r
++\r
++ postings = new RegexpPostingSource (_find_slot (prefix), regexp);\r
++ return Xapian::Query (postings);\r
++}\r
++#endif\r
+diff --git a/lib/regexp-fields.h b/lib/regexp-fields.h\r
+new file mode 100644\r
+index 0000000..2c9c2d7\r
+--- /dev/null\r
++++ b/lib/regexp-fields.h\r
+@@ -0,0 +1,77 @@\r
++/* regexp-fields.h - xapian glue for semi-bruteforce regexp search\r
++ *\r
++ * This file is part of notmuch.\r
++ *\r
++ * Copyright © 2015 Austin Clements\r
++ * Copyright © 2016 David Bremner\r
++ *\r
++ * This program is free software: you can redistribute it and/or modify\r
++ * it under the terms of the GNU General Public License as published by\r
++ * the Free Software Foundation, either version 3 of the License, or\r
++ * (at your option) any later version.\r
++ *\r
++ * This program is distributed in the hope that it will be useful,\r
++ * but WITHOUT ANY WARRANTY; without even the implied warranty of\r
++ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\r
++ * GNU General Public License for more details.\r
++ *\r
++ * You should have received a copy of the GNU General Public License\r
++ * along with this program. If not, see https://www.gnu.org/licenses/ .\r
++ *\r
++ * Author: Austin Clements <aclements@csail.mit.edu>\r
++ * David Bremner <david@tethera.net>\r
++ */\r
++\r
++#ifndef NOTMUCH_REGEXP_FIELDS_H\r
++#define NOTMUCH_REGEXP_FIELDS_H\r
++#if HAVE_XAPIAN_FIELD_PROCESSOR\r
++#include <sys/types.h>\r
++#include <regex.h>\r
++#include <xapian.h>\r
++#include "notmuch-private.h"\r
++\r
++/* A posting source that returns documents where a value matches a\r
++ * regexp.\r
++ */\r
++class RegexpPostingSource : public Xapian::PostingSource\r
++{\r
++ protected:\r
++ const Xapian::valueno slot_;\r
++ regex_t regexp_;\r
++ Xapian::Database db_;\r
++ bool started_;\r
++ Xapian::ValueIterator it_, end_;\r
++\r
++/* No copying */\r
++ RegexpPostingSource (const RegexpPostingSource &);\r
++ RegexpPostingSource &operator= (const RegexpPostingSource &);\r
++\r
++ public:\r
++ RegexpPostingSource (Xapian::valueno slot, const std::string &regexp);\r
++ ~RegexpPostingSource ();\r
++ void init (const Xapian::Database &db);\r
++ Xapian::doccount get_termfreq_min () const;\r
++ Xapian::doccount get_termfreq_est () const;\r
++ Xapian::doccount get_termfreq_max () const;\r
++ Xapian::docid get_docid () const;\r
++ bool at_end () const;\r
++ void next (unused (double min_wt));\r
++};\r
++\r
++\r
++class RegexpFieldProcessor : public Xapian::FieldProcessor {\r
++ protected:\r
++ Xapian::QueryParser &parser;\r
++ notmuch_database_t *notmuch;\r
++ RegexpPostingSource *postings = NULL;\r
++\r
++ public:\r
++ RegexpFieldProcessor (Xapian::QueryParser &parser_, notmuch_database_t *notmuch_)\r
++ : parser(parser_), notmuch(notmuch_) { };\r
++\r
++ ~RegexpFieldProcessor () { delete postings; };\r
++\r
++ Xapian::Query operator()(const std::string & str);\r
++};\r
++#endif\r
++#endif /* NOTMUCH_REGEXP_FIELDS_H */\r
+diff --git a/test/T630-regexp-query.sh b/test/T630-regexp-query.sh\r
+new file mode 100755\r
+index 0000000..3bbe47c\r
+--- /dev/null\r
++++ b/test/T630-regexp-query.sh\r
+@@ -0,0 +1,91 @@\r
++#!/usr/bin/env bash\r
++test_description='regular expression searches'\r
++. ./test-lib.sh || exit 1\r
++\r
++add_email_corpus\r
++\r
++\r
++if [ $NOTMUCH_HAVE_XAPIAN_FIELD_PROCESSOR -eq 1 ]; then\r
++\r
++ notmuch search --output=messages from:cworth > cworth.msg-ids\r
++\r
++ test_begin_subtest "regexp from search, case sensitive"\r
++ notmuch search --output=messages re:from:carl > OUTPUT\r
++ test_expect_equal_file /dev/null OUTPUT\r
++\r
++ test_begin_subtest "empty regexp or query"\r
++ notmuch search --output=messages re:from:carl or from:cworth > OUTPUT\r
++ test_expect_equal_file cworth.msg-ids OUTPUT\r
++\r
++ test_begin_subtest "non-empty regexp and query"\r
++ notmuch search re:from:cworth and subject:patch > OUTPUT\r
++ cat <<EOF > EXPECTED\r
++thread:0000000000000008 2009-11-18 [1/2] Carl Worth| Alex Botero-Lowry; [notmuch] [PATCH] Error out if no query is supplied to search instead of going into an infinite loop (attachment inbox unread)\r
++thread:0000000000000007 2009-11-18 [1/2] Carl Worth| Ingmar Vanhassel; [notmuch] [PATCH] Typsos (inbox unread)\r
++thread:0000000000000018 2009-11-18 [1/2] Carl Worth| Jan Janak; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)\r
++thread:0000000000000017 2009-11-18 [1/2] Carl Worth| Keith Packard; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)\r
++thread:0000000000000014 2009-11-18 [2/5] Carl Worth| Mikhail Gusarov, Keith Packard; [notmuch] [PATCH 1/2] Close message file after parsing message headers (inbox unread)\r
++thread:0000000000000001 2009-11-18 [1/1] Stewart Smith; [notmuch] [PATCH] Fix linking with gcc to use g++ to link in C++ libs. (inbox unread)\r
++EOF\r
++ test_expect_equal_file EXPECTED OUTPUT\r
++\r
++ test_begin_subtest "regexp from search, duplicate term search"\r
++ notmuch search --output=messages re:from:cworth > OUTPUT\r
++ test_expect_equal_file cworth.msg-ids OUTPUT\r
++\r
++ test_begin_subtest "long enough regexp matches only desired senders"\r
++ notmuch search --output=messages 're:"from:C.* Wo"' > OUTPUT\r
++ test_expect_equal_file cworth.msg-ids OUTPUT\r
++\r
++ test_begin_subtest "shorter regexp matches one more sender"\r
++ notmuch search --output=messages 're:"from:C.* W"' > OUTPUT\r
++ (echo id:1258544095-16616-1-git-send-email-chris@chris-wilson.co.uk ; cat cworth.msg-ids) > EXPECTED\r
++ test_expect_equal_file EXPECTED OUTPUT\r
++\r
++ test_begin_subtest "regexp subject search, non-ASCII"\r
++ notmuch search --output=messages re:subject:accentué > OUTPUT\r
++ echo id:877h1wv7mg.fsf@inf-8657.int-evry.fr > EXPECTED\r
++ test_expect_equal_file EXPECTED OUTPUT\r
++\r
++ test_begin_subtest "regexp subject search, punctuation"\r
++ notmuch search re:subject:\'X\' > OUTPUT\r
++ cat <<EOF > EXPECTED\r
++thread:0000000000000017 2009-11-18 [2/2] Keith Packard, Carl Worth; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)\r
++EOF\r
++ test_expect_equal_file EXPECTED OUTPUT\r
++\r
++ test_begin_subtest "regexp subject search, no punctuation"\r
++ notmuch search re:subject:X > OUTPUT\r
++ cat <<EOF > EXPECTED\r
++thread:0000000000000017 2009-11-18 [2/2] Keith Packard, Carl Worth; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)\r
++thread:000000000000000f 2009-11-18 [4/4] Jjgod Jiang, Alexander Botero-Lowry; [notmuch] Mac OS X/Darwin compatibility issues (inbox unread)\r
++EOF\r
++ test_expect_equal_file EXPECTED OUTPUT\r
++\r
++ test_begin_subtest "combine regexp from and subject"\r
++ notmuch search re:subject:-C and re:from:.an.k > OUTPUT\r
++ cat <<EOF > EXPECTED\r
++thread:0000000000000018 2009-11-17 [1/2] Jan Janak| Carl Worth; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)\r
++EOF\r
++ test_expect_equal_file EXPECTED OUTPUT\r
++\r
++ test_begin_subtest "bad subprefix"\r
++ notmuch search 're:unsupported:.*' 1>OUTPUT 2>&1\r
++ cat <<EOF > EXPECTED\r
++notmuch search: A Xapian exception occurred\r
++A Xapian exception occurred performing query: unsupported regexp field 'unsupported'\r
++Query string was: re:unsupported:.*\r
++EOF\r
++ test_expect_equal_file EXPECTED OUTPUT\r
++\r
++ test_begin_subtest "regexp error reporting"\r
++ notmuch search 're:from:unbalanced[' 1>OUTPUT 2>&1\r
++ cat <<EOF > EXPECTED\r
++notmuch search: A Xapian exception occurred\r
++A Xapian exception occurred performing query: Invalid regular expression\r
++Query string was: re:from:unbalanced[\r
++EOF\r
++ test_expect_equal_file EXPECTED OUTPUT\r
++fi\r
++\r
++test_done\r
+-- \r
+2.8.1\r
+\r