--- /dev/null
+Return-Path: <jani@nikula.org>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id 5E878431FC2\r
+ for <notmuch@notmuchmail.org>; Wed, 1 Jan 2014 04:05:14 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+ tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id Ig-jlEt3qmPX for <notmuch@notmuchmail.org>;\r
+ Wed, 1 Jan 2014 04:05:04 -0800 (PST)\r
+Received: from mail-ea0-f176.google.com (mail-ea0-f176.google.com\r
+ [209.85.215.176]) (using TLSv1 with cipher RC4-SHA (128/128 bits))\r
+ (No client certificate requested)\r
+ by olra.theworths.org (Postfix) with ESMTPS id 88C6F431FC0\r
+ for <notmuch@notmuchmail.org>; Wed, 1 Jan 2014 04:05:04 -0800 (PST)\r
+Received: by mail-ea0-f176.google.com with SMTP id h14so5812179eaj.35\r
+ for <notmuch@notmuchmail.org>; Wed, 01 Jan 2014 04:05:03 -0800 (PST)\r
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
+ d=1e100.net; s=20130820;\r
+ h=x-gm-message-state:from:to:cc:subject:in-reply-to:references\r
+ :user-agent:date:message-id:mime-version:content-type\r
+ :content-transfer-encoding;\r
+ bh=yfCvxMG62xd9D+0aOngvooYflgtc9V7expqjW04XtAM=;\r
+ b=lJwLjMZlEUgVIMmUmYca9nz2y/hu8IlcHL0md+cawt99xQdUSFnpgaUzjuSmcWS9Th\r
+ cNajHWdwuD2bPqg0IWn/9mpA5LfgWNRiYCQSMjPH5/UUY8K7WUsAoMNG1cBuxVwRcSwF\r
+ k7nNgCx4+v8D0VtTp9Ep5n3KVLPdRCPj9td+9arqtYuce3AXhAIft3g6qpScm+Ea0bCV\r
+ jTBSzqhU8ctFDPHS7YpKi1xmQ9KU63ze+1QORgM3HyuTEU761nKUj9Pn4pyQ1mtxsbs2\r
+ hQ+4+lNJKIVghmbnbVqF4rJowBSrJOEnxXi7P2rW/oh1nne+MmNxGI8uYsYTMyKbEaO7\r
+ vfUg==\r
+X-Gm-Message-State:\r
+ ALoCoQnS9e9hvjCtjKKOGiLxrL1Pc9fdvaOc/0pVvg2haWgkI0QC8bFDGdw8M6s+bLbkPmKUJ0Lq\r
+X-Received: by 10.14.69.200 with SMTP id n48mr10888063eed.54.1388577901865;\r
+ Wed, 01 Jan 2014 04:05:01 -0800 (PST)\r
+Received: from localhost (dsl-hkibrasgw2-58c36f-91.dhcp.inet.fi.\r
+ [88.195.111.91])\r
+ by mx.google.com with ESMTPSA id m1sm126542611eeg.0.2014.01.01.04.04.59\r
+ for <multiple recipients>\r
+ (version=TLSv1.2 cipher=RC4-SHA bits=128/128);\r
+ Wed, 01 Jan 2014 04:05:00 -0800 (PST)\r
+From: Jani Nikula <jani@nikula.org>\r
+To: "Kirill A. Shutemov" <kirill@shutemov.name>\r
+Subject: Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax\r
+In-Reply-To: <20131217180322.GA9272@node.dhcp.inet.fi>\r
+References: <20130409083010.GA27675@raorn.name>\r
+ <1365549369-12776-1-git-send-email-raorn@raorn.name>\r
+ <87bo2ougmb.fsf@nikula.org>\r
+ <20131217180322.GA9272@node.dhcp.inet.fi>\r
+User-Agent: Notmuch/0.17~rc2+18~g39a67a6 (http://notmuchmail.org) Emacs/24.3.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Wed, 01 Jan 2014 14:04:58 +0200\r
+Message-ID: <87y52z29hx.fsf@nikula.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=utf-8\r
+Content-Transfer-Encoding: quoted-printable\r
+Cc: notmuch@notmuchmail.org, "Alexey I. Froloff" <raorn@raorn.name>\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Wed, 01 Jan 2014 12:05:14 -0000\r
+\r
+On Tue, 17 Dec 2013, "Kirill A. Shutemov" <kirill@shutemov.name> wrote:\r
+> On Thu, Oct 17, 2013 at 05:17:00PM +0300, Jani Nikula wrote:\r
+>> On Wed, 10 Apr 2013, "Alexey I. Froloff" <raorn@raorn.name> wrote:\r
+>> > From: "Alexey I. Froloff" <raorn@raorn.name>\r
+>> >\r
+>> > Add support for indexing and searching the message's List-Id header.\r
+>> > This is useful when matching all the messages belonging to a particular\r
+>> > mailing list.\r
+>>=20\r
+>> There's an issue with our duplicate message-id handling that is likely\r
+>> to cause confusion with List-Id: searches. If you receive several\r
+>> duplicates of the same message (judged by the message-id), only the\r
+>> first one of them gets indexed, and the rest are ignored. This means\r
+>> that for messages you receive both directly and through a list, it will\r
+>> be arbitrary whether the List-Id: gets indexed or not. Therefore a list:\r
+>> search might not return all the messages you'd expect.\r
+>\r
+> I've tried to address this. The patch also adds a few tests for the\r
+> feature.\r
+>\r
+> There's still missing functionality: re-indexing existing messages for\r
+> list-id, handling message removal, etc.\r
+>\r
+> Any comments?\r
+\r
+Hi Kirill, sorry it took me so long to get to this!\r
+\r
+I've looked into our duplicate message-id handling and indexing before,\r
+and it's not very good.\r
+\r
+First, we should pay more attention to checking whether the messages\r
+really are duplicates or not. This is not trivial, but we should go a\r
+bit further than just comparing the message-ids. Sadly, handling the\r
+case of colliding message-ids on clearly different messages is not\r
+trivial either, as we rely on the message-ids being unique all around.\r
+\r
+Second, we should be more clever about indexing duplicates that we think\r
+are the same message. This is orthogonal to the first point. Currently,\r
+only the first duplicate gets indexed, and will remain indexed even if\r
+it's deleted and other copies remain. A message that matches a search\r
+might end up not having the matching search terms, for example. A\r
+rebuild of the database might index a different duplicate from the last\r
+time.\r
+\r
+Having said that (partially just to write the thoughts down somewhere!),\r
+I think your basic approach of indexing the list-id for duplicates is\r
+sane, and we can grow more smarts to _notmuch_message_index_file() for\r
+duplicate =3D=3D true in the future, checking more headers etc. One thing I\r
+wonder about though: what if more than one duplicate has list-id, and\r
+_index_list_id() gets called multiple times on a message? (CC Austin, he\r
+probably has more clues on this than me.)\r
+\r
+For merging, you should also address the previous comments to the\r
+original patch. There's been plenty of dropping the ball here it\r
+seems... I think we've also agreed (perhaps only on IRC, I forget) that\r
+we should use "listid" as the prefix, not "list" (sadly hyphens are not\r
+allowed). Splitting the patch to code, test, and man parts might be a\r
+good idea too.\r
+\r
+BR,\r
+Jani.\r
+\r
+\r
+>\r
+> diff --git a/lib/database.cc b/lib/database.cc\r
+> index f395061e3a73..196243e15d1a 100644\r
+> --- a/lib/database.cc\r
+> +++ b/lib/database.cc\r
+> @@ -205,6 +205,7 @@ static prefix_t BOOLEAN_PREFIX_INTERNAL[] =3D {\r
+> };\r
+>=20=20\r
+> static prefix_t BOOLEAN_PREFIX_EXTERNAL[] =3D {\r
+> + { "list", "XLIST"},\r
+> { "thread", "G" },\r
+> { "tag", "K" },\r
+> { "is", "K" },\r
+> @@ -2025,10 +2026,13 @@ notmuch_database_add_message (notmuch_database_t =\r
+*notmuch,\r
+> date =3D notmuch_message_file_get_header (message_file, "date");\r
+> _notmuch_message_set_header_values (message, date, from, subject);\r
+>=20=20\r
+> - ret =3D _notmuch_message_index_file (message, filename);\r
+> + ret =3D _notmuch_message_index_file (message, filename, false);\r
+> if (ret)\r
+> goto DONE;\r
+> } else {\r
+> + ret =3D _notmuch_message_index_file (message, filename, true);\r
+> + if (ret)\r
+> + goto DONE;\r
+> ret =3D NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;\r
+> }\r
+>=20=20\r
+> diff --git a/lib/index.cc b/lib/index.cc\r
+> index 78c18cf36d10..9fe1ad6502ed 100644\r
+> --- a/lib/index.cc\r
+> +++ b/lib/index.cc\r
+> @@ -304,6 +304,47 @@ _index_address_list (notmuch_message_t *message,\r
+> }\r
+> }\r
+>=20=20\r
+> +static void\r
+> +_index_list_id (notmuch_message_t *message,\r
+> + const char *list_id_header)\r
+> +{\r
+> + const char *begin_list_id, *end_list_id, *list_id;\r
+> + void *local;\r
+> +\r
+> + if (list_id_header =3D=3D NULL)\r
+> + return;\r
+> +\r
+> + /* RFC2919 says that the list-id is found at the end of the header\r
+> + * and enclosed between angle brackets. If we cannot find a\r
+> + * matching pair of brackets containing at least one character,\r
+> + * we ignore the list id header. */\r
+> + begin_list_id =3D strrchr (list_id_header, '<');\r
+> + if (!begin_list_id) {\r
+> + fprintf (stderr, "Warning: Not indexing malformed List-Id tag.\n");\r
+> + return;\r
+> + }\r
+> +\r
+> + end_list_id =3D strrchr(begin_list_id, '>');\r
+> + if (!end_list_id || (end_list_id - begin_list_id < 2)) {\r
+> + fprintf (stderr, "Warning: Not indexing malformed List-Id tag.\n");\r
+> + return;\r
+> + }\r
+> +\r
+> + local =3D talloc_new (message);\r
+> +\r
+> + /* We extract the list id between the angle brackets */\r
+> + list_id =3D talloc_strndup (local, begin_list_id + 1,\r
+> + end_list_id - begin_list_id - 1);\r
+> +\r
+> + /* _notmuch_message_add_term() may return\r
+> + * NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG here. We can't fix it, but\r
+> + * this is not a reason to exit with error... */\r
+> + if (_notmuch_message_add_term (message, "list", list_id))\r
+> + fprintf (stderr, "Warning: Not indexing List-Id: <%s>\n", list_id);\r
+> +\r
+> + talloc_free (local);\r
+> +}\r
+> +\r
+> /* Callback to generate terms for each mime part of a message. */\r
+> static void\r
+> _index_mime_part (notmuch_message_t *message,\r
+> @@ -425,14 +466,15 @@ _index_mime_part (notmuch_message_t *message,\r
+>=20=20\r
+> notmuch_status_t\r
+> _notmuch_message_index_file (notmuch_message_t *message,\r
+> - const char *filename)\r
+> + const char *filename,\r
+> + notmuch_bool_t duplicate)\r
+> {\r
+> GMimeStream *stream =3D NULL;\r
+> GMimeParser *parser =3D NULL;\r
+> GMimeMessage *mime_message =3D NULL;\r
+> InternetAddressList *addresses;\r
+> FILE *file =3D NULL;\r
+> - const char *from, *subject;\r
+> + const char *from, *subject, *list_id;\r
+> notmuch_status_t ret =3D NOTMUCH_STATUS_SUCCESS;\r
+> static int initialized =3D 0;\r
+> char from_buf[5];\r
+> @@ -485,6 +527,9 @@ mboxes is deprecated and may be removed in the future=\r
+.\n", filename);\r
+>=20=20\r
+> from =3D g_mime_message_get_sender (mime_message);\r
+>=20=20\r
+> + if (duplicate)\r
+> + goto DUP;\r
+> +\r
+> addresses =3D internet_address_list_parse_string (from);\r
+> if (addresses) {\r
+> _index_address_list (message, "from", addresses);\r
+> @@ -502,6 +547,10 @@ mboxes is deprecated and may be removed in the futur=\r
+e.\n", filename);\r
+>=20=20\r
+> _index_mime_part (message, g_mime_message_get_mime_part (mime_messag=\r
+e));\r
+>=20=20\r
+> + DUP:\r
+> + list_id =3D g_mime_object_get_header (GMIME_OBJECT (mime_message), "=\r
+List-Id");\r
+> + _index_list_id (message, list_id);\r
+> +\r
+> DONE:\r
+> if (mime_message)\r
+> g_object_unref (mime_message);\r
+> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h\r
+> index af185c7c5ba8..138dfa58efc8 100644\r
+> --- a/lib/notmuch-private.h\r
+> +++ b/lib/notmuch-private.h\r
+> @@ -322,7 +322,8 @@ notmuch_message_get_author (notmuch_message_t *messag=\r
+e);\r
+>=20=20\r
+> notmuch_status_t\r
+> _notmuch_message_index_file (notmuch_message_t *message,\r
+> - const char *filename);\r
+> + const char *filename,\r
+> + notmuch_bool_t duplicate);\r
+>=20=20\r
+> /* message-file.c */\r
+>=20=20\r
+> diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-te=\r
+rms.7\r
+> index f1627b3488f8..29b30b7b0b00 100644\r
+> --- a/man/man7/notmuch-search-terms.7\r
+> +++ b/man/man7/notmuch-search-terms.7\r
+> @@ -52,6 +52,8 @@ terms to match against specific portions of an email, (=\r
+where\r
+>=20=20\r
+> thread:<thread-id>\r
+>=20=20\r
+> + list:<list-id>\r
+> +\r
+> folder:<directory-path>\r
+>=20=20\r
+> date:<since>..<until>\r
+> @@ -109,6 +111,12 @@ within a matching directory. Only the directory comp=\r
+onents below the\r
+> top-level mail database path are available to be searched.\r
+>=20=20\r
+> The\r
+> +.B list:\r
+> +prefix matches the mailing list ID of an email message \- the contents\r
+> +of the List\-Id: header without the '<', '>' delimiters and without the\r
+> +decoded list description.\r
+> +\r
+> +The\r
+> .B date:\r
+> prefix can be used to restrict the results to only messages within a\r
+> particular time range (based on the Date: header) with a range syntax\r
+> diff --git a/test/corpus/cur/18:2, b/test/corpus/cur/18:2,\r
+> index f522f69eb933..2b54925bd5d1 100644\r
+> --- a/test/corpus/cur/18:2,\r
+> +++ b/test/corpus/cur/18:2,\r
+> @@ -3,6 +3,7 @@ To: notmuch@notmuchmail.org\r
+> Date: Tue, 17 Nov 2009 18:21:38 -0500\r
+> Subject: [notmuch] archive\r
+> Message-ID: <20091117232137.GA7669@griffis1.net>\r
+> +List-Id: <test1.example.com>\r
+>=20=20\r
+> Just subscribed, I'd like to catch up on the previous postings,\r
+> but the archive link seems to be bogus?\r
+> diff --git a/test/corpus/cur/51:2, b/test/corpus/cur/51:2,\r
+> index f522f69eb933..b155e6ee64a5 100644\r
+> --- a/test/corpus/cur/51:2,\r
+> +++ b/test/corpus/cur/51:2,\r
+> @@ -3,6 +3,7 @@ To: notmuch@notmuchmail.org\r
+> Date: Tue, 17 Nov 2009 18:21:38 -0500\r
+> Subject: [notmuch] archive\r
+> Message-ID: <20091117232137.GA7669@griffis1.net>\r
+> +List-Id: <test2.example.com>\r
+>=20=20\r
+> Just subscribed, I'd like to catch up on the previous postings,\r
+> but the archive link seems to be bogus?\r
+> diff --git a/test/search b/test/search\r
+> index a7a0b18d2e48..bef42971226c 100755\r
+> --- a/test/search\r
+> +++ b/test/search\r
+> @@ -129,4 +129,28 @@ add_message '[subject]=3D"utf8-message-body-subject"=\r
+' '[date]=3D"Sat, 01 Jan 2000 12\r
+> output=3D$(notmuch search "b=C3=B6d=C3=BD" | notmuch_search_sanitize)\r
+> test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test =\r
+Suite; utf8-message-body-subject (inbox unread)"\r
+>=20=20\r
+> +test_begin_subtest "Search by List-Id"\r
+> +notmuch search list:notmuch.notmuchmail.org | notmuch_search_sanitize > =\r
+OUTPUT\r
+> +cat <<EOF >EXPECTED\r
+> +thread:XXX 2009-11-18 [2/2] Lars Kellogg-Stedman; [notmuch] "notmuch h=\r
+elp" outputs to stderr? (attachment inbox signed unread)\r
+> +thread:XXX 2009-11-18 [4/7] Lars Kellogg-Stedman, Mikhail Gusarov| Kei=\r
+th Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox sign=\r
+ed unread)\r
+> +thread:XXX 2009-11-18 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] [=\r
+PATCH] Error out if no query is supplied to search instead of going into an=\r
+ infinite loop (attachment inbox unread)\r
+> +thread:XXX 2009-11-17 [1/3] Adrian Perez de Castro| Keith Packard, Car=\r
+l Worth; [notmuch] Introducing myself (inbox signed unread)\r
+> +thread:XXX 2009-11-17 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] p=\r
+reliminary FreeBSD support (attachment inbox unread)\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> +\r
+> +test_begin_subtest "Search by List-Id, duplicated messages, step 1"\r
+> +notmuch search list:test1.example.com | notmuch_search_sanitize > OUTPUT\r
+> +cat <<EOF >EXPECTED\r
+> +thread:XXX 2009-11-17 [1/3] Aron Griffis| Keith Packard, Carl Worth; [=\r
+notmuch] archive (inbox unread)\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> +\r
+> +test_begin_subtest "Search by List-Id, duplicated messages, step 2"\r
+> +notmuch search list:test2.example.com | notmuch_search_sanitize > OUTPUT\r
+> +cat <<EOF >EXPECTED\r
+> +thread:XXX 2009-11-17 [1/3] Aron Griffis| Keith Packard, Carl Worth; [=\r
+notmuch] archive (inbox unread)\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> test_done\r
+> diff --git a/test/test-lib.sh b/test/test-lib.sh\r
+> index d8e0d9115a69..981bde4a4004 100644\r
+> --- a/test/test-lib.sh\r
+> +++ b/test/test-lib.sh\r
+> @@ -576,9 +576,9 @@ test_expect_equal_json () {\r
+> # The test suite forces LC_ALL=3DC, but this causes Python 3 to\r
+> # decode stdin as ASCII. We need to read JSON in UTF-8, so\r
+> # override Python's stdio encoding defaults.\r
+> - output=3D$(echo "$1" | PYTHONIOENCODING=3Dutf-8 python -mjson.tool \\r
+> + output=3D$(echo "$1" | PYTHONIOENCODING=3Dutf-8 python2 -mjson.tool \\r
+> || echo "$1")\r
+> - expected=3D$(echo "$2" | PYTHONIOENCODING=3Dutf-8 python -mjson.tool=\r
+ \\r
+> + expected=3D$(echo "$2" | PYTHONIOENCODING=3Dutf-8 python2 -mjson.too=\r
+l \\r
+> || echo "$2")\r
+> shift 2\r
+> test_expect_equal "$output" "$expected" "$@"\r
+> --=20\r
+> Kirill A. Shutemov\r