Return-Path: <kas@node.shutemov.name>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 1AE76431FC3
	for <notmuch@notmuchmail.org>; Tue, 17 Dec 2013 10:10:07 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=0 tagged_above=-999 required=5
	tests=[RCVD_IN_DNSWL_NONE=-0.0001] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id IuP8oTvzApcf for <notmuch@notmuchmail.org>;
	Tue, 17 Dec 2013 10:09:59 -0800 (PST)
X-Greylist: delayed 383 seconds by postgrey-1.32 at olra;
	Tue, 17 Dec 2013 10:09:58 PST
Received: from jenni2.inet.fi (mta-out.inet.fi [195.156.147.13])
	by olra.theworths.org (Postfix) with ESMTP id E6CB3431FBF
	for <notmuch@notmuchmail.org>; Tue, 17 Dec 2013 10:09:58 -0800 (PST)
Received: from node.shutemov.name (80.220.224.16) by jenni2.inet.fi
	(8.5.140.03) id 52775C9903BE6DDE; Tue, 17 Dec 2013 20:03:26 +0200
Received: by node.shutemov.name (Postfix, from userid 1000)
	id 29749417EE; Tue, 17 Dec 2013 20:03:23 +0200 (EET)
Date: Tue, 17 Dec 2013 20:03:22 +0200
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Jani Nikula <jani@nikula.org>
Subject: Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
Message-ID: <20131217180322.GA9272@node.dhcp.inet.fi>
References: <20130409083010.GA27675@raorn.name>
	<1365549369-12776-1-git-send-email-raorn@raorn.name>
	<87bo2ougmb.fsf@nikula.org>
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <87bo2ougmb.fsf@nikula.org>
User-Agent: Mutt/1.5.22.1-rc1 (2013-10-16)
Cc: notmuch@notmuchmail.org, "Alexey I. Froloff" <raorn@raorn.name>
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
	<notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Tue, 17 Dec 2013 18:10:07 -0000

On Thu, Oct 17, 2013 at 05:17:00PM +0300, Jani Nikula wrote:
> On Wed, 10 Apr 2013, "Alexey I. Froloff" <raorn@raorn.name> wrote:
> > From: "Alexey I. Froloff" <raorn@raorn.name>
> >
> > Add support for indexing and searching the message's List-Id header.
> > This is useful when matching all the messages belonging to a particular
> There's an issue with our duplicate message-id handling that is likely
> to cause confusion with List-Id: searches. If you receive several
> duplicates of the same message (judged by the message-id), only the
> first one of them gets indexed, and the rest are ignored. This means
> that for messages you receive both directly and through a list, it will
> be arbitrary whether the List-Id: gets indexed or not. Therefore a list:
> search might not return all the messages you'd expect.

I've tried to address this. The patch also adds a few tests for the feature.

Some functionality is still missing: re-indexing existing messages for
List-Id, handling message removal, etc.
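A small toy model in C shows the intended behaviour. It assumes that the first file carrying a given Message-ID is indexed in full. It also assumes that each later copy, indexed with duplicate set to true, contributes only its List-Id term. The two "duplicated messages" tests in test/search suggest this, but the corresponding hunks are incomplete here. All names (struct toy_message, toy_index_file, toy_has_list) are hypothetical. notmuch itself stores terms in Xapian, not in a fixed array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in notmuch. */
#define TOY_MAX_LISTS 8
#define TOY_MAX_TERM 64

struct toy_message {
    int full_indexings;                       /* times body/headers were indexed */
    int n_lists;                              /* number of distinct list terms */
    char lists[TOY_MAX_LISTS][TOY_MAX_TERM];  /* the "list:" terms (ids longer
                                                 than 63 bytes get truncated) */
};

/* True if LIST_ID is already one of the message's list terms. */
static bool toy_has_list(const struct toy_message *m, const char *list_id)
{
    for (int i = 0; i < m->n_lists; i++)
        if (strcmp(m->lists[i], list_id) == 0)
            return true;
    return false;
}

/* Index one file carrying the message's Message-ID. The first copy is
 * indexed in full; a duplicate copy only contributes its List-Id term. */
static void toy_index_file(struct toy_message *m, const char *list_id,
                           bool duplicate)
{
    assert(m != NULL);
    if (!duplicate)
        m->full_indexings++;  /* stands in for body, From:, Subject:, ... */
    if (list_id == NULL || toy_has_list(m, list_id))
        return;               /* no List-Id, or the term is already present */
    if (m->n_lists < TOY_MAX_LISTS) {
        snprintf(m->lists[m->n_lists], TOY_MAX_TERM, "%s", list_id);
        m->n_lists++;
    }
}
```

Consider indexing 18:2, (List-Id test1.example.com), then its duplicate 51:2, (test2.example.com), then a direct copy with no List-Id. The result is one full indexing pass and two list terms. Under this model, list:test1.example.com and list:test2.example.com therefore both find the same message.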
diff --git a/lib/database.cc b/lib/database.cc
index f395061e3a73..196243e15d1a 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -205,6 +205,7 @@ static prefix_t BOOLEAN_PREFIX_INTERNAL[] = {
static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = {
+ { "list", "XLIST"},
@@ -2025,10 +2026,13 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
date = notmuch_message_file_get_header (message_file, "date");
_notmuch_message_set_header_values (message, date, from, subject);
- ret = _notmuch_message_index_file (message, filename);
+ ret = _notmuch_message_index_file (message, filename, false);
+ ret = _notmuch_message_index_file (message, filename, true);
ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
diff --git a/lib/index.cc b/lib/index.cc
index 78c18cf36d10..9fe1ad6502ed 100644
@@ -304,6 +304,47 @@ _index_address_list (notmuch_message_t *message,
+_index_list_id (notmuch_message_t *message,
+ const char *list_id_header)
+ const char *begin_list_id, *end_list_id, *list_id;
+ if (list_id_header == NULL)
+ /* RFC 2919 says that the list-id is found at the end of the header
+ * and enclosed between angle brackets. If we cannot find a
+ * matching pair of brackets containing at least one character,
+ * we ignore the List-Id header. */
+ begin_list_id = strrchr (list_id_header, '<');
+ if (!begin_list_id) {
+ fprintf (stderr, "Warning: Not indexing malformed List-Id header.\n");
+ end_list_id = strrchr (begin_list_id, '>');
+ if (!end_list_id || (end_list_id - begin_list_id < 2)) {
+ fprintf (stderr, "Warning: Not indexing malformed List-Id header.\n");
+ local = talloc_new (message);
+ /* We extract the list id between the angle brackets */
+ list_id = talloc_strndup (local, begin_list_id + 1,
+ end_list_id - begin_list_id - 1);
+ /* _notmuch_message_add_term() may return
+ * NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG here. We can't fix it, but
+ * this is not a reason to exit with error... */
+ if (_notmuch_message_add_term (message, "list", list_id))
+ fprintf (stderr, "Warning: Not indexing List-Id: <%s>\n", list_id);
+ talloc_free (local);
/* Callback to generate terms for each mime part of a message. */
_index_mime_part (notmuch_message_t *message,
@@ -425,14 +466,15 @@ _index_mime_part (notmuch_message_t *message,
_notmuch_message_index_file (notmuch_message_t *message,
- const char *filename)
+ const char *filename,
+ notmuch_bool_t duplicate)
GMimeStream *stream = NULL;
GMimeParser *parser = NULL;
GMimeMessage *mime_message = NULL;
InternetAddressList *addresses;
- const char *from, *subject;
+ const char *from, *subject, *list_id;
notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
static int initialized = 0;
@@ -485,6 +527,9 @@ mboxes is deprecated and may be removed in the future.\n", filename);
from = g_mime_message_get_sender (mime_message);
addresses = internet_address_list_parse_string (from);
_index_address_list (message, "from", addresses);
@@ -502,6 +547,10 @@ mboxes is deprecated and may be removed in the future.\n", filename);
_index_mime_part (message, g_mime_message_get_mime_part (mime_message));
+ list_id = g_mime_object_get_header (GMIME_OBJECT (mime_message), "List-Id");
+ _index_list_id (message, list_id);
g_object_unref (mime_message);
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index af185c7c5ba8..138dfa58efc8 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -322,7 +322,8 @@ notmuch_message_get_author (notmuch_message_t *message);
_notmuch_message_index_file (notmuch_message_t *message,
- const char *filename);
+ const char *filename,
+ notmuch_bool_t duplicate);
/* message-file.c */
diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
index f1627b3488f8..29b30b7b0b00 100644
--- a/man/man7/notmuch-search-terms.7
+++ b/man/man7/notmuch-search-terms.7
@@ -52,6 +52,8 @@ terms to match against specific portions of an email, (where
folder:<directory-path>
date:<since>..<until>
@@ -109,6 +111,12 @@ within a matching directory. Only the directory components below the
top-level mail database path are available to be searched.
+is used to match the mailing list ID of an email message \- the contents of the
+List\-Id: header without the '<', '>' delimiters or decoded list
prefix can be used to restrict the results to only messages within a
particular time range (based on the Date: header) with a range syntax
diff --git a/test/corpus/cur/18:2, b/test/corpus/cur/18:2,
index f522f69eb933..2b54925bd5d1 100644
--- a/test/corpus/cur/18:2,
+++ b/test/corpus/cur/18:2,
@@ -3,6 +3,7 @@ To: notmuch@notmuchmail.org
Date: Tue, 17 Nov 2009 18:21:38 -0500
Subject: [notmuch] archive
Message-ID: <20091117232137.GA7669@griffis1.net>
+List-Id: <test1.example.com>
Just subscribed, I'd like to catch up on the previous postings,
but the archive link seems to be bogus?
diff --git a/test/corpus/cur/51:2, b/test/corpus/cur/51:2,
index f522f69eb933..b155e6ee64a5 100644
--- a/test/corpus/cur/51:2,
+++ b/test/corpus/cur/51:2,
@@ -3,6 +3,7 @@ To: notmuch@notmuchmail.org
Date: Tue, 17 Nov 2009 18:21:38 -0500
Subject: [notmuch] archive
Message-ID: <20091117232137.GA7669@griffis1.net>
+List-Id: <test2.example.com>
Just subscribed, I'd like to catch up on the previous postings,
but the archive link seems to be bogus?
diff --git a/test/search b/test/search
index a7a0b18d2e48..bef42971226c 100755
@@ -129,4 +129,28 @@ add_message '[subject]="utf8-message-body-subject"' '[date]="Sat, 01 Jan 2000 12
output=$(notmuch search "bödý" | notmuch_search_sanitize)
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; utf8-message-body-subject (inbox unread)"
+test_begin_subtest "Search by List-Id"
+notmuch search list:notmuch.notmuchmail.org | notmuch_search_sanitize > OUTPUT
+cat <<EOF >EXPECTED
+thread:XXX 2009-11-18 [2/2] Lars Kellogg-Stedman; [notmuch] "notmuch help" outputs to stderr? (attachment inbox signed unread)
+thread:XXX 2009-11-18 [4/7] Lars Kellogg-Stedman, Mikhail Gusarov| Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
+thread:XXX 2009-11-18 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] [PATCH] Error out if no query is supplied to search instead of going into an infinite loop (attachment inbox unread)
+thread:XXX 2009-11-17 [1/3] Adrian Perez de Castro| Keith Packard, Carl Worth; [notmuch] Introducing myself (inbox signed unread)
+thread:XXX 2009-11-17 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] preliminary FreeBSD support (attachment inbox unread)
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+test_begin_subtest "Search by List-Id, duplicated messages, step 1"
+notmuch search list:test1.example.com | notmuch_search_sanitize > OUTPUT
+cat <<EOF >EXPECTED
+thread:XXX 2009-11-17 [1/3] Aron Griffis| Keith Packard, Carl Worth; [notmuch] archive (inbox unread)
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+test_begin_subtest "Search by List-Id, duplicated messages, step 2"
+notmuch search list:test2.example.com | notmuch_search_sanitize > OUTPUT
+cat <<EOF >EXPECTED
+thread:XXX 2009-11-17 [1/3] Aron Griffis| Keith Packard, Carl Worth; [notmuch] archive (inbox unread)
+EOF
+test_expect_equal_file OUTPUT EXPECTED
diff --git a/test/test-lib.sh b/test/test-lib.sh
index d8e0d9115a69..981bde4a4004 100644
--- a/test/test-lib.sh
+++ b/test/test-lib.sh
@@ -576,9 +576,9 @@ test_expect_equal_json () {
# The test suite forces LC_ALL=C, but this causes Python 3 to
# decode stdin as ASCII. We need to read JSON in UTF-8, so
# override Python's stdio encoding defaults.
- output=$(echo "$1" | PYTHONIOENCODING=utf-8 python -mjson.tool \
+ output=$(echo "$1" | PYTHONIOENCODING=utf-8 python2 -mjson.tool \
- expected=$(echo "$2" | PYTHONIOENCODING=utf-8 python -mjson.tool \
+ expected=$(echo "$2" | PYTHONIOENCODING=utf-8 python2 -mjson.tool \
test_expect_equal "$output" "$expected" "$@"
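The bracket rule described in the _index_list_id() comment can be tried on its own. The sketch below follows the patch's lookup: it finds the last '<', then the last '>' after it, and requires at least one character between them. To stay self-contained, it copies the id into a caller-supplied buffer rather than calling talloc_strndup() and _notmuch_message_add_term(). The name extract_list_id() is a hypothetical helper and is not part of notmuch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy the list id found between the last '<' in HEADER and the last '>'
 * after it into OUT (capacity OUT_SIZE, including the terminating NUL).
 * Returns false, leaving OUT untouched, when the brackets are missing or
 * empty, or when the id does not fit. */
static bool extract_list_id(const char *header, char *out, size_t out_size)
{
    const char *begin, *end;
    size_t len;

    assert(out != NULL);
    if (header == NULL)
        return false;

    begin = strrchr(header, '<');          /* id sits at the end of the header */
    if (begin == NULL)
        return false;

    end = strrchr(begin, '>');             /* last '>' at or after BEGIN */
    if (end == NULL || end - begin < 2)    /* no '>' at all, or just "<>" */
        return false;

    len = (size_t)(end - begin - 1);
    if (len + 1 > out_size)
        return false;

    memcpy(out, begin + 1, len);
    out[len] = '\0';
    return true;
}
```

Take the notmuch list's own header as input: "Use and development of the notmuch mail system." <notmuch.notmuchmail.org>. For that header, the function yields notmuch.notmuchmail.org, which is the value the first new test searches for with list:notmuch.notmuchmail.org.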
\r