1 Return-Path: <amdragon@mit.edu>
\r
2 X-Original-To: notmuch@notmuchmail.org
\r
3 Delivered-To: notmuch@notmuchmail.org
\r
4 Received: from localhost (localhost [127.0.0.1])
\r
5 by olra.theworths.org (Postfix) with ESMTP id 6A2BB431FAF
\r
6 for <notmuch@notmuchmail.org>; Wed, 9 Oct 2013 07:37:08 -0700 (PDT)
\r
7 X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
\r
11 X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5
\r
12 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
\r
13 Received: from olra.theworths.org ([127.0.0.1])
\r
14 by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
\r
15 with ESMTP id 6DwIkIDjBmao for <notmuch@notmuchmail.org>;
\r
16 Wed, 9 Oct 2013 07:37:02 -0700 (PDT)
\r
17 Received: from dmz-mailsec-scanner-8.mit.edu (dmz-mailsec-scanner-8.mit.edu
\r
19 by olra.theworths.org (Postfix) with ESMTP id 8EDE8431FAE
\r
20 for <notmuch@notmuchmail.org>; Wed, 9 Oct 2013 07:37:02 -0700 (PDT)
\r
21 X-AuditID: 12074425-b7f1c8e0000009c7-f5-52556a0e8760
\r
22 Received: from mailhub-auth-4.mit.edu ( [18.7.62.39])
\r
23 by dmz-mailsec-scanner-8.mit.edu (Symantec Messaging Gateway) with SMTP
\r
24 id AF.CC.02503.E0A65525; Wed, 9 Oct 2013 10:37:02 -0400 (EDT)
\r
25 Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11])
\r
26 by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id r99Eb1X6006429;
\r
27 Wed, 9 Oct 2013 10:37:01 -0400
\r
28 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])
\r
29 (authenticated bits=0)
\r
30 (User authenticated as amdragon@ATHENA.MIT.EDU)
\r
31 by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id r99EaxWn031902
\r
32 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT);
\r
33 Wed, 9 Oct 2013 10:37:00 -0400
\r
34 Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80)
\r
35 (envelope-from <amdragon@mit.edu>)
\r
36 id 1VTusk-000584-RI; Wed, 09 Oct 2013 10:36:58 -0400
\r
37 Date: Wed, 9 Oct 2013 10:36:58 -0400
\r
38 From: Austin Clements <amdragon@MIT.EDU>
\r
39 To: Jani Nikula <jani@nikula.org>
\r
40 Subject: Re: [PATCH 08/11] search: Add stable queries to thread search results
\r
41 Message-ID: <20131009143658.GQ21611@mit.edu>
\r
42 References: <1381185201-25197-1-git-send-email-amdragon@mit.edu>
\r
43 <1381185201-25197-9-git-send-email-amdragon@mit.edu>
\r
44 <87fvsaao2q.fsf@nikula.org>
\r
46 Content-Type: text/plain; charset=iso-8859-1
\r
47 Content-Disposition: inline
\r
48 Content-Transfer-Encoding: 8bit
\r
49 In-Reply-To: <87fvsaao2q.fsf@nikula.org>
\r
50 User-Agent: Mutt/1.5.21 (2010-09-15)
\r
\r
65 Cc: notmuch@notmuchmail.org
\r
66 X-BeenThere: notmuch@notmuchmail.org
\r
67 X-Mailman-Version: 2.1.13
\r
69 List-Id: "Use and development of the notmuch mail system."
\r
70 <notmuch.notmuchmail.org>
\r
71 List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
\r
72 <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
\r
73 List-Archive: <http://notmuchmail.org/pipermail/notmuch>
\r
74 List-Post: <mailto:notmuch@notmuchmail.org>
\r
75 List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
\r
76 List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
\r
77 <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
\r
78 X-List-Received-Date: Wed, 09 Oct 2013 14:37:08 -0000
\r
80 Quoth Jani Nikula on Oct 09 at 9:41 am:
\r
81 > On Tue, 08 Oct 2013, Austin Clements <amdragon@MIT.EDU> wrote:
\r
82 > > These queries will match exactly the set of messages currently in the
\r
83 > > thread, even if more messages later arrive. Two queries are provided:
\r
84 > > one for matched messages and one for unmatched messages.
\r
86 > > This can be used to fix race conditions with tagging threads from
\r
87 > > search results. While tagging based on a thread: query can affect
\r
88 > > messages that arrived after the search, tagging based on stable
\r
89 > > queries affects only the messages the user was shown in the search UI.
\r
91 > > Since we want clients to be able to depend on the presence of these
\r
92 > > queries, this ushers in schema version 2.
\r
94 > > devel/schemata | 22 +++++++++++++++++--
\r
95 > > notmuch-client.h | 2 +-
\r
96 > > notmuch-search.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
\r
97 > > test/json | 2 ++
\r
98 > > test/missing-headers | 6 ++++--
\r
99 > > test/sexp | 4 ++--
\r
100 > > 6 files changed, 89 insertions(+), 7 deletions(-)
\r
102 > > diff --git a/devel/schemata b/devel/schemata
\r
103 > > index cdd0e43..41dc4a6 100644
\r
104 > > --- a/devel/schemata
\r
105 > > +++ b/devel/schemata
\r
106 > > @@ -14,7 +14,17 @@ are interleaved. Keys are printed as keywords (symbols preceded by a
\r
107 > > colon), e.g. (:id "123" :time 54321 :from "foobar"). Null is printed as
\r
108 > > nil, true as t and false as nil.
\r
110 > > -This is version 1 of the structured output format.
\r
111 > > +This is version 2 of the structured output format.
\r
113 > > +Version history
\r
114 > > +---------------
\r
117 > > +- First versioned schema release.
\r
118 > > +- Added part.content-length and part.content-transfer-encoding fields.
\r
121 > > +- Added the thread_summary.query field.
\r
123 > > Common non-terminals
\r
124 > > --------------------
\r
125 > > @@ -145,7 +155,15 @@ thread_summary = {
\r
126 > > authors: string, # comma-separated names with | between
\r
127 > > # matched and unmatched
\r
128 > > subject: string,
\r
129 > > - tags: [string*]
\r
130 > > + tags: [string*],
\r
132 > > + # Two stable query strings identifying exactly the matched and
\r
133 > > + # unmatched messages currently in this thread. The messages
\r
134 > > + # matched by these queries will not change even if more messages
\r
135 > > + # arrive in the thread. If there are no matched or unmatched
\r
136 > > + # messages, the corresponding query will be null (there is no
\r
137 > > + # query that matches nothing). (Added in schema version 2.)
\r
138 > > + query: [string|null, string|null],
\r
141 > > notmuch reply schema
\r
142 > > diff --git a/notmuch-client.h b/notmuch-client.h
\r
143 > > index 8d986f4..1b14910 100644
\r
144 > > --- a/notmuch-client.h
\r
145 > > +++ b/notmuch-client.h
\r
146 > > @@ -138,7 +138,7 @@ chomp_newline (char *str)
\r
147 > > * this. New (required) map fields can be added without increasing
\r
150 > > -#define NOTMUCH_FORMAT_CUR 1
\r
151 > > +#define NOTMUCH_FORMAT_CUR 2
\r
152 > > /* The minimum supported structured output format version. Requests
\r
153 > > * for format versions below this will return an error. */
\r
154 > > #define NOTMUCH_FORMAT_MIN 1
\r
155 > > diff --git a/notmuch-search.c b/notmuch-search.c
\r
156 > > index d9d39ec..1d14651 100644
\r
157 > > --- a/notmuch-search.c
\r
158 > > +++ b/notmuch-search.c
\r
159 > > @@ -20,6 +20,7 @@
\r
161 > > #include "notmuch-client.h"
\r
162 > > #include "sprinter.h"
\r
163 > > +#include "string-util.h"
\r
166 > > OUTPUT_SUMMARY,
\r
167 > > @@ -46,6 +47,46 @@ sanitize_string (const void *ctx, const char *str)
\r
171 > > +/* Return two stable query strings that identify exactly the matched
\r
172 > > + * and unmatched messages currently in thread. If there are no
\r
173 > > + * matched or unmatched messages, the returned buffers will be
\r
176 > > +get_thread_query (notmuch_thread_t *thread,
\r
177 > > + char **matched_out, char **unmached_out)
\r
179 > > + notmuch_messages_t *messages;
\r
180 > > + char *escaped = NULL;
\r
181 > > + size_t escaped_len = 0;
\r
183 > > + *matched_out = *unmached_out = NULL;
\r
185 > > + for (messages = notmuch_thread_get_messages (thread);
\r
186 > > + notmuch_messages_valid (messages);
\r
187 > > + notmuch_messages_move_to_next (messages))
\r
189 > > + notmuch_message_t *message = notmuch_messages_get (messages);
\r
190 > > + const char *mid = notmuch_message_get_message_id (message);
\r
191 > > + /* Determine which query buffer to extend */
\r
192 > > + char **buf = notmuch_message_get_flag (
\r
193 > > + message, NOTMUCH_MESSAGE_FLAG_MATCH) ? matched_out : unmached_out;
\r
194 > > + /* Allocate the query buffer if this is the first message */
\r
195 > > + if (!*buf && (*buf = talloc_strdup (thread, "")) == NULL)
\r
198 > I think it would improve clarity if you dropped the above...
\r
200 > > + /* Add this message's id: query. Since "id" is an exclusive
\r
201 > > + * prefix, it is implicitly 'or'd together, so we only need to
\r
202 > > + * join queries with a space. */
\r
203 > > + if (make_boolean_term (thread, "id", mid, &escaped, &escaped_len) < 0)
\r
205 > > + *buf = talloc_asprintf_append_buffer (
\r
206 > > + *buf, "%s%s", **buf ? " " : "", escaped);
\r
208 > ...and turned this into:
\r
211 > *buf = talloc_asprintf_append_buffer (*buf, " %s", escaped);
\r
213 > *buf = talloc_strdup (thread, escaped);
\r
217 > Also one talloc less. Which brings me to the main worry:
\r
218 > performance. What's the impact?
\r
It seems to be about 1%-3% for CLI search (tested on the medium corpus).
It's hard to measure the effect on Emacs search, though I would expect
it to be similarly negligible. Work I did several attempts ago suggests
that this slows down tagging (though I doubt it would be noticeable for
single threads), but I also found that switching to docid-based queries
sped things up significantly:
id:CAH-f9WsPj=1Eu=g3sOePJgCTBFs6HrLdLq18xMEnJ8aZ00yCEg@mail.gmail.com
Actually, docid queries would probably make tagging faster than it is
*now*, but I didn't measure that when I did the experiments.
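
For reference, the restructuring suggested in the quoted review above
(copy the first term, append " term" for later ones, and skip the
up-front empty-string allocation) could look roughly like the sketch
below. This is only an illustration under stated assumptions: it uses
plain malloc/realloc instead of talloc, the helper name append_term is
hypothetical, and the term is assumed to be already escaped (in the
patch that is the job of make_boolean_term).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the talloc-based code in the patch.
 * Appends one already-escaped "id:..." term to *buf, separating terms
 * with a single space.  *buf must be NULL before the first call.
 * Returns 0 on success, -1 on allocation failure (in which case *buf
 * is left unchanged). */
int
append_term (char **buf, const char *term)
{
    size_t term_len = strlen (term);

    if (*buf) {
        /* Later terms: grow the buffer and add " term". */
        size_t old_len = strlen (*buf);
        char *grown = realloc (*buf, old_len + 1 + term_len + 1);

        if (! grown)
            return -1;
        grown[old_len] = ' ';
        /* Copy the term together with its terminating NUL. */
        memcpy (grown + old_len + 1, term, term_len + 1);
        *buf = grown;
    } else {
        /* First term: the buffer is just a copy of the term. */
        char *copy = malloc (term_len + 1);

        if (! copy)
            return -1;
        memcpy (copy, term, term_len + 1);
        *buf = copy;
    }
    return 0;
}
```

With this shape, a buffer that is still NULL after the loop means there
were no messages of that kind, which lines up with the null entry the
patch emits in the query list.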
\r
237 > > + talloc_free (escaped);
\r
242 > > do_search_threads (sprinter_t *format,
\r
243 > > notmuch_query_t *query,
\r
244 > > @@ -131,6 +172,25 @@ do_search_threads (sprinter_t *format,
\r
245 > > format->string (format, authors);
\r
246 > > format->map_key (format, "subject");
\r
247 > > format->string (format, subject);
\r
248 > > + if (notmuch_format_version >= 2) {
\r
249 > > + char *matched_query, *unmatched_query;
\r
250 > > + if (get_thread_query (thread, &matched_query,
\r
251 > > + &unmatched_query) < 0) {
\r
252 > > + fprintf (stderr, "Out of memory\n");
\r
255 > > + format->map_key (format, "query");
\r
256 > > + format->begin_list (format);
\r
257 > > + if (matched_query)
\r
258 > > + format->string (format, matched_query);
\r
260 > > + format->null (format);
\r
261 > > + if (unmatched_query)
\r
262 > > + format->string (format, unmatched_query);
\r
264 > > + format->null (format);
\r
265 > > + format->end (format);
\r
269 > > talloc_free (ctx_quote);
\r
270 > > diff --git a/test/json b/test/json
\r
271 > > index b87b7f6..e07a290 100755
\r
272 > > --- a/test/json
\r
273 > > +++ b/test/json
\r
274 > > @@ -26,6 +26,7 @@ test_expect_equal_json "$output" "[{\"thread\": \"XXX\",
\r
276 > > \"authors\": \"Notmuch Test Suite\",
\r
277 > > \"subject\": \"json-search-subject\",
\r
278 > > + \"query\": [\"id:$gen_msg_id\", null],
\r
279 > > \"tags\": [\"inbox\",
\r
282 > > @@ -59,6 +60,7 @@ test_expect_equal_json "$output" "[{\"thread\": \"XXX\",
\r
284 > > \"authors\": \"Notmuch Test Suite\",
\r
285 > > \"subject\": \"json-search-utf8-body-sübjéct\",
\r
286 > > + \"query\": [\"id:$gen_msg_id\", null],
\r
287 > > \"tags\": [\"inbox\",
\r
290 > > diff --git a/test/missing-headers b/test/missing-headers
\r
291 > > index f14b878..43e861b 100755
\r
292 > > --- a/test/missing-headers
\r
293 > > +++ b/test/missing-headers
\r
294 > > @@ -43,7 +43,8 @@ test_expect_equal_json "$output" '
\r
296 > > "thread": "XXX",
\r
297 > > "timestamp": 978709437,
\r
300 > > + "query": ["id:notmuch-sha1-7a6e4eac383ef958fcd3ebf2143db71b8ff01161", null]
\r
303 > > "authors": "Notmuch Test Suite",
\r
304 > > @@ -56,7 +57,8 @@ test_expect_equal_json "$output" '
\r
306 > > "thread": "XXX",
\r
307 > > "timestamp": 0,
\r
310 > > + "query": ["id:notmuch-sha1-ca55943aff7a72baf2ab21fa74fab3d632401334", null]
\r
314 > > diff --git a/test/sexp b/test/sexp
\r
315 > > index 492a82f..be815e1 100755
\r
316 > > --- a/test/sexp
\r
317 > > +++ b/test/sexp
\r
318 > > @@ -19,7 +19,7 @@ test_expect_equal "$output" "((((:id \"${gen_msg_id}\" :match t :excluded nil :f
\r
319 > > test_begin_subtest "Search message: sexp"
\r
320 > > add_message "[subject]=\"sexp-search-subject\"" "[date]=\"Sat, 01 Jan 2000 12:00:00 -0000\"" "[body]=\"sexp-search-message\""
\r
321 > > output=$(notmuch search --format=sexp "sexp-search-message" | notmuch_search_sanitize)
\r
322 > > -test_expect_equal "$output" "((:thread \"0000000000000002\" :timestamp 946728000 :date_relative \"2000-01-01\" :matched 1 :total 1 :authors \"Notmuch Test Suite\" :subject \"sexp-search-subject\" :tags (\"inbox\" \"unread\")))"
\r
323 > > +test_expect_equal "$output" "((:thread \"0000000000000002\" :timestamp 946728000 :date_relative \"2000-01-01\" :matched 1 :total 1 :authors \"Notmuch Test Suite\" :subject \"sexp-search-subject\" :query (\"id:$gen_msg_id\" nil) :tags (\"inbox\" \"unread\")))"
\r
325 > > test_begin_subtest "Show message: sexp, utf-8"
\r
326 > > add_message "[subject]=\"sexp-show-utf8-body-sübjéct\"" "[date]=\"Sat, 01 Jan 2000 12:00:00 -0000\"" "[body]=\"jsön-show-méssage\""
\r
327 > > @@ -44,7 +44,7 @@ test_expect_equal "$output" "((((:id \"$id\" :match t :excluded nil :filename \"
\r
328 > > test_begin_subtest "Search message: sexp, utf-8"
\r
329 > > add_message "[subject]=\"sexp-search-utf8-body-sübjéct\"" "[date]=\"Sat, 01 Jan 2000 12:00:00 -0000\"" "[body]=\"jsön-search-méssage\""
\r
330 > > output=$(notmuch search --format=sexp "jsön-search-méssage" | notmuch_search_sanitize)
\r
331 > > -test_expect_equal "$output" "((:thread \"0000000000000005\" :timestamp 946728000 :date_relative \"2000-01-01\" :matched 1 :total 1 :authors \"Notmuch Test Suite\" :subject \"sexp-search-utf8-body-sübjéct\" :tags (\"inbox\" \"unread\")))"
\r
332 > > +test_expect_equal "$output" "((:thread \"0000000000000005\" :timestamp 946728000 :date_relative \"2000-01-01\" :matched 1 :total 1 :authors \"Notmuch Test Suite\" :subject \"sexp-search-utf8-body-sübjéct\" :query (\"id:$gen_msg_id\" nil) :tags (\"inbox\" \"unread\")))"
\r
\r
343 Austin Clements MIT/'06/PhD/CSAIL
\r
344 amdragon@mit.edu http://web.mit.edu/amdragon
\r
345 Somewhere in the dream we call reality you will find me,
\r
346 searching for the reality we call dreams.
\r