Return-Path: <pieter@praet.org>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
    by olra.theworths.org (Postfix) with ESMTP id 3EFB9431FD0
    for <notmuch@notmuchmail.org>; Thu, 10 Nov 2011 17:34:34 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5
    tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
    by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id eFAtHlloWcft for <notmuch@notmuchmail.org>;
    Thu, 10 Nov 2011 17:34:33 -0800 (PST)
Received: from mail-wy0-f181.google.com (mail-wy0-f181.google.com
    [74.125.82.181]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
    (No client certificate requested)
    by olra.theworths.org (Postfix) with ESMTPS id 20915431FB6
    for <notmuch@notmuchmail.org>; Thu, 10 Nov 2011 17:34:33 -0800 (PST)
Received: by wyg8 with SMTP id 8so3761231wyg.26
    for <notmuch@notmuchmail.org>; Thu, 10 Nov 2011 17:34:32 -0800 (PST)
Received: by 10.180.81.73 with SMTP id y9mr11590030wix.37.1320975271818;
    Thu, 10 Nov 2011 17:34:31 -0800 (PST)
Received: from localhost (26.48-242-81.adsl-dyn.isp.belgacom.be.
    by mx.google.com with ESMTPS id co5sm5987687wib.8.2011.11.10.17.34.30
    (version=TLSv1/SSLv3 cipher=OTHER);
    Thu, 10 Nov 2011 17:34:31 -0800 (PST)
From: Pieter Praet <pieter@praet.org>
To: Austin Clements <amdragon@MIT.EDU>, notmuch@notmuchmail.org
Subject: Re: [PATCH] Store "from" and "subject" headers in the database.
In-Reply-To: <1320599856-24078-1-git-send-email-amdragon@mit.edu>
References: <1320599856-24078-1-git-send-email-amdragon@mit.edu>
User-Agent: Notmuch/0.9+76~g2fd88e6 (http://notmuchmail.org) Emacs/23.3.1
    (x86_64-unknown-linux-gnu)
Date: Fri, 11 Nov 2011 02:33:38 +0100
Message-ID: <87obwjtpcd.fsf@praet.org>
Content-Type: text/plain; charset=us-ascii
Cc: notmuch@kismala.com
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
    <notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Nov 2011 01:34:34 -0000

On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com

Fantastic performance improvement, Austin! This should be merged ASAP.

BTW, compacting the db from time to time also has a significant impact:

  $ sync && sudo /sbin/sysctl vm.drop_caches=3
  $ time notmuch search "*" | wc -l

  1 - original database, compacted some time ago
  2 - fresh database generated before patching, non-compacted
  3 - fresh database generated after patching, non-compacted
  4 - fresh database generated after patching, compacted with
        $ mv .notmuch/xapian .notmuch/xapian-fat
        $ xapian-compact --no-renumber .notmuch/xapian-fat .notmuch/xapian

  | db      | 1         | 2        | 3         | 4         |
  |---------+-----------+----------+-----------+-----------|
  | db size | 272M      | 289M     | 291M      | 172M      |
  | amount  | 9536      | 9540     | 9540      | 9540      |
  |---------+-----------+----------+-----------+-----------|
  | real    | 1m42.221s | 2m3.193s | 0m30.762s | 0m10.505s |
  | user    | 0m8.379s  | 0m8.133s | 0m4.043s  | 0m3.353s  |
  | sys     | 0m5.216s  | 0m4.933s | 0m1.530s  | 0m1.000s  |

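To read the "real" row as ratios, the timings can be converted to seconds and divided. The following is only a minimal sketch: `parse_time` and `speedup` are hypothetical helper names (not part of notmuch), and it assumes the `NmS.SSSs` format that `time` printed above.

```cpp
#include <cassert>
#include <string>

// Convert a shell `time` value such as "1m42.221s" into seconds.
// Assumes the "<minutes>m<seconds>s" layout used in the table.
double parse_time(const std::string &s)
{
    const std::string::size_type m = s.find('m');
    assert(m != std::string::npos);
    const double minutes = std::stod(s.substr(0, m));
    // std::stod stops at the trailing 's', so it need not be stripped.
    const double seconds = std::stod(s.substr(m + 1));
    return minutes * 60.0 + seconds;
}

// How many times faster `after` is compared with `before`.
double speedup(const std::string &before, const std::string &after)
{
    const double a = parse_time(after);
    assert(a > 0.0);
    return parse_time(before) / a;
}
```

With the figures from the table, `speedup("2m3.193s", "0m30.762s")` comes out at roughly 4.0 (the patch alone, db 2 vs. db 3), and `speedup("0m30.762s", "0m10.505s")` at roughly 2.9 (compaction on top, db 3 vs. db 4).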
> Search retrieves these headers for every message in the search
> results.  Previously, this required opening and parsing every message
> file.  Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.
>
> Taking full advantage of this requires a database rebuild, but it will
> fall back to the old behavior for messages that do not have headers
> stored in the database.

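The lookup-with-fallback behavior described in the quoted text can be sketched in isolation, without Xapian or notmuch. Everything here is an assumption made for illustration: `Doc`, `ValueSlot` and `get_header` are hypothetical stand-ins, and `parse_file` represents the slow path of opening and parsing the message file. The only behavior borrowed from Xapian is that reading an unset value slot yields an empty string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <strings.h> // strcasecmp (POSIX)

// Hypothetical stand-ins for the value slots named in the patch.
enum ValueSlot { VALUE_TIMESTAMP = 0, VALUE_MESSAGE_ID, VALUE_FROM, VALUE_SUBJECT };

// Hypothetical stand-in for a database document: slot number -> stored string.
struct Doc {
    std::map<int, std::string> values;

    // Returns "" for a slot that was never set.
    std::string get_value(int slot) const
    {
        const auto it = values.find(slot);
        return it == values.end() ? std::string() : it->second;
    }
};

// Return the header from its value slot when one is stored; otherwise
// fall back to parse_file, which stands in for parsing the message file.
std::string get_header(const Doc &doc, const char *header,
                       const std::function<std::string(const char *)> &parse_file)
{
    assert(header != nullptr);

    std::string value;
    if (strcasecmp(header, "from") == 0)
        value = doc.get_value(VALUE_FROM);
    else if (strcasecmp(header, "subject") == 0)
        value = doc.get_value(VALUE_SUBJECT);
    else if (strcasecmp(header, "message-id") == 0)
        value = doc.get_value(VALUE_MESSAGE_ID);

    if (!value.empty())
        return value;

    // Old database entry or a header that is not stored: take the slow path.
    return parse_file(header);
}
```

In this sketch a message indexed before the rebuild simply has no `VALUE_FROM` or `VALUE_SUBJECT` entry, so every lookup for it goes through `parse_file`. A stored header that happens to be empty also takes the slow path, since an empty string cannot be told apart from an unset slot.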
>  lib/database.cc       |    2 +-
>  lib/message.cc        |   23 +++++++++++++++++++++--
>  lib/notmuch-private.h |   11 +++++++----
>  3 files changed, 29 insertions(+), 7 deletions(-)
> diff --git a/lib/database.cc b/lib/database.cc
> index fa632f8..e4ef14e 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -1725,7 +1725,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
>              date = notmuch_message_file_get_header (message_file, "date");
> -            _notmuch_message_set_date (message, date);
> +            _notmuch_message_set_header_values (message, date, from, subject);
>              _notmuch_message_index_file (message, filename);
> diff --git a/lib/message.cc b/lib/message.cc
> index 8f22e02..ca7fbf2 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -412,6 +412,21 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)
>  notmuch_message_get_header (notmuch_message_t *message, const char *header)
> +    std::string value;
> +    /* Fetch header from the appropriate xapian value field if
> +    if (strcasecmp (header, "from") == 0)
> +        value = message->doc.get_value (NOTMUCH_VALUE_FROM);
> +    else if (strcasecmp (header, "subject") == 0)
> +        value = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
> +    else if (strcasecmp (header, "message-id") == 0)
> +        value = message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);
> +    if (!value.empty())
> +        return talloc_strdup (message, value.c_str ());
> +    /* Otherwise fall back to parsing the file */
>      _notmuch_message_ensure_message_file (message);
>      if (message->message_file == NULL)
> @@ -795,8 +810,10 @@ notmuch_message_set_author (notmuch_message_t *message,
> -_notmuch_message_set_date (notmuch_message_t *message,
> -                           const char *date)
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> +                                    const char *date,
> +                                    const char *from,
> +                                    const char *subject)
>      time_t time_value;
> @@ -809,6 +826,8 @@ _notmuch_message_set_date (notmuch_message_t *message,
>      message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
>                              Xapian::sortable_serialise (time_value));
> +    message->doc.add_value (NOTMUCH_VALUE_FROM, from);
> +    message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
>  /* Synchronize changes made to message->doc out into the database. */
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 0d3cc27..60a932f 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -93,7 +93,9 @@ NOTMUCH_BEGIN_DECLS
>      NOTMUCH_VALUE_TIMESTAMP = 0,
> -    NOTMUCH_VALUE_MESSAGE_ID
> +    NOTMUCH_VALUE_MESSAGE_ID,
> +    NOTMUCH_VALUE_FROM,
> +    NOTMUCH_VALUE_SUBJECT
>  } notmuch_value_t;
>  /* Xapian (with flint backend) complains if we provide a term longer
> @@ -269,9 +271,10 @@ void
>  _notmuch_message_ensure_thread_id (notmuch_message_t *message);
> -_notmuch_message_set_date (notmuch_message_t *message,
> -                           const char *date);
> +_notmuch_message_set_header_values (notmuch_message_t *message,
> +                                    const char *date,
> +                                    const char *from,
> +                                    const char *subject);
>  _notmuch_message_sync (notmuch_message_t *message);