Return-Path: <amdragon@gmail.com>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 142D2429E28
	for <notmuch@notmuchmail.org>; Thu, 26 May 2011 19:41:47 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=-0.699 tagged_above=-999 required=5
	tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, FREEMAIL_FROM=0.001,
	RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id dxVihK4TAvu5 for <notmuch@notmuchmail.org>;
	Thu, 26 May 2011 19:41:45 -0700 (PDT)
Received: from mail-qw0-f53.google.com (mail-qw0-f53.google.com
	[209.85.216.53]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
	(No client certificate requested)
	by olra.theworths.org (Postfix) with ESMTPS id 0B09B431FB6
	for <notmuch@notmuchmail.org>; Thu, 26 May 2011 19:41:44 -0700 (PDT)
Received: by qwb7 with SMTP id 7so872055qwb.26
	for <notmuch@notmuchmail.org>; Thu, 26 May 2011 19:41:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=phq5UPcAoJLmMxW0VGIZtZFLGrie0BSNhwo4E6f7q+Q=;
	b=SxZfUTmEs9wMz62AJsxwe6OCrvMYzDXGcf7rYhjcUvZyVDQlwr1aipim1xxk12Dyvb
	bH93yQ3dmy2zIfv4K7VRgl9b2Xy4NmEfSAQeAewFoq/6+fdcxdtO+Iu/69G2WrISLzkj
	EshCH6bV5Dwh1/Lhy05DTHmjtn0WscbB7eagg=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	b=EY9uFMcgEoV8ot9h43HjByz1PyeTIr4aZBm9epwqV3m3/5XWSaKI2aktU/YH5B4qtI
	0fKjKp1nf1BKN1giUbUkczjSVVbyHCkQ0wSCiz0nZv9IVpLWtoDf5tf/ENTJbcLux/Do
	adAFfnG8XEL8sdvuc1KJhK4kOahDID7artTIk=
Received: by 10.229.35.1 with SMTP id n1mr1216915qcd.84.1306464104246; Thu, 26
	May 2011 19:41:44 -0700 (PDT)
Sender: amdragon@gmail.com
Received: by 10.229.188.68 with HTTP; Thu, 26 May 2011 19:41:44 -0700 (PDT)
In-Reply-To: <1306446621-sup-3184@brick>
References: <1306397849-sup-3304@brick> <877h9d9y5m.fsf@yoom.home.cworth.org>
	<BANLkTi=3mQYJft4s9jGaoqSbcJvqhmZXyQ@mail.gmail.com>
	<1306442683-sup-9315@brick> <20110526214302.GR29861@mit.edu>
	<1306446621-sup-3184@brick>
Date: Thu, 26 May 2011 22:41:44 -0400
X-Google-Sender-Auth: lK5b05ERd-W7pUMIBxQrGZ3ex5Q
Message-ID: <BANLkTi=Uk+bNB8sCZLVb86q-Kjfx1udEZA@mail.gmail.com>
Subject: Re: one-time-iterators
From: Austin Clements <amdragon@mit.edu>
To: Patrick Totzke <patricktotzke@googlemail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: notmuch@notmuchmail.org
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
	<notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Fri, 27 May 2011 02:41:47 -0000

On Thu, May 26, 2011 at 6:22 PM, Patrick Totzke
<patricktotzke@googlemail.com> wrote:
> Excerpts from Austin Clements's message of Thu May 26 22:43:02 +0100 2011:
>> > > Though, Patrick, that solution doesn't address your problem.  On the
>> > > other hand, it's not clear to me what concurrent access semantics
>> > > you're actually expecting.  I suspect you don't want the remaining
>> > > iteration to reflect the changes, since your changes could equally
>> > > well have affected earlier iteration results.
>> > > But if you want a
>> > > consistent view of your query results, something's going to have to
>> > > materialize that iterator, and it might as well be you (or Xapian
>> > > would need more sophisticated concurrency control than it has).  But
>> > > this shouldn't be expensive because all you need to materialize are
>> > > the document ids; you shouldn't need to eagerly fetch the per-thread
>> > > metadata.
>> > I thought so, but it seems that Query.search_threads() already
>> > caches more than the id of each item. Which is as expected
>> > because it is designed to return thread objects, not their ids.
>> > As you can see above, this _is_ too expensive for me.

>> I'd forgotten that constructing threads on the C side was eager about
>> the thread tags, author list and subject (which, without Istvan's
>> proposed patch, even requires opening and parsing the message file).
>> This is probably what's killing you.

>> Out of curiosity, what is your situation that you won't wind up paying
>> the cost of this iteration one way or the other and that the latency
>> of doing these tag changes matters?

> I'm trying to implement a terminal interface for notmuch in python
> that resembles sup.
> For the search results view, I read an initial portion from a Threads iterator
> to fill my terminal window with threadline-widgets. Obviously, for a
> large number of results I don't want to go through all of them.
> The problem arises if you toggle a tag on the selected threadline and afterwards
> continue to scroll down.

Ah, that makes sense.

>> > > Have you tried simply calling list() on your thread
>> > > iterator to see how expensive it is?  My bet is that it's quite cheap,
>> > > both memory-wise and CPU-wise.

>> >   q=Database().create_query('*')
>> >   time tlist = list(q.search_threads())
>> > raises a NotmuchError(STATUS.NOT_INITIALIZED) exception. For some reason
>> > the list constructor must read more than once from the iterator.
>> > So this is not an option, but even if it worked, it would show
>> > the same behaviour as my above test..

>> Interesting.  Looks like the Threads class implements __len__ and that
>> its implementation exhausts the iterator.  Which isn't a great idea in
>> itself, but it turns out that Python's implementation of list() calls
>> __len__ if it's available (presumably to pre-size the list) before
>> iterating over the object, so it exhausts the iterator before even
>> starting.
>>
>> That said, if list(q.search_threads()) did work, it wouldn't give you
>> better performance than your experiment above.
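[The list()/__len__ interaction described above can be reproduced with a toy class; BrokenThreads below is a made-up stand-in for the bindings' Threads class, not the real notmuch API:

```python
# Toy reproduction of the bug discussed above: an iterator whose __len__
# exhausts the underlying data, as notmuch's Threads class did.
# BrokenThreads is an illustrative stand-in, not the real bindings class.
class BrokenThreads:
    def __init__(self, items):
        self._items = iter(items)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._items)

    def __len__(self):
        # Counts by draining the iterator: after this, nothing is left.
        return sum(1 for _ in self._items)

# list() consults __len__ first (to pre-size the result), which drains
# the iterator, so the subsequent iteration yields nothing.
print(list(BrokenThreads([1, 2, 3])))  # -> []
```

The real bindings raised an exception instead of silently returning an empty list, but the root cause is the same: len() is not a side-effect-free operation here. -ed.]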
>> > would it be very hard to implement a Query.search_thread_ids() ?
>> > This name is a bit off because it had to be done on a lower level.

>> Lazily fetching the thread metadata on the C side would probably
>> address your problem automatically.  But what are you doing that
>> doesn't require any information about the threads you're manipulating?
> Agreed. Unfortunately, there seems to be no way to get a list of thread
> ids or a reliable iterator thereof by using the current python bindings.
> It would be enough for me to have the ids because then I could
> search for the few threads I actually need individually on demand.
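[The id-first pattern Patrick describes can be sketched in plain Python. ThreadStore, search_thread_ids, and get_thread are hypothetical names for illustration, not part of the notmuch bindings:

```python
# Hypothetical sketch of the "materialize ids, fetch on demand" pattern.
# ThreadStore and its methods are illustrative names, not notmuch API.
class ThreadStore:
    def __init__(self, threads):
        self._threads = {t["id"]: t for t in threads}

    def search_thread_ids(self, predicate):
        # Cheap: only ids are materialized into a stable list, so later
        # tag changes cannot invalidate the iteration.
        return [tid for tid, t in self._threads.items() if predicate(t)]

    def get_thread(self, tid):
        # Expensive per-thread lookup deferred until a widget needs it.
        return self._threads[tid]

store = ThreadStore([
    {"id": "t1", "tags": {"inbox"}},
    {"id": "t2", "tags": {"archive"}},
])
ids = store.search_thread_ids(lambda t: "inbox" in t["tags"])
print(ids)  # -> ['t1']
```

A UI built this way only pays the per-thread cost for the handful of threadlines actually on screen. -ed.]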
There's no way to do that from the C API either, so don't feel left
out. ]:--8)  It seems to me that the right solution to your problem
is to make thread information lazy (effectively, everything gathered
in lib/thread.cc:_thread_add_message).  Then you could probably
materialize that iterator cheaply.  In fact, it's probably worth
trying a hack where you put dummy information in the thread object
from _thread_add_message and see how long it takes just to walk the
iterator (unfortunately I don't think profiling will help much here
because much of your time is probably spent waiting for I/O).

I don't think there would be any downside to doing this for eager
consumers like the CLI.
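[Austin's lazy-metadata suggestion, sketched in Python for illustration only (the real change would live in lib/thread.cc; LazyThread and load_metadata are hypothetical names, not notmuch API):

```python
# Illustrative sketch of lazy thread metadata: the thread id is cheap and
# known at iteration time; everything _thread_add_message gathers eagerly
# today (tags, authors, subject) is deferred until first access.
# LazyThread and load_metadata are hypothetical names, not notmuch API.
class LazyThread:
    def __init__(self, thread_id, load_metadata):
        self.thread_id = thread_id      # cheap, available immediately
        self._load = load_metadata      # expensive work, deferred
        self._meta = None

    @property
    def metadata(self):
        if self._meta is None:          # fetch only on first access
            self._meta = self._load(self.thread_id)
        return self._meta

calls = []
def load_metadata(tid):
    calls.append(tid)                   # record each expensive fetch
    return {"subject": "one-time-iterators"}

# Materializing the whole iterator is now cheap: no metadata is fetched.
threads = [LazyThread("t%d" % i, load_metadata) for i in range(1000)]
assert calls == []
threads[0].metadata                     # only now is thread 0 loaded
assert calls == ["t0"]
```

With this shape, walking all 1000 results costs almost nothing, and eager consumers like the CLI simply trigger the loads they would have done anyway. -ed.]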
\r