From: Austin Clements
To: Patrick Totzke
Cc: notmuch@notmuchmail.org
Date: Thu, 26 May 2011 22:41:44 -0400
Subject: Re: one-time-iterators

On Thu, May 26, 2011 at 6:22 PM, Patrick Totzke wrote:
> Excerpts from Austin Clements's message of Thu May 26 22:43:02 +0100 2011:
>> > > Though, Patrick, that solution doesn't address your problem.  On the
>> > > other hand, it's not clear to me what concurrent access semantics
>> > > you're actually expecting.  I suspect you don't want the remaining
>> > > iteration to reflect the changes, since your changes could equally
>> > > well have affected earlier iteration results.
>> > That's right.
>> > > But if you want a
>> > > consistent view of your query results, something's going to have to
>> > > materialize that iterator, and it might as well be you (or Xapian
>> > > would need more sophisticated concurrency control than it has).  But
>> > > this shouldn't be expensive because all you need to materialize are
>> > > the document ids; you shouldn't need to eagerly fetch the per-thread
>> > > information.
>> > I thought so, but it seems that Query.search_threads() already
>> > caches more than the id of each item.
>> > Which is as expected, because it is designed to return thread objects,
>> > not their ids. As you can see above, this _is_ too expensive for me.
>>
>> I'd forgotten that constructing threads on the C side was eager about
>> the thread tags, author list and subject (which, without Istvan's
>> proposed patch, even requires opening and parsing the message file).
>> This is probably what's killing you.
>>
>> Out of curiosity, what is your situation that you won't wind up paying
>> the cost of this iteration one way or the other and that the latency
>> of doing these tag changes matters?
>
> I'm trying to implement a terminal interface for notmuch in Python
> that resembles sup.
> For the search results view, I read an initial portion from a Threads iterator
> to fill my terminal window with threadline widgets. Obviously, for a
> large number of results I don't want to go through all of them.
> The problem arises if you toggle a tag on the selected threadline and afterwards
> continue to scroll down.

Ah, that makes sense.

>> > > Have you tried simply calling list() on your thread
>> > > iterator to see how expensive it is?  My bet is that it's quite cheap,
>> > > both memory-wise and CPU-wise.
>> > Funny thing:
>> >  q=Database().create_query('*')
>> >  time tlist = list(q.search_threads())
>> > raises a NotmuchError(STATUS.NOT_INITIALIZED) exception. For some reason
>> > the list constructor must read more than once from the iterator.
>> > So this is not an option, but even if it worked, it would show
>> > the same behaviour as my above test.
>>
>> Interesting.  Looks like the Threads class implements __len__ and that
>> its implementation exhausts the iterator.  Which isn't a great idea in
>> itself, but it turns out that Python's implementation of list() calls
>> __len__ if it's available (presumably to pre-size the list) before
>> iterating over the object, so it exhausts the iterator before even
>> using it.
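The following is a minimal, self-contained reproduction of the list() pitfall. `OneShotThreads` is a hypothetical toy class, not the bindings' Threads. It only mimics a __len__ that counts by consuming the iterator. In CPython, list() asks the iterable for its length as a size hint before it pulls any items, so that count drains everything first. The real class raised NOT_INITIALIZED rather than returning an empty list, presumably because its underlying C object had already been released. The ordering problem is the same in both cases.

```python
class OneShotThreads:
    """Hypothetical stand-in for a one-shot iterator whose __len__ counts by
    consuming items. This is NOT the real notmuch binding class."""

    def __init__(self, thread_ids):
        self._it = iter(thread_ids)
        self.calls = []  # records which protocol methods Python invoked

    def __iter__(self):
        self.calls.append("__iter__")
        return self  # one-shot: iterating hands back this same object

    def __next__(self):
        self.calls.append("__next__")
        return next(self._it)  # raises StopIteration once exhausted

    def __len__(self):
        self.calls.append("__len__")
        # Counting drains the underlying iterator, so nothing is left afterwards.
        return sum(1 for _ in self._it)


threads = OneShotThreads(["t1", "t2", "t3"])
materialized = list(threads)       # CPython's list() consults __len__ as a size hint

fresh = OneShotThreads(["t1", "t2", "t3"])
comprehended = [t for t in fresh]  # a comprehension only calls __iter__/__next__
```

A plain comprehension, or dropping __len__ from such a class, sidesteps the failure. Neither makes the iteration any cheaper, as noted in the reply.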
>>
>> That said, if list(q.search_threads()) did work, it wouldn't give you
>> better performance than your experiment above.
>>
>> > would it be very hard to implement a Query.search_thread_ids() ?
>> > This name is a bit off because it would have to be done on a lower level.
>>
>> Lazily fetching the thread metadata on the C side would probably
>> address your problem automatically.  But what are you doing that
>> doesn't require any information about the threads you're manipulating?
> Agreed. Unfortunately, there seems to be no way to get a list of thread
> ids or a reliable iterator thereof by using the current Python bindings.
> It would be enough for me to have the ids because then I could
> search for the few threads I actually need individually on demand.

There's no way to do that from the C API either, so don't feel left
out.  ]:--8)

It seems to me that the right solution to your problem is to make
thread information lazy (effectively, everything gathered in
lib/thread.cc:_thread_add_message).  Then you could probably
materialize that iterator cheaply.  In fact, it's probably worth
trying a hack where you put dummy information in the thread object
from _thread_add_message and see how long it takes just to walk the
iterator.  Unfortunately, I don't think profiling will help much here,
because much of your time is probably spent waiting for I/O.

I don't think there would be any downside to doing this for eager
consumers like the CLI.
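The lazy-metadata idea can be illustrated with a rough Python analogue. This sketch runs against a fake store, not the notmuch bindings or the C library. The snapshot holds only thread ids, which keeps it cheap to materialize and stable while tags change. Subject and tags are fetched only for the rows actually drawn on screen. `LazyThread`, `visible_rows`, `fake_loader` and `FAKE_DB` are hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List, Optional

Metadata = Dict[str, object]


class LazyThread:
    """Hypothetical thread handle: stores only the id and defers the
    expensive metadata lookup (tags, authors, subject) until first use."""

    def __init__(self, thread_id: str, loader: Callable[[str], Metadata]) -> None:
        self.thread_id = thread_id
        self._loader = loader
        self._meta: Optional[Metadata] = None

    def _load(self) -> Metadata:
        # Fetch once, then reuse the cached result on later accesses.
        if self._meta is None:
            self._meta = self._loader(self.thread_id)
        return self._meta

    @property
    def subject(self) -> str:
        return str(self._load()["subject"])

    @property
    def tags(self) -> List[str]:
        return list(self._load()["tags"])  # type: ignore[arg-type]


def visible_rows(threads: List[LazyThread], start: int, height: int) -> List[str]:
    """Render only the rows that fit on screen; only these trigger loads."""
    return [t.subject for t in threads[start:start + height]]


# Hypothetical in-memory stand-in for the mail store, for demonstration only.
FAKE_DB: Dict[str, Metadata] = {
    f"thread:{i:04d}": {"subject": f"Subject {i}", "tags": ["inbox"]}
    for i in range(1000)
}
loaded: List[str] = []  # ids whose metadata was actually fetched


def fake_loader(thread_id: str) -> Metadata:
    loaded.append(thread_id)
    return FAKE_DB[thread_id]


# Materializing the snapshot touches only ids, so no metadata is fetched yet.
snapshot = [LazyThread(tid, fake_loader) for tid in FAKE_DB]
first_screen = visible_rows(snapshot, 0, 3)

# A tag change leaves the id snapshot untouched; a thread not yet drawn
# picks up its new tags when it is first loaded.
FAKE_DB["thread:0005"]["tags"] = []
second_screen = visible_rows(snapshot, 3, 3)
```

Scrolling three rows at a time here loads metadata for only six of the 1000 threads. The tag change on thread:0005 neither shifts nor invalidates the id list.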