From: Carl Worth
To: Stefan Schmidt, notmuch@notmuchmail.org
Date: Sat, 21 Nov 2009 18:07:10 +0100
Subject: Re: [notmuch] 25 minutes load time with emacs -f notmuch
Message-ID: <87fx874xj5.fsf@yoom.home.cworth.org>
In-Reply-To: <20091121145111.GB19397@excalibur.local>

On Sat, 21 Nov 2009 15:51:11 +0100, Stefan Schmidt wrote:
> Disclaimer: I've been using vim, in combination with mutt for email, for years,
> but have never dealt with emacs. Please keep this in mind and point out any
> emacs user errors in this report. :)

Hi Stefan, welcome to Notmuch! And don't worry, we don't discriminate (too
much) against non-emacs users around here.

> I first saw notmuch several weeks ago, when it still seemed to be a silent
> project. I'm more than happy now that it is evolving quickly and a real
> developer community is building around it.

Yes. Notmuch was a silent project since it was just something that I was
doing for myself.
I was always writing it as free software, and even had a public git
repository available, but hadn't advertised it at all yet. And Keith did
rather catch me off guard by announcing it. But I can't complain, as we have
gotten a nice community started already, and it's great to have other
people writing the code that I intended to write. :-)

But it's also true that some obvious problems just aren't taken care of yet.

> But now to my problem. Getting my mail indexed was easy enough:
>
> stefan@excalibur:~$ du -chs not-much-mail/
> 1.5G    not-much-mail/
> 1.5G    total
> stefan@excalibur:~$ time notmuch new
> Found 103677 total files.
> Processed 103677 total files in 42m 30s (40 files/sec.).
> Added 100899 new messages to the database (not much, really).

Good. I'm glad that went fairly smoothly for you. Though, frankly, I think we
need to fix "notmuch new" to do much better than 40 files/sec.

One plan I have for this is to stop using the database to search for message
IDs when adding many messages, and instead use a hash table (seeded from any
messages already in the database). This would allow us to do all thread
resolution before indexing messages, without having to do N separate
database searches (one per message), and it also means we'd avoid
continually rewriting documents when merging thread IDs.

> I put (require 'notmuch) in my ~/.emacs and start emacs with the -f notmuch
> option to enter the notmuch mode.

I'm glad you've figured that much out. I feel bad that that's not even in the
documentation anywhere yet.

> What happens then is that a notmuch process gets started and emacs
> waits for the return.

OK. This is a known shortcoming. As Bdale supposes, this problem comes from
notmuch trying to load and construct every thread in your database.
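
The hash-table plan for "notmuch new" described earlier in this reply might
look something like the following Python sketch. It is only an illustration
under stated assumptions, not notmuch's implementation: the ThreadResolver
class and the "new-" prefix for freshly allocated thread IDs are
hypothetical, thread IDs are assumed to be strings, and thread merges are
recorded in a union-find structure so that joining two threads never
requires rewriting per-message entries.

```python
from itertools import count


class ThreadResolver:
    """Hypothetical in-memory thread resolution for a bulk import.

    A dict maps message IDs to thread IDs; a union-find forest records
    thread merges, so merging never rewrites per-message entries.
    """

    def __init__(self, existing=None):
        self.thread_of = {}  # message ID -> thread ID (maybe not canonical)
        self.parent = {}     # thread ID -> parent thread ID (union-find)
        self._counter = count()
        # Seed from messages already in the database: {message_id: thread_id}.
        for msg_id, thread_id in (existing or {}).items():
            self.thread_of[msg_id] = thread_id
            self.parent.setdefault(thread_id, thread_id)

    def _find(self, t):
        # Walk up to the root thread, halving the path as we go.
        while self.parent[t] != t:
            self.parent[t] = self.parent[self.parent[t]]
            t = self.parent[t]
        return t

    def _new_thread(self):
        t = "new-%d" % next(self._counter)  # hypothetical ID scheme
        self.parent[t] = t
        return t

    def add(self, msg_id, references):
        """Record msg_id plus the IDs it references; return its thread ID."""
        ids = [msg_id] + list(references)
        roots = {self._find(self.thread_of[i]) for i in ids if i in self.thread_of}
        if roots:
            # Prefer a thread ID that already exists in the database.
            root = min(roots, key=lambda t: (t.startswith("new-"), t))
            for r in roots:
                self.parent[r] = root
        else:
            root = self._new_thread()
        # Referenced-but-unseen IDs get placeholder entries, so a message
        # with that ID arriving later joins this thread.
        for i in ids:
            self.thread_of.setdefault(i, root)
        return root

    def thread(self, msg_id):
        """Canonical thread ID for a message ID seen so far."""
        return self._find(self.thread_of[msg_id])
```

Preferring an existing thread ID when two threads merge keeps IDs already
stored in the database stable; the final message-to-thread mapping would only
need to be written out once, after all thread resolution is done.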
There are actually several different bugs/missing features here that should
be addressed:

  * "notmuch new" should look at the maildir "seen" flag (S in the
    filename's info suffix; R means "replied") to determine that messages
    have been read and do not need to be tagged "inbox" and "unread".

  * "notmuch setup" should prompt for a date range ("last 2 months" by
    default?) so that messages older than that will not be considered
    unread.

Either of those two fixes would have prevented your particular problem. But
it's still easy to generate searches that return large numbers of results.
So there's some more to do:

  * The emacs code needs to call "notmuch search" with the --first and
    --max-threads options to get a limited set of results (one or two
    screenfuls). You should be able to test this at the command line and
    see that it returns results quickly.

Then, of course, we'd like the emacs code to fill in subsequent screenfuls
as you page.

But none of that helps you right now. What you need is to retroactively
remove the "inbox" and "unread" tags from all messages older than some time
period. So then there's another missing feature:

  * We need to support date-range-based searches. If we had that, you could
    just do:

        notmuch tag -inbox -unread until:"2 months ago"

But we don't quite have this yet. Xapian does have support for a slightly
less convenient date range specification:

        1970-01-01..2009-09-21

but it turns out that we can't even use that just yet, since to make it work
we would need dates saved as YYYYMMDD strings for each message, whereas we
actually store time_t values serialized into a string that sorts correctly.
So we need a new ValueRangeProcessor class to map to timestamps, and then
we'll need some fancy parsing to handle things like "2 months ago".

So, what's the best thing to do today if you want to start playing with
notmuch? I think you could pick one of the above to work on (a quick hack to
"notmuch new" and a re-import might do the trick).
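
As a rough illustration of the two "notmuch new" fixes in the first two
bullets above, here is a minimal Python sketch of the tagging decision for a
newly imported message. It is not notmuch's code: the function names are
hypothetical, dates are assumed to be time_t values, "last 2 months" is
approximated as 60 days, and the check uses the Maildir "seen" flag, which
the Maildir convention spells S (R means "replied"), taken from the ":2,"
info suffix of the filename.

```python
SECONDS_PER_DAY = 86400


def maildir_flags(path):
    """Return the set of Maildir flag letters from a message filename.

    Maildir filenames may end in an info suffix such as ':2,RS'; each
    letter after ':2,' is one flag (S = seen, R = replied, F = flagged, ...).
    """
    name = path.rsplit("/", 1)[-1]
    _, sep, flags = name.rpartition(":2,")
    return set(flags) if sep else set()


def default_cutoff(now, days=60):
    """Hypothetical 'last 2 months' default, approximated as 60 days."""
    return now - days * SECONDS_PER_DAY


def initial_tags(path, date, cutoff):
    """Tags a newly imported message would get under the two proposed fixes.

    date and cutoff are time_t values (seconds since the epoch). Messages
    already marked seen, or dated before the cutoff, get no inbox/unread tags.
    """
    if date < cutoff or "S" in maildir_flags(path):
        return set()
    return {"inbox", "unread"}
```

Files still sitting in a maildir's new/ directory normally carry no info
suffix yet, so this sketch treats them as unseen, and only the date cutoff
keeps old ones out of the inbox.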
Alternatively, you might just remove the inbox and unread tags from all
messages and then let messages that are actually *new* in the future get
tagged into the inbox by "notmuch new". Oh, but then there's another missing
feature:

  * We need a syntax to specify a search string that matches all messages.
    Then you could do:

        notmuch tag -inbox -unread 

Yikes! So many bugs and missing features. How is anyone actually using this
system?

Well, Keith and I were able to get past all this by simply doing a "notmuch
restore" based on tags we got from sup-dump. So here is another attempt:

  1. Run "notmuch dump " to get the list of message IDs (all with their
     "inbox" and "unread" tags).

  2. Edit that file to remove the tags you want.

  3. Run "notmuch restore " to cause the tags to be removed.

But (*sigh*) that's not good either, because "notmuch dump" is currently
hard-coded to dump messages in message-ID order rather than date order (so
you can't easily do something like "just remove the tags from messages older
than two months").

So, there's sadly no easy way to get what you want with the tools in their
current form. I guess that's the pain that you get for being an early
adopter. :-}

But if hacking a little C code doesn't scare you away, a lot of the things
listed above are actually really easy to fix. (Like, fixing "notmuch dump"
to just run in date order is a one-line change. Adding a --sort command-line
option to it wouldn't be much harder, etc.)

So hopefully the above serves as a nice TODO list. Thanks everyone for your
interest in this software even in its current, can-be-painful-to-use state.

-Carl

PS. Expect mass re-tagging operations to be about as slow as the original
"notmuch new" import of the messages. That's a known bug in Xapian, and one
of the highest-priority things I'd like to fix (along with all of the above
and all the other things I want to do...). At least we're not running out
of things to work on here.
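
The dump, edit, restore workaround in steps 1 to 3 above can be sketched in
Python as a small filter for step 2. This is a hypothetical helper, not part
of notmuch, and it assumes each dump line has the form
"<message-id> (tag1 tag2 ...)"; check that against the output of your
notmuch version before relying on it.

```python
import re

# Assumed dump line format: "<message-id> (tag1 tag2 ...)"
DUMP_LINE = re.compile(r"^(\S+)\s+\((.*)\)$")


def strip_tags(lines, unwanted=("inbox", "unread")):
    """Yield dump lines with the unwanted tags removed.

    Lines that don't match the assumed format are passed through unchanged.
    """
    unwanted = set(unwanted)
    for line in lines:
        line = line.rstrip("\n")
        match = DUMP_LINE.match(line)
        if match is None:
            yield line
            continue
        msg_id, tags = match.groups()
        kept = [tag for tag in tags.split() if tag not in unwanted]
        yield "%s (%s)" % (msg_id, " ".join(kept))
```

You would feed it the lines of the file written by "notmuch dump", write the
results to a new file, and pass that file to "notmuch restore". Because the
dump carries no dates, this strips the tags from every message rather than
only from old ones, which is exactly the limitation of message-ID order
mentioned above.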