From: Carl Worth
To: Dominik Epple
Cc: notmuch@notmuchmail.org
Subject: Re: [notmuch] notmuch new: Memory problem
Date: Thu, 26 Nov 2009 11:16:21 -0800
Message-ID: <87vdgxqepm.fsf@yoom.home.cworth.org>
In-Reply-To: <87y6ltqg2p.fsf@yoom.home.cworth.org>

On Thu, 26 Nov 2009 10:46:54 -0800, Carl Worth wrote:
> So perhaps the new configuration option we want is a limit on message
> size? Rather than ignoring large files entirely, notmuch could just stop
> indexing messages past the configured limit?

Having just written that, I don't think it's actually an interesting option. Instead of working around the bug, we should just find out what the bug actually is.

It could be that Xapian's TermGenerator is just going nuts here.
Or it could be that Xapian is just trying to hold too much data in memory instead of flushing it out to disk. Currently, notmuch doesn't ever call any explicit Xapian flush. Instead, we rely on the default behavior, which is that Xapian will flush to disk after every batch of 10000 documents added.

So it's possible that all that's actually needed here is for notmuch to notice that it just indexed a huge file, and then explicitly flush to avoid Xapian using too much memory. Or, perhaps better, Xapian could be fixed to automatically flush if its memory usage gets "too big" (if the missing flush is actually what's needed here).

Clearly, some experimenting is needed. Dominik, if you can share the large file (with either me alone or with the whole list), a pointer to where we could download it would be appreciated.

-Carl
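
For illustration only, here is a rough sketch of the "flush after a huge file" idea discussed above. It is a toy model in Python, not notmuch or Xapian code: the class name, the byte limit, and the flush callback are all hypothetical, and the callback merely stands in for whatever explicit Xapian flush notmuch would end up calling. Only the 10000-document batch size comes from the message itself.

```python
from typing import Callable

DOC_BATCH = 10000               # document-count batch mentioned in the message
BYTE_LIMIT = 64 * 1024 * 1024   # hypothetical cap on unflushed message bytes


class FlushingIndexer:
    """Toy model: flush by document count OR by pending message size."""

    def __init__(self, flush: Callable[[], None],
                 doc_batch: int = DOC_BATCH,
                 byte_limit: int = BYTE_LIMIT) -> None:
        self._flush = flush            # stand-in for an explicit database flush
        self._doc_batch = doc_batch
        self._byte_limit = byte_limit
        self._pending_docs = 0
        self._pending_bytes = 0

    def add_message(self, size_in_bytes: int) -> bool:
        """Record one indexed message; return True if a flush was triggered."""
        self._pending_docs += 1
        self._pending_bytes += size_in_bytes
        if (self._pending_docs >= self._doc_batch
                or self._pending_bytes >= self._byte_limit):
            self._flush()
            # Start a fresh batch after flushing.
            self._pending_docs = 0
            self._pending_bytes = 0
            return True
        return False
```

With a model like this, a single very large message triggers a flush right away instead of waiting for the rest of the batch. Whether that actually fixes the memory growth is exactly what the experiment with the large file would have to show.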