Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 1D13C40DEFB for ; Wed, 17 Nov 2010 23:38:42 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -1.9 X-Spam-Level: X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9] autolearn=ham Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BRXgZfIOyN53 for ; Wed, 17 Nov 2010 23:38:32 -0800 (PST) Received: from dmz-mailsec-scanner-8.mit.edu (DMZ-MAILSEC-SCANNER-8.MIT.EDU [18.7.68.37]) by olra.theworths.org (Postfix) with ESMTP id F1CD540DEF6 for ; Wed, 17 Nov 2010 23:38:31 -0800 (PST) X-AuditID: 12074425-b7c98ae000000a04-14-4ce4d7f7a7e5 Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) by dmz-mailsec-scanner-8.mit.edu (Symantec Brightmail Gateway) with SMTP id 09.B7.02564.7F7D4EC4; Thu, 18 Nov 2010 02:38:31 -0500 (EST) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id oAI7cUnf002393 for ; Thu, 18 Nov 2010 02:38:31 -0500 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id oAI7cTEI022928 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT) for ; Thu, 18 Nov 2010 02:38:30 -0500 (EST) Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.72) (envelope-from ) id 1PIz4f-0002Zr-1F for notmuch@notmuchmail.org; Thu, 18 Nov 2010 02:38:29 -0500 Date: Thu, 18 Nov 2010 02:38:29 -0500 From: Austin Clements To: notmuch@notmuchmail.org Subject: Re: [PATCH 3/4] Optimize thread search using matched docid sets. Message-ID: <20101118073828.GD2439@mit.edu> References: <20101117192826.GU2439@mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20101117192826.GU2439@mit.edu> User-Agent: Mutt/1.5.20 (2009-06-14) X-Brightmail-Tracker: AAAAAA== X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Nov 2010 07:38:42 -0000 Currently this code uses a bitmap indexed by docid as a simple, fast set structure. This is quite memory-efficient if the docid space is dense, even if the largest docid is quite large. Is there a danger that the docid space will be large and sparse? Is it worth replacing this with a smarter bit set structure? Quoth myself on Nov 17 at 2:28 pm: > This reduces thread search's 1+2t Xapian queries (where t is the > number of matched threads) to 1+t queries and constructs exactly one > notmuch_message_t for each message instead of 2 to 3. > notmuch_query_search_threads eagerly fetches the docids of all > messages matching the user query instead of lazily constructing > message objects and fetching thread ID's from term lists. > _notmuch_thread_create takes a seed docid and the set of all matched > docids and uses a single Xapian query to expand this docid to its > containing thread, using the matched docid set to determine which > messages in the thread match the user query instead of using a second > Xapian query. > > As a side effect, this fixes author order so authors are always sorted > by first occurrence in each thread. This breaks two emacs tests that > hard-code the old, buggy author order. > > This reduces the amount of time required to load my inbox from 4.523 > seconds to 3.025 seconds (1.5X faster).