Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 1EE5E41A549 for ; Tue, 7 Dec 2010 17:20:24 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.99 X-Spam-Level: X-Spam-Status: No, score=-0.99 tagged_above=-999 required=5 tests=[ALL_TRUSTED=-1, T_MIME_NO_TEXT=0.01] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id G8WqHwg4DOoP; Tue, 7 Dec 2010 17:20:23 -0800 (PST) Received: from yoom.home.cworth.org (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 6506D431FB5; Tue, 7 Dec 2010 17:20:23 -0800 (PST) Received: by yoom.home.cworth.org (Postfix, from userid 1000) id 176F02540E7; Tue, 7 Dec 2010 17:20:23 -0800 (PST) From: Carl Worth To: Austin Clements , notmuch@notmuchmail.org Subject: Re: [PATCH 3/4] Optimize thread search using matched docid sets. In-Reply-To: <20101118073828.GD2439@mit.edu> References: <20101117192826.GU2439@mit.edu> <20101118073828.GD2439@mit.edu> User-Agent: Notmuch/0.5 (http://notmuchmail.org) Emacs/23.2.1 (i486-pc-linux-gnu) Date: Tue, 07 Dec 2010 17:20:22 -0800 Message-ID: <8739q95acp.fsf@yoom.home.cworth.org> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Dec 2010 01:20:24 -0000 --=-=-= Content-Transfer-Encoding: quoted-printable On Thu, 18 Nov 2010 02:38:29 -0500, Austin Clements wrot= e: > Currently this code uses a bitmap indexed by docid as a simple, fast > set structure. This is quite memory-efficient if the docid space is > dense, even if the largest docid is quite large. Is there a danger > that the docid space will be large and sparse? Is it worth replacing > this with a smarter bit set structure? When simply adding messages, the docid space is optimally dense, (see _notmuch_database_generate_doc_id which generates sequential docid values). As messages are removed, the space will become less dense as there is currently never any reuse of docid values. We could imagine adding some sort of packing operation to get back to dense packing, (but forcing that kind of thing on the user isn't so kind, of course). Meanwhile, Xapian can very efficiently tell us how packed the space is, (since we can query document count and the last docid allocated). So we definitely have the ability to tune things automatically if needed. We're currently using GHashTable in several places, but I've thought of incorporating a nice, little hash-table implementation that Eric Anholt wrote some time ago. If we did that, we could intelligently choose whichever data structure is more memory-efficient depending on how packed the docid space is. Personally, though, I'm not that much of a micro-optimizer. As can be seen in the current thread, I generally leave the optimization to others. Thanks again, Austin! =2DCarl =2D-=20 carl.d.worth@intel.com --=-=-= Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iD8DBQFM/t1W6JDdNq8qSWgRAlgPAJ0b4Zv0IZ56UlbnwCcJW9r+oJbLKwCgp4bF iiF7G8qs3IhtYeYdIwDPXA0= =8kfY -----END PGP SIGNATURE----- --=-=-=--