From: Carl Worth
To: notmuch@notmuchmail.org
Date: Sun, 10 Jan 2010 09:43:38 -0800
Message-ID: <87ocl1lut1.fsf@yoom.home.cworth.org>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
	micalg=pgp-sha1; protocol="application/pgp-signature"
Subject: [notmuch] Some Xapian tips and thoughts on rebuilding
List-Id: "Use and development of the notmuch mail system."

--=-=-=

With the recent change to "database format 1" some users might decide to
rebuild their notmuch database. If so, there are some things I've
learned about Xapian that are good to know before you rebuild.

Or maybe what you read below will encourage you to rebuild your notmuch
database.

I think all users of notmuch have been discouraged by how slow it is to
change the tags on messages. Many of you have heard of "Xapian defect
#250" that was causing some slowness here.
I'm happy to report that with initial code from Kan-Ru Chen, Richard
Boulton has recently committed a fix for this bug to Xapian upstream,
(after rewriting the fix substantially, extending the fix to multiple
backends, and writing several new Xapian test cases for it).

However, just upgrading your Xapian library won't necessarily give you
any benefit with notmuch. But you can be assured of getting some benefit
if you upgrade both Xapian and notmuch and rebuild your notmuch
database. The gory details are covered below.

Gory details for getting the Xapian #250 fix benefit with flint
---------------------------------------------------------------

Xapian has a notion of multiple backends which store the data in the
database differently. In the 1.0 versions of Xapian, the default backend
is the "flint" backend. This backend stores the document "length" in
every "posting" entry, (where a posting is effectively a link from a
particular "term" to a particular "document", perhaps with positional
information).

The fix for defect #250 is to update as little as possible when we add a
single term to, or remove one from, a document (and hence add or remove
a posting). But if this change also changes the document length, then
all postings for that document will unavoidably need to be updated.

Historically, notmuch hasn't taken any special care with the effect on
"document length" when adding terms for things like tags. The default
treatment is that terms *do* affect document length. But for terms like
tags that don't actually occur in the document content, it makes sense
to record them as having 0 effect on the document length.

I recently fixed notmuch to do so. But you'll have to rebuild your
notmuch database with a recent notmuch in order to get that change.

But if you rebuild, you might want to use chert instead of flint
----------------------------------------------------------------

I mentioned that "flint" is the default backend in the 1.0 releases of
Xapian.
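
As a rough illustration of why document length matters under flint,
here is a toy Python model of the behaviour described in the previous
section. It is not Xapian's storage format or API: the class
ToyFlintIndex, its add_term method, the wdf_inc parameter name, and the
"tag:" term strings are all invented for this sketch. The only thing it
assumes is what the text above says, namely that every posting carries
its own copy of the document length.

```python
class ToyFlintIndex:
    """Toy model only: each posting stores a copy of the document length."""

    def __init__(self):
        self.postings = {}    # term -> {doc_id: length copied into posting}
        self.doc_length = {}  # doc_id -> current document length
        self.doc_terms = {}   # doc_id -> set of terms indexing that document

    def add_term(self, doc_id, term, wdf_inc=1):
        """Add a term to a document.

        Returns the number of posting entries that had to be written.
        """
        terms = self.doc_terms.setdefault(doc_id, set())
        old_length = self.doc_length.get(doc_id, 0)
        new_length = old_length + wdf_inc
        self.doc_length[doc_id] = new_length
        terms.add(term)

        if new_length != old_length:
            # Every posting of this document holds a stale length copy,
            # so all of them must be rewritten.
            to_write = set(terms)
        else:
            # Length unchanged: only the new posting is touched.
            to_write = {term}

        for t in to_write:
            self.postings.setdefault(t, {})[doc_id] = new_length
        return len(to_write)


index = ToyFlintIndex()
for word in ("hello", "xapian", "notmuch", "mail"):
    index.add_term(1, word)

# A tag that counts towards document length rewrites every posting.
print(index.add_term(1, "tag:inbox", wdf_inc=1))   # prints 5
# A tag with zero effect on document length writes a single posting.
print(index.add_term(1, "tag:unread", wdf_inc=0))  # prints 1
```

In this model the cost of a length-changing tag grows with the number
of terms in the message, while a zero-length tag costs one write, which
is the difference a rebuild with a recent notmuch is meant to unlock.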
In the development versions that you can check out from the project's
svn repository, there's support for a newer backend named "chert",
(expected to be the default in an upcoming release).

To get Xapian to use chert you need to have the following environment
variable set when doing the initial "notmuch new" to build your
database:

	XAPIAN_PREFER_CHERT=1

After that, Xapian will see that your database is chert and will know
how to deal with it. (Except that I have seen that upgrading Xapian
from one svn version to another may result in incompatible changes to
the chert format---so a future Xapian may not be able to read a
previously-created chert database. I assume these format changes won't
happen in stable releases of Xapian.)

One thing that's nice about chert compared to flint is that it no longer
stores the document length in every posting. This means it's easier to
get the benefit from the Xapian defect #250 fix. It also means that your
database can be much smaller. For my notmuch database, a flint build is
about 7.0GB while a chert build is only 5.0GB---a very nice change.

Compacting your database
------------------------

One final tip. I recently started experimenting with a Xapian feature
for compacting a database. This is available only via a command-line
program, (named xapian-compact in the 1.0 releases and
xapian-compact-1.1 in the current Xapian from svn). This functionality
is not yet available in the Xapian library interface or else I would
probably make notmuch call it after building the database.

If you want to experiment with xapian-compact, you'll want to call it
with a command something like the following:

	xapian-compact-1.1 --no-renumber ~/mail/.notmuch/xapian ~/mail/.notmuch/xapian-compact

The --no-renumber argument is essential with a notmuch database, since
(as of database format version 1), notmuch stores Xapian document IDs
internally within terms.
If you forget this, you'll find that all of your searches will return
results that are unable to locate any of the filenames corresponding to
your mail.

After running the above command, you could then move your existing
.notmuch/xapian away and move .notmuch/xapian-compact in its place to
test, and then discard the original .notmuch/xapian if you're happy with
the result.

For me, this compaction took my 5.0GB down to 3.1GB. So my database is
now less than half the size of the flint database I started with, (and
can conceivably be cached entirely within memory on my machine!), which
is quite delightful.

I hope the above is helpful, (and yes, clearly we need to get this
content out in other ways such as in a README in the source
distribution, and on the website in some form much better than our
current pipermail-based mailing-list archives).

-Carl

--=-=-=
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iD8DBQFLShHK6JDdNq8qSWgRAnnQAJ4ubKP5ePBKqhFnHoyAYw8IlVxG9wCfaBee
rRAqf58rk/uD80uGu3Fyg3w=
=PsG1
-----END PGP SIGNATURE-----
--=-=-=--
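
The compact, move aside, and swap steps from the "Compacting your
database" section could be scripted roughly as follows. This is a
minimal sketch under stated assumptions, not part of notmuch: the
function names and the "xapian-orig" backup directory name are
hypothetical, and the binary name defaults to the xapian-compact-1.1
mentioned in the message (the 1.0 releases call it xapian-compact).

```python
import subprocess
from pathlib import Path


def compact_command(db, out, binary="xapian-compact-1.1"):
    """Build the compaction command line.

    --no-renumber is always included, since the message explains that
    notmuch stores Xapian document IDs inside terms.
    """
    return [binary, "--no-renumber", str(db), str(out)]


def swap_in(db, compacted, backup):
    """Move the live database aside and put the compacted copy in its place."""
    db, compacted, backup = Path(db), Path(compacted), Path(backup)
    if backup.exists():
        raise FileExistsError(f"refusing to overwrite backup: {backup}")
    db.rename(backup)      # keep the original until the result is tested
    compacted.rename(db)   # the compacted copy becomes the live database


def compact_and_swap(notmuch_dir, binary="xapian-compact-1.1"):
    """Compact <notmuch_dir>/xapian and swap it in; returns the backup path."""
    notmuch_dir = Path(notmuch_dir).expanduser()
    db = notmuch_dir / "xapian"
    out = notmuch_dir / "xapian-compact"
    backup = notmuch_dir / "xapian-orig"  # hypothetical backup name
    # check=True stops before the swap if compaction fails.
    subprocess.run(compact_command(db, out, binary), check=True)
    swap_in(db, out, backup)
    return backup
```

With this sketch, calling compact_and_swap("~/mail/.notmuch") would
leave the original database in xapian-orig, to be deleted by hand once
searches against the compacted copy look right.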