From: Thomas Schwinge
To: notmuch@notmuchmail.org
Subject: Re: [PATCH] dump: Don't sort.
In-Reply-To: <2flr514171q.fsf@login1.uio.no>
References: <1319884657-5574-1-git-send-email-thomas@schwinge.name> <2flr514171q.fsf@login1.uio.no>
User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)
Date: Mon, 28 Nov 2011 22:04:14 +0100
Message-ID: <87hb1ormb5.fsf@boole.schwinge.homeip.net>
Cc: Petter Reinholdtsen

Hi!

First, thanks to David, Tomi, Tom for moving this forward.

On Sat, 19 Nov 2011 16:11:13 +0100, Petter Reinholdtsen wrote:
> [Thomas Schwinge]
> > +    /* This used to use NOTMUCH_SORT_MESSAGE_ID.  On 2011-10-29, a measurement
> > +     * on a 372981 messages instance showed that wall time can be reduced from
> > +     * 28 minutes (sorted by Message-ID) to 15 minutes (unsorted), the latter
> > +     * being much more ``database-disk-layout-friendly''.  Subsequently sorting
> > +     * the 25 MiB of data is a no-brainer, if required.  */

Here is the measurement redone. I discovered that during the earlier
run, another Xen domU on the same system had been doing work in
parallel, which disturbed the measurement.

Discard caches, every time before dumping:

    $ sync; sleep 3; echo -n 3 | sudo dd of=/proc/sys/vm/drop_caches

Original (sorted by Message-ID):

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    26.41user 16.56system 14:34.81elapsed 4%CPU (0avgtext+0avgdata 167152maxresident)k
    2994440inputs+55896outputs (41major+11627minor)pagefaults 0swaps

Unsorted:

    $ \time notmuch dump | sort > ~/tmp/Mail-notmuch_dump/dump
    24.79user 3.86system 12:00.22elapsed 3%CPU (0avgtext+0avgdata 57216maxresident)k
    2929192inputs+0outputs (40major+4942minor)pagefaults 0swaps

The difference is no longer as big as before (about 14.5 minutes down
to 12), but still better than nothing.

> This sound like a great idea for my use case.  Doing 'notmuch dump'
> with my 1.2 million emails take hours at the moment (not very fast
> encrypted file system), and result in a 90 MiB dump file.

... and you will gain most by putting the .notmuch directory onto an
SSD, as I have done by now:

Original (sorted by Message-ID), with .notmuch on SSD:

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    24.86user 13.40system 1:06.01elapsed 57%CPU (0avgtext+0avgdata 167200maxresident)k
    2992184inputs+55920outputs (49major+11622minor)pagefaults 0swaps

Unsorted, with .notmuch on SSD:

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    21.90user 2.68system 0:51.70elapsed 47%CPU (0avgtext+0avgdata 57248maxresident)k
    2926912inputs+55920outputs (50major+4934minor)pagefaults 0swaps

User and system time remain roughly the same, but the wall time drops
considerably -- an SSD at its best, obviously.

Generally speaking, I decided it was enough to put just the .notmuch
directory onto the SSD, and not the whole mail store: when new messages
are added (notmuch new), they're still in the page cache anyway (having
been retrieved via POP3 or whatever just before), and for regular
message read access, an HDD's seek time shouldn't matter too much (and
I've taken notice of Austin's patches which even retrieve Subject: etc.
from the DB), so what remains to be optimized is random access to the
DB.


Grüße,
 Thomas
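
For reference, the wall-time reductions implied by the four runs above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the elapsed times are copied from the \time output above, while the helper names (to_seconds, reduction_pct) are made up for illustration and are not part of notmuch.

```python
# Elapsed (wall) times copied from the \time output above, as [h:]m:s strings.
RUNS = {
    "HDD": {"sorted": "14:34.81", "unsorted": "12:00.22"},
    "SSD": {"sorted": "1:06.01", "unsorted": "0:51.70"},
}


def to_seconds(elapsed: str) -> float:
    """Convert an [h:]m:s.ss elapsed string into seconds (hypothetical helper)."""
    seconds = 0.0
    for field in elapsed.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def reduction_pct(before: str, after: str) -> float:
    """Percentage by which wall time shrinks going from `before` to `after`."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100


if __name__ == "__main__":
    for disk, t in RUNS.items():
        saved = reduction_pct(t["sorted"], t["unsorted"])
        print(f"{disk}: unsorted dump saves {saved:.1f}% wall time")
    # Effect of the SSD alone, comparing the two sorted runs.
    ratio = to_seconds(RUNS["HDD"]["sorted"]) / to_seconds(RUNS["SSD"]["sorted"])
    print(f"sorted dump, HDD -> SSD: {ratio:.1f}x faster")
```

On these numbers, dropping the sort saves roughly 18% of the wall time on the HDD and roughly 22% on the SSD, whereas moving .notmuch to the SSD makes the sorted dump about 13 times faster, which is why the SSD is the bigger win.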