From d1049cfb41ce6bf5cd7fb9e9e9f6cb393eedb567 Mon Sep 17 00:00:00 2001 From: Ben Gamari Date: Wed, 4 Sep 2013 10:32:20 +2000 Subject: [PATCH] Re: [PATCH 1/3] database: Add notmuch_database_compact_close --- 66/d3c55f1932596e9b7cf55e1a2c50d0d8964e29 | 316 ++++++++++++++++++++++ 1 file changed, 316 insertions(+) create mode 100644 66/d3c55f1932596e9b7cf55e1a2c50d0d8964e29 diff --git a/66/d3c55f1932596e9b7cf55e1a2c50d0d8964e29 b/66/d3c55f1932596e9b7cf55e1a2c50d0d8964e29 new file mode 100644 index 000000000..c61baa679 --- /dev/null +++ b/66/d3c55f1932596e9b7cf55e1a2c50d0d8964e29 @@ -0,0 +1,316 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 1B438431FCF + for ; Tue, 3 Sep 2013 07:33:05 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -0.799 +X-Spam-Level: +X-Spam-Status: No, score=-0.799 tagged_above=-999 required=5 + tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, + FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id ePfhwJl0547z for ; + Tue, 3 Sep 2013 07:32:57 -0700 (PDT) +Received: from mail-yh0-f43.google.com (mail-yh0-f43.google.com + [209.85.213.43]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id 57F36431FC9 + for ; Tue, 3 Sep 2013 07:32:57 -0700 (PDT) +Received: by mail-yh0-f43.google.com with SMTP id b6so252912yha.16 + for ; Tue, 03 Sep 2013 07:32:55 -0700 (PDT) +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; + h=from:to:subject:in-reply-to:references:user-agent:date:message-id + :mime-version:content-type; + bh=2E8XN5K7x3rTAx2m8ZbxK5k6dhSny6A5R2xEgdnF2Ec=; + b=VXGXdzGe4jrD7PNiUktk9IUIGLmefAO1ytt9BzfpWE/Zc8uYFyBJyCfTk69w+UryVo + MoBaoREJz+vqdZ3lShiADQWSUrOukIFi6jaueuMLozr9XT5LGlbAPHolaRMwvVqfzga4 + uRXSZtM04ZuBKGRU9bSIliRdPwfe515cURLpPV2Cji5tMueKA/xU4FbxfQHm1La5qZyX + s7yThVjXZQxKj1VCGjyS0XgXQnkH+m6puShcB2IGUnJDsHi8Wx1lpm8o7Yc5/juQQ41K + oU5dnUYgQavgWoeOaKg/QZduMkh+MulaA3MXpsI5A6ZBOluw/gKWnjAoCSkN/+zTQ0Ms + MJ9w== +X-Received: by 10.236.9.9 with SMTP id 9mr340884yhs.97.1378218775599; + Tue, 03 Sep 2013 07:32:55 -0700 (PDT) +Received: from localhost (pool-108-8-228-201.spfdma.east.verizon.net. + [108.8.228.201]) + by mx.google.com with ESMTPSA id g25sm22829171yhg.6.1969.12.31.16.00.00 + (version=TLSv1.2 cipher=RC4-SHA bits=128/128); + Tue, 03 Sep 2013 07:32:54 -0700 (PDT) +From: Ben Gamari +To: Jani Nikula , notmuch@notmuchmail.org +Subject: Re: [PATCH 1/3] database: Add notmuch_database_compact_close +In-Reply-To: <87ppsssg3c.fsf@nikula.org> +References: <1377312923-32274-1-git-send-email-bgamari.foss@gmail.com> + <1377312923-32274-2-git-send-email-bgamari.foss@gmail.com> + <87ppsssg3c.fsf@nikula.org> +User-Agent: Notmuch/0.16+15~g17b9eaa (http://notmuchmail.org) Emacs/24.2.1 + (x86_64-pc-linux-gnu) +Date: Tue, 03 Sep 2013 10:32:20 -0400 +Message-ID: <87k3iyyo0b.fsf@gmail.com> +MIME-Version: 1.0 +Content-Type: multipart/signed; boundary="=-=-="; + micalg=pgp-sha1; protocol="application/pgp-signature" +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Tue, 03 Sep 2013 14:33:05 -0000 + +--=-=-= +Content-Type: text/plain +Content-Transfer-Encoding: quoted-printable + +Jani Nikula writes: + +> On Sat, 24 Aug 2013, Ben Gamari wrote: +>> This function uses Xapian's Compactor machinery to compact the notmuch +>> database. The compacted database is built in a temporary directory and +>> later moved into place while the original uncompacted database is +>> preserved. +>> + +snip + +>>=20=20 +>> +class NotmuchCompactor : public Xapian::Compactor +>> +{ +>> +public: +>> + virtual void +>> + set_status (const std::string &table, const std::string &status) +>> + { +>> + if (status.length() =3D=3D 0) +>> + printf ("compacting table %s:\n", table.c_str()); +>> + else +>> + printf (" %s\n", status.c_str()); +>> + } +> +> We're trying to reduce the amount of prints directly from libnotmuch, +> not increase. This applies here as well as below. +> +Fair enough. That being said, I think that status updates are fairly +important given that the compaction process can be rather long. Would +the preferred interface be to provide notmuch_database_compact_close +with a progress callback? + +>> +}; +>> + +>> +#if HAVE_XAPIAN_COMPACT +>> +notmuch_status_t +>> +notmuch_database_compact_close (notmuch_database_t *notmuch) +>> +{ +>> + void *local =3D talloc_new (NULL); +>> + NotmuchCompactor compactor; +>> + char *notmuch_path, *xapian_path, *compact_xapian_path, *old_xapian= +_path; +>> + notmuch_status_t ret =3D NOTMUCH_STATUS_SUCCESS; +>> + +>> + if (! (notmuch_path =3D talloc_asprintf (local, "%s/%s", notmuch->p= +ath, ".notmuch"))) { +>> + ret =3D NOTMUCH_STATUS_OUT_OF_MEMORY; +>> + goto DONE; +>> + } +>> + +>> + if (! (xapian_path =3D talloc_asprintf (local, "%s/%s", notmuch_pat= +h, "xapian"))) { +>> + ret =3D NOTMUCH_STATUS_OUT_OF_MEMORY; +>> + goto DONE; +>> + } +>> + +>> + if (! (compact_xapian_path =3D talloc_asprintf (local, "%s.compact"= +, xapian_path))) { +>> + ret =3D NOTMUCH_STATUS_OUT_OF_MEMORY; +>> + goto DONE; +>> + } +>> + +>> + if (! (old_xapian_path =3D talloc_asprintf (local, "%s.old", xapian= +_path))) { +>> + ret =3D NOTMUCH_STATUS_OUT_OF_MEMORY; +>> + goto DONE; +>> + } +>> + +>> + try { +>> + compactor.set_renumber(false); +>> + compactor.add_source(xapian_path); +>> + compactor.set_destdir(compact_xapian_path); +>> + compactor.compact(); +>> + +>> + if (rename(xapian_path, old_xapian_path)) { +>> + fprintf (stderr, "Error moving old database out of the way\n"); +>> + ret =3D NOTMUCH_STATUS_FILE_ERROR; +>> + goto DONE; +>> + } +> +> This fails if old_xapian_path exists. +> +Ouch, yes, you are right. I suspect the right way forward here will be +to check whether old_xapian_path exists before beginning compaction, +allowing the user to fix the situation before it fails after finishing +what might be a pretty long process. + +>> + +>> + if (rename(compact_xapian_path, xapian_path)) { +>> + fprintf (stderr, "Error moving compacted database\n"); +>> + ret =3D NOTMUCH_STATUS_FILE_ERROR; +>> + goto DONE; +>> + } +>> + } catch (Xapian::InvalidArgumentError e) { +>> + fprintf (stderr, "Error while compacting: %s", e.get_msg().c_str()); +>> + ret =3D NOTMUCH_STATUS_XAPIAN_EXCEPTION; +>> + goto DONE; +>> + } +>> + +>> + fprintf (stderr, "\n"); +>> + fprintf (stderr, "\n"); +>> + fprintf (stderr, "Old database has been moved to %s", old_xapian_pa= +th); +>> + fprintf (stderr, "\n"); +>> + fprintf (stderr, "To delete run,\n"); +>> + fprintf (stderr, "\n"); +>> + fprintf (stderr, " rm -R %s\n", old_xapian_path); +>> + fprintf (stderr, "\n"); +>> + +>> + notmuch_database_close(notmuch); +>> + +>> +DONE: +>> + talloc_free(local); +> +> The database does not get closed on errors. If that's intentional, it +> should be documented. +> +I had reasons for this but they have long fled my memory. Regardless of +what it does, this behavior should be documented. I'll take care of this. + +>> + return ret; +>> +} +>> +#else +>> +notmuch_status_t +>> +notmuch_database_compact_close (unused (notmuch_database_t *notmuch)) +>> +{ +>> + fprintf (stderr, "notmuch was compiled against a xapian version lac= +king compaction support.\n"); +>> + return NOTMUCH_STATUS_UNSUPPORTED_OPERATION; +>> +} +>> +#endif +>> + +>> void +>> notmuch_database_destroy (notmuch_database_t *notmuch) +>> { +>> diff --git a/lib/notmuch.h b/lib/notmuch.h +>> index 998a4ae..e9abd90 100644 +>> --- a/lib/notmuch.h +>> +++ b/lib/notmuch.h +>> @@ -101,6 +101,7 @@ typedef enum _notmuch_status { +>> NOTMUCH_STATUS_TAG_TOO_LONG, +>> NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW, +>> NOTMUCH_STATUS_UNBALANCED_ATOMIC, +>> + NOTMUCH_STATUS_UNSUPPORTED_OPERATION, +>>=20=20 +>> NOTMUCH_STATUS_LAST_STATUS +>> } notmuch_status_t; +>> @@ -215,6 +216,20 @@ notmuch_database_open (const char *path, +>> void +>> notmuch_database_close (notmuch_database_t *database); +>>=20=20 +>> +/* Close the given notmuch database and then compact it. +> +> The implementation first compacts then closes. +> + +>> + * After notmuch_database_close_compact has been called, calls to +>> + * other functions on objects derived from this database may either +>> + * behave as if the database had not been closed (e.g., if the +>> + * required data has been cached) or may fail with a +>> + * NOTMUCH_STATUS_XAPIAN_EXCEPTION. +>> + * +>> + * notmuch_database_close_compact can be called multiple times. Later +>> + * calls have no effect. +> +> This is not true. The Xapian compactor does not require the database to +> be open. It will happily open the database read-only and compact the +> database again if database has been closed. +> +>> + */ +>> +notmuch_status_t +>> +notmuch_database_compact_close (notmuch_database_t *notmuch); +> +> I'm afraid we really need to re-think the API. +> +It seems you are right. When writing this interface it was clear that +there would be a number of opportunities for misuse. I was hoping by +combining compact and close some of these would be eliminated but +clearly this isn't enough. + +> I see that your CLI 'notmuch compact' command opens the database +> read-write, I assume to ensure there are no other writers, so that stuff +> doesn't get lost. However if you pass a read-write database that has +> been modified, I believe the changes will get lost (as Xapian opens the +> database read-only). We shouldn't let the API users shoot themselves in +> the foot so easily. +> +That is correct; the read-write database was an attempt to force the +user to exclusively lock the database they are trying to compact. It +seems that things can go quite wrong[1] when a database is modified +during compaction. There was a suggestion in that thread to add an +option to lock the database during compaction. Perhaps it might be worth +bringing this up again with Xapian upstream. I think we agree that it +would be a poor idea to merge compaction functionality without having a +mechanism for ensuring data integrity, especially since many users +invoke notmuch in a cron job. + +> I think I'd go for something like: +> +> notmuch_status_t +> notmuch_database_compact (const char *path); +> +> or +> +> notmuch_status_t +> notmuch_database_compact (const char *path, const char *backup); +> +> which would internally open the database as read-write to ensure no +> modifications, compact, and close. If backup !=3D NULL, it would save the +> old database there (same mounted file system as the database is a fine +> limitation), otherwise remove. +> +This sounds fine to me. + +> Even then I'm not completely sure what Xapian WritableDatabase will do +> on close when the database has been switched underneath it. But moving +> the database after it has been closed has a race condition too. +> +Good points. Not sure what the least evil way about this is. Hopefully +Xapian's close operation really does just close file handles. + +Cheers, + +=2D Ben + + +[1] http://lists.xapian.org/pipermail/xapian-discuss/2011-July/008310.html + +--=-=-= +Content-Type: application/pgp-signature + +-----BEGIN PGP SIGNATURE----- +Version: GnuPG v1.4.12 (GNU/Linux) + +iQEcBAEBAgAGBQJSJfMVAAoJEErkyLZmeNiDO1MIAKvujl0gYgh6G+CBmZdKdw3p +Cb2o/kQ4Tl/5RU7rHkysAWmr32LeK4O9dSBXK9nR1QnU7bd54ekdrbFhm6ZIGMcU +lHSQQdZ3rHINNiyXD5jvcpTEt2Lb1Jz2Rz7y/rAPon6WpjbkRCncYcHQ0qjGlUNg +PnrLFTN1Xo/32AAUIgxgLE9KUP+h0QhOSTCvvth/CCCKAR3z6BNOBpOmSO6sKLJV +1uWjBJ4KiU2HPc/KYMJX+ek2L/FL+JDf7XgSJbph6kp0Rjd4XsuTdheSDogjnGL4 +R2QoVosKnUd+0rO+VeYI90PKqRdduQaTDniiPRCCi7zmXV0+wsZnwYIwWyCX0Qo= +=nhTs +-----END PGP SIGNATURE----- +--=-=-=-- -- 2.26.2