Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 1B438431FCF for ; Tue, 3 Sep 2013 07:33:05 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.799 X-Spam-Level: X-Spam-Status: No, score=-0.799 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ePfhwJl0547z for ; Tue, 3 Sep 2013 07:32:57 -0700 (PDT) Received: from mail-yh0-f43.google.com (mail-yh0-f43.google.com [209.85.213.43]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id 57F36431FC9 for ; Tue, 3 Sep 2013 07:32:57 -0700 (PDT) Received: by mail-yh0-f43.google.com with SMTP id b6so252912yha.16 for ; Tue, 03 Sep 2013 07:32:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:subject:in-reply-to:references:user-agent:date:message-id :mime-version:content-type; bh=2E8XN5K7x3rTAx2m8ZbxK5k6dhSny6A5R2xEgdnF2Ec=; b=VXGXdzGe4jrD7PNiUktk9IUIGLmefAO1ytt9BzfpWE/Zc8uYFyBJyCfTk69w+UryVo MoBaoREJz+vqdZ3lShiADQWSUrOukIFi6jaueuMLozr9XT5LGlbAPHolaRMwvVqfzga4 uRXSZtM04ZuBKGRU9bSIliRdPwfe515cURLpPV2Cji5tMueKA/xU4FbxfQHm1La5qZyX s7yThVjXZQxKj1VCGjyS0XgXQnkH+m6puShcB2IGUnJDsHi8Wx1lpm8o7Yc5/juQQ41K oU5dnUYgQavgWoeOaKg/QZduMkh+MulaA3MXpsI5A6ZBOluw/gKWnjAoCSkN/+zTQ0Ms MJ9w== X-Received: by 10.236.9.9 with SMTP id 9mr340884yhs.97.1378218775599; Tue, 03 Sep 2013 07:32:55 -0700 (PDT) Received: from localhost (pool-108-8-228-201.spfdma.east.verizon.net. [108.8.228.201]) by mx.google.com with ESMTPSA id g25sm22829171yhg.6.1969.12.31.16.00.00 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Tue, 03 Sep 2013 07:32:54 -0700 (PDT) From: Ben Gamari To: Jani Nikula , notmuch@notmuchmail.org Subject: Re: [PATCH 1/3] database: Add notmuch_database_compact_close In-Reply-To: <87ppsssg3c.fsf@nikula.org> References: <1377312923-32274-1-git-send-email-bgamari.foss@gmail.com> <1377312923-32274-2-git-send-email-bgamari.foss@gmail.com> <87ppsssg3c.fsf@nikula.org> User-Agent: Notmuch/0.16+15~g17b9eaa (http://notmuchmail.org) Emacs/24.2.1 (x86_64-pc-linux-gnu) Date: Tue, 03 Sep 2013 10:32:20 -0400 Message-ID: <87k3iyyo0b.fsf@gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 14:33:05 -0000 --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Jani Nikula writes: > On Sat, 24 Aug 2013, Ben Gamari wrote: >> This function uses Xapian's Compactor machinery to compact the notmuch >> database. The compacted database is built in a temporary directory and >> later moved into place while the original uncompacted database is >> preserved. >> snip >>=20=20 >> +class NotmuchCompactor : public Xapian::Compactor >> +{ >> +public: >> + virtual void >> + set_status (const std::string &table, const std::string &status) >> + { >> + if (status.length() =3D=3D 0) >> + printf ("compacting table %s:\n", table.c_str()); >> + else >> + printf (" %s\n", status.c_str()); >> + } > > We're trying to reduce the amount of prints directly from libnotmuch, > not increase. This applies here as well as below. > Fair enough. That being said, I think that status updates are fairly important given that the compaction process can be rather long. Would the preferred interface be to provide notmuch_database_compact_close with a progress callback? >> +}; >> + >> +#if HAVE_XAPIAN_COMPACT >> +notmuch_status_t >> +notmuch_database_compact_close (notmuch_database_t *notmuch) >> +{ >> + void *local =3D talloc_new (NULL); >> + NotmuchCompactor compactor; >> + char *notmuch_path, *xapian_path, *compact_xapian_path, *old_xapian= _path; >> + notmuch_status_t ret =3D NOTMUCH_STATUS_SUCCESS; >> + >> + if (! (notmuch_path =3D talloc_asprintf (local, "%s/%s", notmuch->p= ath, ".notmuch"))) { >> + ret =3D NOTMUCH_STATUS_OUT_OF_MEMORY; >> + goto DONE; >> + } >> + >> + if (! (xapian_path =3D talloc_asprintf (local, "%s/%s", notmuch_pat= h, "xapian"))) { >> + ret =3D NOTMUCH_STATUS_OUT_OF_MEMORY; >> + goto DONE; >> + } >> + >> + if (! (compact_xapian_path =3D talloc_asprintf (local, "%s.compact"= , xapian_path))) { >> + ret =3D NOTMUCH_STATUS_OUT_OF_MEMORY; >> + goto DONE; >> + } >> + >> + if (! (old_xapian_path =3D talloc_asprintf (local, "%s.old", xapian= _path))) { >> + ret =3D NOTMUCH_STATUS_OUT_OF_MEMORY; >> + goto DONE; >> + } >> + >> + try { >> + compactor.set_renumber(false); >> + compactor.add_source(xapian_path); >> + compactor.set_destdir(compact_xapian_path); >> + compactor.compact(); >> + >> + if (rename(xapian_path, old_xapian_path)) { >> + fprintf (stderr, "Error moving old database out of the way\n"); >> + ret =3D NOTMUCH_STATUS_FILE_ERROR; >> + goto DONE; >> + } > > This fails if old_xapian_path exists. > Ouch, yes, you are right. I suspect the right way forward here will be to check whether old_xapian_path exists before beginning compaction, allowing the user to fix the situation before it fails after finishing what might be a pretty long process. >> + >> + if (rename(compact_xapian_path, xapian_path)) { >> + fprintf (stderr, "Error moving compacted database\n"); >> + ret =3D NOTMUCH_STATUS_FILE_ERROR; >> + goto DONE; >> + } >> + } catch (Xapian::InvalidArgumentError e) { >> + fprintf (stderr, "Error while compacting: %s", e.get_msg().c_str()); >> + ret =3D NOTMUCH_STATUS_XAPIAN_EXCEPTION; >> + goto DONE; >> + } >> + >> + fprintf (stderr, "\n"); >> + fprintf (stderr, "\n"); >> + fprintf (stderr, "Old database has been moved to %s", old_xapian_pa= th); >> + fprintf (stderr, "\n"); >> + fprintf (stderr, "To delete run,\n"); >> + fprintf (stderr, "\n"); >> + fprintf (stderr, " rm -R %s\n", old_xapian_path); >> + fprintf (stderr, "\n"); >> + >> + notmuch_database_close(notmuch); >> + >> +DONE: >> + talloc_free(local); > > The database does not get closed on errors. If that's intentional, it > should be documented. > I had reasons for this but they have long fled my memory. Regardless of what it does, this behavior should be documented. I'll take care of this. >> + return ret; >> +} >> +#else >> +notmuch_status_t >> +notmuch_database_compact_close (unused (notmuch_database_t *notmuch)) >> +{ >> + fprintf (stderr, "notmuch was compiled against a xapian version lac= king compaction support.\n"); >> + return NOTMUCH_STATUS_UNSUPPORTED_OPERATION; >> +} >> +#endif >> + >> void >> notmuch_database_destroy (notmuch_database_t *notmuch) >> { >> diff --git a/lib/notmuch.h b/lib/notmuch.h >> index 998a4ae..e9abd90 100644 >> --- a/lib/notmuch.h >> +++ b/lib/notmuch.h >> @@ -101,6 +101,7 @@ typedef enum _notmuch_status { >> NOTMUCH_STATUS_TAG_TOO_LONG, >> NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW, >> NOTMUCH_STATUS_UNBALANCED_ATOMIC, >> + NOTMUCH_STATUS_UNSUPPORTED_OPERATION, >>=20=20 >> NOTMUCH_STATUS_LAST_STATUS >> } notmuch_status_t; >> @@ -215,6 +216,20 @@ notmuch_database_open (const char *path, >> void >> notmuch_database_close (notmuch_database_t *database); >>=20=20 >> +/* Close the given notmuch database and then compact it. > > The implementation first compacts then closes. > >> + * After notmuch_database_close_compact has been called, calls to >> + * other functions on objects derived from this database may either >> + * behave as if the database had not been closed (e.g., if the >> + * required data has been cached) or may fail with a >> + * NOTMUCH_STATUS_XAPIAN_EXCEPTION. >> + * >> + * notmuch_database_close_compact can be called multiple times. Later >> + * calls have no effect. > > This is not true. The Xapian compactor does not require the database to > be open. It will happily open the database read-only and compact the > database again if database has been closed. > >> + */ >> +notmuch_status_t >> +notmuch_database_compact_close (notmuch_database_t *notmuch); > > I'm afraid we really need to re-think the API. > It seems you are right. When writing this interface it was clear that there would be a number of opportunities for misuse. I was hoping by combining compact and close some of these would be eliminated but clearly this isn't enough. > I see that your CLI 'notmuch compact' command opens the database > read-write, I assume to ensure there are no other writers, so that stuff > doesn't get lost. However if you pass a read-write database that has > been modified, I believe the changes will get lost (as Xapian opens the > database read-only). We shouldn't let the API users shoot themselves in > the foot so easily. > That is correct; the read-write database was an attempt to force the user to exclusively lock the database they are trying to compact. It seems that things can go quite wrong[1] when a database is modified during compaction. There was a suggestion in that thread to add an option to lock the database during compaction. Perhaps it might be worth bringing this up again with Xapian upstream. I think we agree that it would be a poor idea to merge compaction functionality without having a mechanism for ensuring data integrity, especially since many users invoke notmuch in a cron job. > I think I'd go for something like: > > notmuch_status_t > notmuch_database_compact (const char *path); > > or > > notmuch_status_t > notmuch_database_compact (const char *path, const char *backup); > > which would internally open the database as read-write to ensure no > modifications, compact, and close. If backup !=3D NULL, it would save the > old database there (same mounted file system as the database is a fine > limitation), otherwise remove. > This sounds fine to me. > Even then I'm not completely sure what Xapian WritableDatabase will do > on close when the database has been switched underneath it. But moving > the database after it has been closed has a race condition too. > Good points. Not sure what the least evil way about this is. Hopefully Xapian's close operation really does just close file handles. Cheers, =2D Ben [1] http://lists.xapian.org/pipermail/xapian-discuss/2011-July/008310.html --=-=-= Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAEBAgAGBQJSJfMVAAoJEErkyLZmeNiDO1MIAKvujl0gYgh6G+CBmZdKdw3p Cb2o/kQ4Tl/5RU7rHkysAWmr32LeK4O9dSBXK9nR1QnU7bd54ekdrbFhm6ZIGMcU lHSQQdZ3rHINNiyXD5jvcpTEt2Lb1Jz2Rz7y/rAPon6WpjbkRCncYcHQ0qjGlUNg PnrLFTN1Xo/32AAUIgxgLE9KUP+h0QhOSTCvvth/CCCKAR3z6BNOBpOmSO6sKLJV 1uWjBJ4KiU2HPc/KYMJX+ek2L/FL+JDf7XgSJbph6kp0Rjd4XsuTdheSDogjnGL4 R2QoVosKnUd+0rO+VeYI90PKqRdduQaTDniiPRCCi7zmXV0+wsZnwYIwWyCX0Qo= =nhTs -----END PGP SIGNATURE----- --=-=-=--