Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 1EFD0431FBC for ; Sun, 27 Jul 2014 09:58:46 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -2.3 X-Spam-Level: X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id GHQtWonDpKLt for ; Sun, 27 Jul 2014 09:58:38 -0700 (PDT) Received: from dmz-mailsec-scanner-7.mit.edu (dmz-mailsec-scanner-7.mit.edu [18.7.68.36]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id E19C1431FAE for ; Sun, 27 Jul 2014 09:58:37 -0700 (PDT) X-AuditID: 12074424-f79146d00000067c-f1-53d52fbd0990 Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) (using TLS with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by dmz-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP id 06.24.01660.DBF25D35; Sun, 27 Jul 2014 12:58:37 -0400 (EDT) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id s6RGwawD013761; Sun, 27 Jul 2014 12:58:36 -0400 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id s6RGwYJw022906 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Sun, 27 Jul 2014 12:58:36 -0400 Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80) (envelope-from ) id 1XBRmM-0007Fa-3C; Sun, 27 Jul 2014 12:58:34 -0400 Date: Sun, 27 Jul 2014 12:58:33 -0400 From: Austin Clements To: Mark Walters Subject: Re: [PATCH 05/14] lib: Database version 3: Introduce fine-grained "features" Message-ID: <20140727165833.GH13893@mit.edu> References: <1406433173-19169-1-git-send-email-amdragon@mit.edu> <1406433173-19169-6-git-send-email-amdragon@mit.edu> <87silnyvoq.fsf@qmul.ac.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87silnyvoq.fsf@qmul.ac.uk> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFmplleLIzCtJLcpLzFFi42IR4hTV1t2rfzXYoOket8XquTwW12/OZHZg 8tg56y67x7NVt5gDmKK4bFJSczLLUov07RK4MpbN/cVacHkqY8W+PytYGxgbCrsYOTkkBEwk Ns5YxQhhi0lcuLeerYuRi0NIYDaTxN/GRUwQzkZGieaOucwQzmkmiYsL1jBCOEsYJfa9WAPW zyKgKtHbdIEdxGYT0JDYtn85WFxEQEfi9qEFYHFmAWmJb7+bmUBsYYFwiRdTLoPV8ALVtLav hho6lVFibW8LC0RCUOLkzCcsEM1aEjf+vQRq5gAbtPwfB0iYE2jXoiXLmEFsUQEViSknt7FN YBSahaR7FpLuWQjdCxiZVzHKpuRW6eYmZuYUpybrFicn5uWlFuma6+VmluilppRuYgSFNruL yg7G5kNKhxgFOBiVeHgz2K4EC7EmlhVX5h5ilORgUhLl1da+GizEl5SfUpmRWJwRX1Sak1p8 iFGCg1lJhLfwLVA5b0piZVVqUT5MSpqDRUmc9621VbCQQHpiSWp2ampBahFMVoaDQ0mCV0EP aKhgUWp6akVaZk4JQpqJgxNkOA/Q8ABdoBre4oLE3OLMdIj8KUZFKXHedpCEAEgiozQPrheW el4xigO9IswrCrKCB5i24LpfAQ1mAhrM4n8ZZHBJIkJKqoHRN27/qzel3RtqSrmmmgt8vyM7 jZvp3eSt/rJdhq755w1y86R6rz8InFv0KjPyGjf7eZNldjubTxpvfNR9xNJA/oRWT0TAyp4l O4KX8okdXOEUeTUpx51vxqodT833i1t/2Ret9iyU43l8e82R/v/+NomiKcbJbNnX2N59VjE0 SlyXtK1870YlluKMREMt5qLiRABpAKaHGAMAAA== Cc: notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 27 Jul 2014 16:58:46 -0000 Quoth Mark Walters on Jul 27 at 10:53 am: > > On Sun, 27 Jul 2014, Austin Clements wrote: > > Previously, our database schema was versioned by a single number. > > Each database schema change had to occur "atomically" in Notmuch's > > development history: before some commit, Notmuch used version N, after > > that commit, it used version N+1. Hence, each new schema version > > could introduce only one change, the task of developing a schema > > change fell on a single person, and it all had to happen and be > > perfect in a single commit series. This made introducing a new schema > > version hard. We've seen only two schema changes in the history of > > Notmuch. > > I like this series as a whole and I am happy with all the patches that I > haven't sent comments on. (Note, however, that I am not very familiar > with the database code.) > > There is one thing which confuses me in this patch: > > > > > This commit introduces database schema version 3; hopefully the last > > schema version we'll need for a while. With this version, we switch > > from a single version number to "features": a set of named, > > independent aspects of the database schema. > > > > Features should make backwards compatibility easier. For many things, > > it should be easy to support databases both with and without a > > feature, which will allow us to make upgrades optional and will enable > > "unstable" features that can be developed and tested over time. > > > > Features also make forwards compatibility easier. The features > > recorded in a database include "compatibility flags," which can > > indicate to an older version of Notmuch when it must support a given > > feature to open the database for read or for write. This lets us > > replace the old vague "I don't recognize this version, so something > > might go wrong, but I promise to try my best" warnings upon opening a > > database with an unknown version with precise errors. If a database > > is safe to open for read/write despite unknown features, an older > > version will know that and issue no message at all. If the database > > is not safe to open for read/write because of unknown features, an > > older version will know that, too, and can tell the user exactly which > > required features it lacks support for. > > --- > > lib/database-private.h | 56 +++++++++++++++ > > lib/database.cc | 189 +++++++++++++++++++++++++++++++++++++++++-------- > > 2 files changed, 214 insertions(+), 31 deletions(-) > > > > diff --git a/lib/database-private.h b/lib/database-private.h > > index d3e65fd..323b9fe 100644 > > --- a/lib/database-private.h > > +++ b/lib/database-private.h > > @@ -46,6 +46,11 @@ struct _notmuch_database { > > int atomic_nesting; > > Xapian::Database *xapian_db; > > > > + /* Bit mask of features used by this database. Features are > > + * named, independent aspects of the database schema. This is a > > + * bitwise-OR of NOTMUCH_FEATURE_* values (below). */ > > + unsigned int features; > > + > > unsigned int last_doc_id; > > uint64_t last_thread_id; > > > > @@ -55,6 +60,57 @@ struct _notmuch_database { > > Xapian::ValueRangeProcessor *date_range_processor; > > }; > > > > +/* Bit masks for _notmuch_database::features. */ > > +enum { > > + /* If set, file names are stored in "file-direntry" terms. If > > + * unset, file names are stored in document data. > > + * > > + * Introduced: version 1. Implementation support: both for read; > > + * required for write. */ > > + NOTMUCH_FEATURE_FILE_TERMS = 1 << 0, > > + > > + /* If set, directory timestamps are stored in documents with > > + * XDIRECTORY terms and relative paths. If unset, directory > > + * timestamps are stored in documents with XTIMESTAMP terms and > > + * absolute paths. > > + * > > + * Introduced: version 1. Implementation support: required. */ > > + NOTMUCH_FEATURE_DIRECTORY_DOCS = 1 << 1, > > + > > + /* If set, the from, subject, and message-id headers are stored in > > + * message document values. If unset, message documents *may* > > + * have these values, but if the value is empty, it must be > > + * retrieved from the message file. > > + * > > + * Introduced: optional in version 1, required as of version 3. > > + * Implementation support: both. > > + */ > > + NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES = 1 << 2, > > + > > + /* If set, folder terms are boolean and path terms exist. If > > + * unset, folder terms are probabilistic and stemmed and path > > + * terms do not exist. > > + * > > + * Introduced: version 2. Implementation support: required. */ > > + NOTMUCH_FEATURE_BOOL_FOLDER = 1 << 3, > > +}; > > + > > +/* Prior to database version 3, features were implied by the database > > + * version number, so hard-code them for earlier versions. */ > > +#define NOTMUCH_FEATURES_V0 (0) > > +#define NOTMUCH_FEATURES_V1 (NOTMUCH_FEATURES_V0 | NOTMUCH_FEATURE_FILE_TERMS | \ > > + NOTMUCH_FEATURE_DIRECTORY_DOCS) > > +#define NOTMUCH_FEATURES_V2 (NOTMUCH_FEATURES_V1 | NOTMUCH_FEATURE_BOOL_FOLDER) > > + > > +/* Current database features. If any of these are missing from a > > + * database, request an upgrade. > > + * NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES is not included because > > + * upgrade doesn't currently introduce the feature (though brand new > > + * databases will have it). */ > > +#define NOTMUCH_FEATURES_CURRENT \ > > + (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \ > > + NOTMUCH_FEATURE_BOOL_FOLDER) > > + > > /* Return the list of terms from the given iterator matching a prefix. > > * The prefix will be stripped from the strings in the returned list. > > * The list will be allocated using ctx as the talloc context. > > diff --git a/lib/database.cc b/lib/database.cc > > index 45c4260..03eef3e 100644 > > --- a/lib/database.cc > > +++ b/lib/database.cc > > @@ -20,6 +20,7 @@ > > > > #include "database-private.h" > > #include "parse-time-vrp.h" > > +#include "string-util.h" > > > > #include > > > > @@ -42,7 +43,7 @@ typedef struct { > > const char *prefix; > > } prefix_t; > > > > -#define NOTMUCH_DATABASE_VERSION 2 > > +#define NOTMUCH_DATABASE_VERSION 3 > > > > #define STRINGIFY(s) _SUB_STRINGIFY(s) > > #define _SUB_STRINGIFY(s) #s > > @@ -151,6 +152,17 @@ typedef struct { > > * changes are made to the database (such as by > > * indexing new fields). > > * > > + * features The set of features supported by this > > + * database. This consists of a set of > > + * '\n'-separated lines, where each is a feature > > + * name, a '\t', and compatibility flags. If the > > + * compatibility flags contain 'w', then the > > + * opener must support this feature to safely > > + * write this database. If the compatibility > > + * flags contain 'r', then the opener must > > + * support this feature to read this database. > > + * Introduced in database version 3. > > + * > > * last_thread_id The last thread ID generated. This is stored > > * as a 16-byte hexadecimal ASCII representation > > * of a 64-bit unsigned integer. The first ID > > @@ -226,6 +238,7 @@ static prefix_t PROBABILISTIC_PREFIX[]= { > > { "subject", "XSUBJECT"}, > > }; > > > > + > > const char * > > _find_prefix (const char *name) > > { > > @@ -251,6 +264,25 @@ _find_prefix (const char *name) > > return ""; > > } > > > > +static const struct > > +{ > > + /* NOTMUCH_FEATURE_* value. */ > > + unsigned int value; > > + /* Feature name as it appears in the database. This name should > > + * be appropriate for displaying to the user if an older version > > + * of notmuch doesn't support this feature. */ > > + const char *name; > > + /* Compatibility flags when this feature is declared. */ > > + const char *flags; > > +} feature_names[] = { > > + {NOTMUCH_FEATURE_FILE_TERMS, "file terms", "rw"}, > > + {NOTMUCH_FEATURE_DIRECTORY_DOCS, "directory documents", "rw"}, > > + /* Header values are not required for reading a database because a > > + * reader can just refer to the message file. */ > > + {NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES, "from/subject/ID values", "w"}, > > + {NOTMUCH_FEATURE_BOOL_FOLDER, "boolean folder terms", "rw"}, > > +}; > > + > > const char * > > notmuch_status_to_string (notmuch_status_t status) > > { > > @@ -591,6 +623,11 @@ notmuch_database_create (const char *path, notmuch_database_t **database) > > ¬much); > > if (status) > > goto DONE; > > + > > + /* Upgrade doesn't add this feature to existing databases, but new > > + * databases have it. */ > > + notmuch->features |= NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES; > > + > > Does this mean that if the user wants this feature he has to dump, > delete the existing database, and restore? Is that just because no-one > has implemented upgrade for this feature or is it "hard" in some sense? Yep. But note that while this feature bit *guarantees* that these values will be available, if this feature bit isn't set, the code will still check for these values and use them if they're present. Implementing an upgrade for this feature is "hard" in the sense that it requires parsing messages to fill in missing values. Currently, all upgrades operate on the database exclusively, without going back to the original messages. I doubt we'll be able to keep that up forever, but that's what's going on now. (It'll be tricky to deal with when we do need it, since the database may not be up-to-date with the mail store, so we may not be able to open all messages!) > A possibly related question: is there likely to be a case where the user > does not want to add/upgrade some feature? Would it be worth having an > explicit upgrade command which let the user choose which features to > upgrade? This is orthogonal to this series: I am just trying to get a > feel for how it will, or could, be used. Good question. My gut feeling is that we should try to avoid this in the general case because this sort of "feature fragmentation" would complicate the code, as well as testing and debugging. There are a few potential uses I can think of, though. One relates to what I was saying above: if some upgrade requires reindexing messages, we may want to let the user postpone that (maybe notmuch new --reindex or something). I also think we should have "unstable" features for things that are under development and may change in non-backwards-compatible ways. This would be specifically targeted at developers, and developers would have to opt in to these unstable features. But the goal would always be to stabilize them and make them standard features that are automatically upgraded to. > Best wishes > > Mark > > > status = notmuch_database_upgrade (notmuch, NULL, NULL); > > if (status) { > > notmuch_database_close(notmuch); > > @@ -619,6 +656,80 @@ _notmuch_database_ensure_writable (notmuch_database_t *notmuch) > > return NOTMUCH_STATUS_SUCCESS; > > } > > > > +/* Parse a database features string from the given database version. > > + * > > + * For version < 3, this ignores the features string and returns a > > + * hard-coded set of features. > > + * > > + * If there are unrecognized features that are required to open the > > + * database in mode (which should be 'r' or 'w'), return a > > + * comma-separated list of unrecognized but required features in > > + * *incompat_out, which will be allocated from ctx. > > + */ > > +static unsigned int > > +_parse_features (const void *ctx, const char *features, unsigned int version, > > + char mode, char **incompat_out) > > +{ > > + unsigned int res = 0, namelen, i; > > + size_t llen = 0; > > + const char *flags; > > + > > + /* Prior to database version 3, features were implied by the > > + * version number. */ > > + if (version == 0) > > + return NOTMUCH_FEATURES_V0; > > + else if (version == 1) > > + return NOTMUCH_FEATURES_V1; > > + else if (version == 2) > > + return NOTMUCH_FEATURES_V2; > > + > > + /* Parse the features string */ > > + while ((features = strtok_len_c (features + llen, "\n", &llen)) != NULL) { > > + flags = strchr (features, '\t'); > > + if (! flags || flags > features + llen) > > + continue; > > + namelen = flags - features; > > + > > + for (i = 0; i < ARRAY_SIZE (feature_names); ++i) { > > + if (strlen (feature_names[i].name) == namelen && > > + strncmp (feature_names[i].name, features, namelen) == 0) { > > + res |= feature_names[i].value; > > + break; > > + } > > + } > > + > > + if (i == ARRAY_SIZE (feature_names)) { > > + /* Unrecognized feature */ > > + const char *have = strchr (flags, mode); > > + if (have && have < features + llen) { > > + /* This feature is required to access this database in > > + * 'mode', but we don't understand it. */ > > + if (! *incompat_out) > > + *incompat_out = talloc_strdup (ctx, ""); > > + *incompat_out = talloc_asprintf_append_buffer ( > > + *incompat_out, "%s%.*s", **incompat_out ? ", " : "", > > + namelen, features); > > + } > > + } > > + } > > + > > + return res; > > +} > > + > > +static char * > > +_print_features (const void *ctx, unsigned int features) > > +{ > > + unsigned int i; > > + char *res = talloc_strdup (ctx, ""); > > + > > + for (i = 0; i < ARRAY_SIZE (feature_names); ++i) > > + if (features & feature_names[i].value) > > + res = talloc_asprintf_append_buffer ( > > + res, "%s\t%s\n", feature_names[i].name, feature_names[i].flags); > > + > > + return res; > > +} > > + > > notmuch_status_t > > notmuch_database_open (const char *path, > > notmuch_database_mode_t mode, > > @@ -627,7 +738,7 @@ notmuch_database_open (const char *path, > > notmuch_status_t status = NOTMUCH_STATUS_SUCCESS; > > void *local = talloc_new (NULL); > > notmuch_database_t *notmuch = NULL; > > - char *notmuch_path, *xapian_path; > > + char *notmuch_path, *xapian_path, *incompat_features; > > struct stat st; > > int err; > > unsigned int i, version; > > @@ -686,39 +797,51 @@ notmuch_database_open (const char *path, > > if (mode == NOTMUCH_DATABASE_MODE_READ_WRITE) { > > notmuch->xapian_db = new Xapian::WritableDatabase (xapian_path, > > Xapian::DB_CREATE_OR_OPEN); > > - version = notmuch_database_get_version (notmuch); > > - > > - if (version > NOTMUCH_DATABASE_VERSION) { > > - fprintf (stderr, > > - "Error: Notmuch database at %s\n" > > - " has a newer database format version (%u) than supported by this\n" > > - " version of notmuch (%u). Refusing to open this database in\n" > > - " read-write mode.\n", > > - notmuch_path, version, NOTMUCH_DATABASE_VERSION); > > - notmuch->mode = NOTMUCH_DATABASE_MODE_READ_ONLY; > > - notmuch_database_destroy (notmuch); > > - notmuch = NULL; > > - status = NOTMUCH_STATUS_FILE_ERROR; > > - goto DONE; > > - } > > - > > - if (version < NOTMUCH_DATABASE_VERSION) > > - notmuch->needs_upgrade = TRUE; > > } else { > > notmuch->xapian_db = new Xapian::Database (xapian_path); > > - version = notmuch_database_get_version (notmuch); > > - if (version > NOTMUCH_DATABASE_VERSION) > > - { > > - fprintf (stderr, > > - "Warning: Notmuch database at %s\n" > > - " has a newer database format version (%u) than supported by this\n" > > - " version of notmuch (%u). Some operations may behave incorrectly,\n" > > - " (but the database will not be harmed since it is being opened\n" > > - " in read-only mode).\n", > > - notmuch_path, version, NOTMUCH_DATABASE_VERSION); > > - } > > } > > > > + /* Check version. As of database version 3, we represent > > + * changes in terms of features, so assume a version bump > > + * means a dramatically incompatible change. */ > > + version = notmuch_database_get_version (notmuch); > > + if (version > NOTMUCH_DATABASE_VERSION) { > > + fprintf (stderr, > > + "Error: Notmuch database at %s\n" > > + " has a newer database format version (%u) than supported by this\n" > > + " version of notmuch (%u).\n", > > + notmuch_path, version, NOTMUCH_DATABASE_VERSION); > > + notmuch->mode = NOTMUCH_DATABASE_MODE_READ_ONLY; > > + notmuch_database_destroy (notmuch); > > + notmuch = NULL; > > + status = NOTMUCH_STATUS_FILE_ERROR; > > + goto DONE; > > + } > > + > > + /* Check features. */ > > + incompat_features = NULL; > > + notmuch->features = _parse_features ( > > + local, notmuch->xapian_db->get_metadata ("features").c_str (), > > + version, mode == NOTMUCH_DATABASE_MODE_READ_WRITE ? 'w' : 'r', > > + &incompat_features); > > + if (incompat_features) { > > + fprintf (stderr, > > + "Error: Notmuch database at %s\n" > > + " requires features (%s)\n" > > + " not supported by this version of notmuch.\n", > > + notmuch_path, incompat_features); > > + notmuch->mode = NOTMUCH_DATABASE_MODE_READ_ONLY; > > + notmuch_database_destroy (notmuch); > > + notmuch = NULL; > > + status = NOTMUCH_STATUS_FILE_ERROR; > > + goto DONE; > > + } > > + > > + /* Do we want an upgrade? */ > > + if (mode == NOTMUCH_DATABASE_MODE_READ_WRITE && > > + NOTMUCH_FEATURES_CURRENT & ~notmuch->features) > > + notmuch->needs_upgrade = TRUE; > > + > > notmuch->last_doc_id = notmuch->xapian_db->get_lastdocid (); > > last_thread_id = notmuch->xapian_db->get_metadata ("last_thread_id"); > > if (last_thread_id.empty ()) { > > @@ -1077,6 +1200,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch, > > double progress), > > void *closure) > > { > > + void *local = talloc_new (NULL); > > Xapian::WritableDatabase *db; > > struct sigaction action; > > struct itimerval timerval; > > @@ -1226,6 +1350,8 @@ notmuch_database_upgrade (notmuch_database_t *notmuch, > > notmuch_query_destroy (query); > > } > > > > + notmuch->features |= NOTMUCH_FEATURES_CURRENT; > > + db->set_metadata ("features", _print_features (local, notmuch->features)); > > db->set_metadata ("version", STRINGIFY (NOTMUCH_DATABASE_VERSION)); > > db->flush (); > > > > @@ -1302,6 +1428,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch, > > sigaction (SIGALRM, &action, NULL); > > } > > > > + talloc_free (local); > > return NOTMUCH_STATUS_SUCCESS; > > } > >