From: Austin Clements
Date: Wed, 23 Apr 2014 20:59:20 +0000 (+2000)
Subject: Re: [PATCH] Add configurable changed tag to messages that have been changed on disk
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=ea397a9c62b8429a2db1a6f38e250805cd2573be;p=notmuch-archives.git

Re: [PATCH] Add configurable changed tag to messages that have been changed on disk
---

diff --git a/b1/7f97089259c2d66d78a5cffaab6bc4050dfd80 b/b1/7f97089259c2d66d78a5cffaab6bc4050dfd80
new file mode 100644
index 000000000..328657315
--- /dev/null
+++ b/b1/7f97089259c2d66d78a5cffaab6bc4050dfd80
@@ -0,0 +1,151 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id 09C44431FBD
+ for ; Wed, 23 Apr 2014 13:59:37 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: -0.7
+X-Spam-Level:
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5
+ tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id SXgPjVp1H138 for ;
+ Wed, 23 Apr 2014 13:59:29 -0700 (PDT)
+Received: from dmz-mailsec-scanner-5.mit.edu (dmz-mailsec-scanner-5.mit.edu
+ [18.7.68.34])
+ by olra.theworths.org (Postfix) with ESMTP id 4AFA9431FAE
+ for ; Wed, 23 Apr 2014 13:59:29 -0700 (PDT)
+X-AuditID: 12074422-f79186d00000135a-cf-535829b022e6
+Received: from mailhub-auth-2.mit.edu ( [18.7.62.36])
+ (using TLS with cipher AES256-SHA (256/256 bits))
+ (Client did not present a certificate)
+ by dmz-mailsec-scanner-5.mit.edu (Symantec Messaging Gateway) with SMTP
+ id 39.09.04954.0B928535; Wed, 23 Apr 2014 16:59:28 -0400 (EDT)
+Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11])
+ by mailhub-auth-2.mit.edu (8.13.8/8.9.2) with ESMTP id s3NKxQ0j020016;
+ Wed, 23 Apr 2014 16:59:27 -0400
+Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])
+ (authenticated bits=0)
+ (User authenticated as amdragon@ATHENA.MIT.EDU)
+ by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id s3NKxMNb008365
+ (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT);
+ Wed, 23 Apr 2014 16:59:24 -0400
+Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80)
+ (envelope-from )
+ id 1Wd4GI-0004jG-1m; Wed, 23 Apr 2014 16:59:22 -0400
+Date: Wed, 23 Apr 2014 16:59:20 -0400
+From: Austin Clements
+To: David Mazieres expires 2014-07-22 PDT
+
+Subject: Re: [PATCH] Add configurable changed tag to messages that have been
+ changed on disk
+Message-ID: <20140423205920.GM25817@mit.edu>
+References: <1396800683-9164-1-git-send-email-eg@gaute.vetsj.com>
+ <87wqf2gqig.fsf@ta.scs.stanford.edu> <1397140962-sup-6514@qwerzila>
+ <87wqexnqvb.fsf@ta.scs.stanford.edu> <1397163239-sup-5101@qwerzila>
+ <87d2g9ja0h.fsf@maritornes.cs.unb.ca> <1398237865-sup-624@qwerzila>
+ <87ioq0l8th.fsf@ta.scs.stanford.edu>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Content-Disposition: inline
+In-Reply-To: <87ioq0l8th.fsf@ta.scs.stanford.edu>
+User-Agent: Mutt/1.5.21 (2010-09-15)
+X-Brightmail-Tracker:
+ H4sIAAAAAAAAA+NgFprOKsWRmVeSWpSXmKPExsUixG6nortBMyLY4OFaFYsbrd2MFk2fL7Fa
+ HJ/+hc3i+s2ZzA4sHj/+NbN5PFt1i9nj0t9tTB5bDr1nDmCJ4rJJSc3JLEst0rdL4Mo4OMWq
+ oEe2YsbtE6wNjNvFuxg5OSQETCSaFs1kg7DFJC7cWw9kc3EICcxmklh7dw8jhLORUWLhyVfs
+ EM5pJol36/exQDhLGCV+Hf7KAtLPIqAq8XPZT7BZbAIaEtv2L2cEsUUEiiSur/wPFmcGsk/v
+ 3A1WLywQJzHt8g4mEJtXQEfiQvdEZhBbSOAAk8TOP3wQcUGJkzOfsED0aknc+PcSqJ4DyJaW
+ WP6PAyTMKWAosafrPTuILSqgIjHl5Da2CYxCs5B0z0LSPQuhewEj8ypG2ZTcKt3cxMyc4tRk
+ 3eLkxLy81CJdU73czBK91JTSTYzg8HdR2sH486DSIUYBDkYlHt4DF8KDhVgTy4orcw8xSnIw
+ KYnyqilGBAvxJeWnVGYkFmfEF5XmpBYfYpTgYFYS4c3SAMrxpiRWVqUW5cOkpDlYlMR531pb
+ BQsJpCeWpGanphakFsFkZTg4lCR4z4E0ChalpqdWpGXmlCCkmTg4QYbzAA2fBja8uCAxtzgz
+ HSJ/ilFRSpx3mzpQQgAkkVGaB9cLS0+vGMWBXhHmLQdp5wGmNrjuV0CDmYAGF0wIBxlckoiQ
+ kmpgnF/YalHz60r/hKxd/IFSQdeuPDnm+K98SrmW9aPN0lGuSqyp1X4ZR/lOGD7dOeFfwcvV
+ B6/qhuct3uK9TenwhMjFBTPbr0v8qV/t1ZbWdFpY/zTbvbevH3FLvWf+LLlc7vvUf+vL7S/M
+ W3Xhcr7blGX3D9UrVyXov3+jaiW1/na0B+ORzoWs3UosxRmJhlrMRcWJAJ0nCSsqAwAA
+Cc: notmuch
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Wed, 23 Apr 2014 20:59:37 -0000
+
+Hi Dave!
+
+Quoth David Mazieres on Apr 23 at 2:00 am:
+> Gaute Hope writes:
+>
+> > A db-tick or a _good_ ctime solution can as far as I can see solve both
+> > David M's (correct me if I am wrong) and my purposes, as well as
+> > probably have more use cases in the future. It would even be an
+> > interesting direct search: show me everything that changed lately,
+> > sorted.
+>
+> I could live with a db-tick scheme. I would prefer a ctime scheme,
+> since then I can answer questions such as "what has changed in the last
+> five minutes"? I mean all kinds of other stuff starts to break if your
+> clock goes backwards on a mail server machine, not the least of which is
+> that incremental backups will fail silently, so you risk losing your
+> mail.
+>
+> A middle ground might be to use the maximum of two values: 1) the
+> time-of-day at which notmuch started executing, and 2) the highest ctime
+> in the database plus 100 microseconds (leaving plenty of slop to store
+> timestamps as IEEE doubles with 52 significant bits). Since the values
+> will be Btree-indexed, computing the max plus one will be cheap.
+
+This makes me curious if you've considered how to fit this in to
+Xapian. The Xapian query syntax supports range queries over document
+"values", but within the Xapian B-tree, values are stored in docid
+order, not value order, so Xapian's range query operator is actually a
+full scan in implementation. I assume it does this so it doesn't have
+to store both forward and inverse indexes of values. (I spent some
+time figuring out the layout of the Xapian database and have fairly
+detailed notes if anyone's curious.)
+
+This is still reasonably fast in practice because it's a sequential
+scan and only requires a few bytes per message, but it's probably not
+what you'd expect. That said, Xapian does track per-value statistics
+that would suffice for the particular problem of monotonic time stamps
+(e.g., Database::get_value_upper_bound).
+
+In principle it would be possible to use user metadata or even
+document terms to support true B-tree range scans by ctime order, but
+I don't think it's possible to express queries over this using
+Xapian's query parser. I've written about 90% of a (new) custom query
+parser for Notmuch that would enable this, but little things like my
+looming thesis deadline have interfered with me finishing it.
+
+> Incidentally, if you are really this paranoid about time stamps, it
+> should bother you that notmuch's directory timestamps only have one
+> second granularity. It's not that hard to get a new message delivered
+> in the same second that notmuch new finished running. In my
+> synchronizer, I convert st_mtim (a struct timespec) into a double and
+> keep that plus size in the database to decide if I need to re-hash
+> files. But for directories, I'm stuck with NOTMUCH_VALUE_TIMESTAMP,
+> which are quantized to the second. (Ironically, I think
+> Xapian::sortable_serialize converts time_ts to doubles anyway, so
+> avoiding st_mtim is not really helping performance.)
+
+This is historical (and, I agree, unfortunate). But nobody's
+complained, so it hasn't been worth changing the libnotmuch interface
+to support sub-second directory mtimes. However, notmuch new does
+correctly handle deliveries in the same second it runs. If the
+wall-clock time when it starts is the same as the on-disk directory
+mtime, it skips updating the in-database directory mtime at the end.
+Hence, on the next run, it will still consider the directory
+out-of-date. It's a bit of a hack, but it's a hack that would be
+necessary for supporting older file systems even if we did support
+sub-second timestamps.
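
Editorial sketch (not part of the archived message): the same-second
check described in the last paragraph can be illustrated roughly as
follows. This is a minimal Python sketch under stated assumptions, not
notmuch's actual code. The names should_store_dir_mtime and
scan_directory, and the dict standing in for the database, are
hypothetical and chosen only for illustration.

```python
import math


def should_store_dir_mtime(fs_mtime: float, scan_start: float) -> bool:
    """Decide whether the directory mtime may be recorded after a scan.

    Both times are compared at one-second granularity (an assumption
    matching the quantized timestamps discussed in the message). If the
    directory changed in the same second the scan started, a later
    delivery in that second would leave the truncated mtime unchanged,
    so the stored value is deliberately left stale.
    """
    return math.floor(fs_mtime) != math.floor(scan_start)


def scan_directory(db: dict, path: str, fs_mtime: float, scan_start: float) -> bool:
    """Hypothetical scan step; db maps path -> last recorded mtime in seconds.

    Returns True if the directory was (re)scanned, False if it was skipped.
    """
    if db.get(path) == math.floor(fs_mtime):
        return False  # recorded mtime matches the file system: skip
    # ... indexing of the directory's files would happen here ...
    if should_store_dir_mtime(fs_mtime, scan_start):
        db[path] = math.floor(fs_mtime)
    return True
```

With this shape, a scan that starts in the same second as the
directory's mtime leaves the database entry untouched, so the next run
rescans the directory. That costs one redundant scan in exchange for
never missing a same-second delivery.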