From 4cfd8967526d84fba2246e820ac6ae1066db9331 Mon Sep 17 00:00:00 2001 From: David Mazieres Date: Sun, 6 Apr 2014 22:19:19 +0200 Subject: [PATCH] Re: [PATCH] Add configurable changed tag to messages that have been changed on disk --- 08/93a19356d6fae63e51dbb9b82673a2a49e9c6f | 121 ++++++++++++++++++++++ 1 file changed, 121 insertions(+) create mode 100644 08/93a19356d6fae63e51dbb9b82673a2a49e9c6f diff --git a/08/93a19356d6fae63e51dbb9b82673a2a49e9c6f b/08/93a19356d6fae63e51dbb9b82673a2a49e9c6f new file mode 100644 index 000000000..274b49e93 --- /dev/null +++ b/08/93a19356d6fae63e51dbb9b82673a2a49e9c6f @@ -0,0 +1,121 @@ +Return-Path: + +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 304B5431FBC + for ; Sun, 6 Apr 2014 13:19:30 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -2.3 +X-Spam-Level: +X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id s62A4tGVgLJO for ; + Sun, 6 Apr 2014 13:19:24 -0700 (PDT) +Received: from market.scs.stanford.edu (market.scs.stanford.edu [171.66.3.10]) + (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id 26004431FB6 + for ; Sun, 6 Apr 2014 13:19:24 -0700 (PDT) +Received: from market.scs.stanford.edu (localhost.scs.stanford.edu + [127.0.0.1]) by market.scs.stanford.edu (8.14.7/8.14.7) with ESMTP id + s36KJLvB015571; Sun, 6 Apr 2014 13:19:21 -0700 (PDT) +Received: (from dm@localhost) + by market.scs.stanford.edu (8.14.7/8.14.7/Submit) id s36KJKZv025070; + Sun, 6 Apr 2014 13:19:20 -0700 (PDT) +X-Authentication-Warning: market.scs.stanford.edu: dm set sender to + 
return-qins2mg6jwx9mkqf94wrbsevww@ta.scs.stanford.edu using -f +From: David Mazieres +To: Gaute Hope , notmuch@notmuchmail.org +Subject: Re: [PATCH] Add configurable changed tag to messages that have been + changed on disk +In-Reply-To: <1396800683-9164-1-git-send-email-eg@gaute.vetsj.com> +References: <1396800683-9164-1-git-send-email-eg@gaute.vetsj.com> +Date: Sun, 06 Apr 2014 22:19:19 +0200 +Message-ID: <87wqf2gqig.fsf@ta.scs.stanford.edu> +MIME-Version: 1.0 +Content-Type: text/plain +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +Reply-To: David Mazieres expires 2014-07-05 CEST + +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sun, 06 Apr 2014 20:19:30 -0000 + +Gaute Hope writes: + +> When one of the source files for a message is changed on disk, renamed, +> deleted or a new source file is added. A configurable changed tag is +> is added. The tag can be configured under the option 'changed_tags' in +> the [new] section, the default is none. Tests have been updated to +> accept the new config option. +> +> notmuch-setup now asks for a changed tag after the new tags question. +> +> This could be useful for for example 'afew' to detect remote changes in +> IMAP folders and update the FolderNameFilter to also add tags or remove +> tags when a _existing_ message has been added to or removed from a +> maildir. + +I think this is the wrong way to achieve such functionality, because +then the change tag A) is expensive to remove, B) is easy to misuse +(remember to call fsync everywhere before deleting the change tag), and +C) can be used by only one application. + +A better approach would be to add a new "modtime" xapian value that is +updated whenever the tags or any other terms (such as XFDIRENTRY) are +added to or deleted from a docid. 
If it's a Xapian value, rather than a +term, then modtime will be queryable just like date, allowing multiple +applications to query all docids modified since the last time they ran. + +I currently have multiple applications that could significantly benefit +from such a modtime. An obvious one is proper incremental backups with +notmuch-dump. + +Another example is a tool I have that synchronizes maildirs and notmuch +tags across machines. With the current interface, there is no way to do +this without scanning the entire database, because any message, even a +very old one, may have changed tags or links. Moreover, something like +notmuch-dump is way, way too slow to run every time you want to check +for new mail. notmuch-dump costs 5-10 seconds on my 110,000-message +maildir! In fact, any approach that gathers tags associated with each +individual docid is a complete non-starter, forcing me to violate +abstraction and examine the postlists associated with each tag and +XFDIRENTRY term. Even my highly optimized implementation takes about +250 msec (1400 msec on a 32-bit machine), which adds perceptible latency +to synchronizing my clients' notmuch maildirs with my server's when I +poll for new mail. + +Yet another application is something like nottoomuch-addresses, which +currently uses an occasionally incorrect heuristic to detect new +messages based on the Date header. + +Let me make a stronger statement, which is that not only are +modification times an incredibly useful and general primitive, but lack +of modification times is the single thing that kept me away from notmuch +despite years of wanting to switch. In the end, I invested months +developing a highly-optimized change detector that efficiently diffs +Xapian's Btrees against a mysql database with a snapshot of the same +information. My solution works, and I now enjoy a replicated notmuch +setup synchronized across three machines, including offline access on my +laptop.
But my 4,000-line C++ program might have been a 400-line shell +script if only notmuch supported docid mod times. + +Also, to put this in perspective, how long does it take to remove the +changed tags from a bunch of messages? If it's longer than 300 msec on +a 64-bit machine, then even with a single application you'd be better +off using my crazy on-the-side mysql version vector scheme. + +David -- 2.26.2