From: dm-list-email-notmuch
Date: Fri, 11 Apr 2014 16:03:38 +0000 (+1700)
Subject: Re: [PATCH] Add configurable changed tag to messages that have been changed on disk
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=d2ca49604e8b047b10be446f51a33d5034dac45c;p=notmuch-archives.git

Re: [PATCH] Add configurable changed tag to messages that have been changed on disk
---

diff --git a/27/5f4565af9b8f96627d545afd595205d3fd9914 b/27/5f4565af9b8f96627d545afd595205d3fd9914
new file mode 100644
index 000000000..e696ffa79
--- /dev/null
+++ b/27/5f4565af9b8f96627d545afd595205d3fd9914
@@ -0,0 +1,155 @@
+Return-Path:
+
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+    by olra.theworths.org (Postfix) with ESMTP id D6677421176
+    for ; Fri, 11 Apr 2014 09:05:00 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: -2.3
+X-Spam-Level:
+X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5
+    tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+    by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+    with ESMTP id oHihRs1iJvsa for ;
+    Fri, 11 Apr 2014 09:04:56 -0700 (PDT)
+Received: from market.scs.stanford.edu (market.scs.stanford.edu [171.66.3.10])
+    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
+    (No client certificate requested)
+    by olra.theworths.org (Postfix) with ESMTPS id 937E6421173
+    for ; Fri, 11 Apr 2014 09:04:56 -0700 (PDT)
+Received: from market.scs.stanford.edu (localhost.scs.stanford.edu
+    [127.0.0.1]) by market.scs.stanford.edu (8.14.7/8.14.7) with ESMTP id
+    s3BG3c97012339; Fri, 11 Apr 2014 09:03:38 -0700 (PDT)
+Received: (from dm@localhost)
+    by market.scs.stanford.edu (8.14.7/8.14.7/Submit) id s3BG3cFO000873;
+    Fri, 11 Apr 2014 09:03:38 -0700 (PDT)
+X-Authentication-Warning: market.scs.stanford.edu: dm set sender to
+    return-yjfumptdmm9v8zs6su3fi692c6@ta.scs.stanford.edu using -f
+From: dm-list-email-notmuch@scs.stanford.edu
+To: David Bremner , Gaute Hope
+Subject: Re: [PATCH] Add configurable changed tag to messages that have been
+    changed on disk
+In-Reply-To: <87k3aw5dj5.fsf@zancas.localnet>
+References: <1396800683-9164-1-git-send-email-eg@gaute.vetsj.com>
+    <87wqf2gqig.fsf@ta.scs.stanford.edu> <1397140962-sup-6514@qwerzila>
+    <87wqexnqvb.fsf@ta.scs.stanford.edu>
+    <87k3aw5dj5.fsf@zancas.localnet>
+Date: Fri, 11 Apr 2014 09:03:38 -0700
+Message-ID: <877g6v3lb9.fsf@ta.scs.stanford.edu>
+MIME-Version: 1.0
+Content-Type: text/plain
+Cc: notmuch
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+Reply-To: David Mazieres expires 2014-07-10 PDT
+
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Fri, 11 Apr 2014 16:05:01 -0000
+
+David Bremner writes:
+
+>> Exactly.  It could be a tick, or just the current time of day if your
+>> clock does not go backwards.  (I'd be willing to do a full scan if the
+>> clock ever goes backwards.)  The advantage of time is that you don't
+>> have to synchronously update some counter.
+>
+> I think I'd lean towards global time so that one could use it to resolve
+> conflicts between changes to multiple copies of the database.
+
+I, too, would prefer to use time.  However, I'm doubtful it would help
+resolve conflicts.  On the plus side, I'm not sure it is even needed to
+resolve conflicts.  My mail synchronizer has an algorithm for resolving
+conflicts that always works without human intervention and in my limited
+experience does exactly what I want:
+
+  * If there's a conflict between two replicas, ensure that each
+    maildir ends up with the maximum of the number of copies of the
+    message in each of the two databases being reconciled.
+    [Example:
+    If replica A deletes a message and replica B moves it from folder
+    INBOX to folder SPAM, you end up with a copy in SPAM.  If replica A
+    moves a message to folder IMPORTANT and replica B moves it to SPAM,
+    then you get two hard links to the same file, one in IMPORTANT and
+    one in SPAM.]
+
+  * If there's a conflict and two replicas have different tags on the
+    same message, then the tags in notmuch's new.tags directive get
+    logically ANDed, while all other tags get logically ORed.
+
+Granted, I've only been using this system for a week.  On the other
+hand, all I was doing was starting to test something I had written, yet
+it ended up being so much better than my old system that I couldn't go
+back and ended up using my system in production far earlier than
+anticipated...
+
+>> Making sure the write-operations update the time should be easy.  Most
+>> or all of the changes are probably funneled through
+>> _notmuch_message_sync.  Worst case, there are only 9 places in the
+>> source code that make use of a Xapian::WritableDatabase, so I'm pretty
+>> confident total changes wouldn't be much more than 50 lines of code.
+>
+> Maybe. Don't forget upgrading the database, updating the test suite, and
+> presumably some changes to the CLI so the new mtime can actually be
+> used. Not to be discouraging ;).
+
+The CLI is trivial.  We'll just add another search keyword ctime
+analogous to date.
+
+As far as updating the test suite, etc., it's almost certain that the
+core notmuch developers would be unsatisfied with whatever I've done,
+since the code base is very clean and has a very uniform style.  So when
+I say I'd want some "indication that such a change could be upstreamed,"
+I mean more specifically that someone would be willing to shepherd the
+process of getting the code into shape.
+
+> In the ensuing time, nothing better has developed for tag
+> synchronization (my pet use case) so maybe it's time to pursue this
+> again.
+
+I do have something pretty good for tag synchronization.  It requires a
+full database scan each time to detect changes, but I've heavily
+optimized it to be very fast by skipping over the notmuch library and
+directly scanning the underlying Xapian Btrees.  Currently my bottleneck
+is indexing messages (e.g., running notmuch new or calling
+notmuch_database_add_message), which is painfully slow on 32-bit
+machines.  (Unfortunately my mail server is a 32-bit machine.)
+
+To give you an idea, on a 32-bit machine, if I get a handful of new
+messages (e.g., 6 messages), running "notmuch new" takes 19 seconds,
+while scanning the database to check for renames and changed tags adds
+another 1.4 seconds.  On a 64-bit machine, "notmuch new" might take
+1 second, while scanning the database adds 350 msec.
+
+So full database scans might not be the end of the world.  The biggest
+performance bottleneck at this point is notmuch's painful indexing
+performance.  It kills me that it takes 10 minutes to index 100,000 mail
+messages on a 16-core machine with 48 GiB of RAM.  But the library is
+non-reentrant and allocates thread IDs in such a way that it's hard to
+create parallel databases and later merge them.  Basically I can't
+figure out how to make productive use of more than one CPU core even
+when synchronizing across 1GB Ethernet!
+
+It's pretty beta, but my intention is to open-source my code, so I'd be
+glad to have beta testers if you are interested in testing tag
+synchronization.
+
+> It would be good to have some preliminary idea about the time
+> and space costs of adding document mtimes. I guess database bloat
+> should not be too bad, since it's only 64 bits (?) per mail message.
+
+Plus a Btree to index it, so figure at least 24 bytes per message.
+Another issue is that values are always brought into memory with a
+document, so it will consume more RAM.  But yeah, I don't think it
+should be that bad.
+
+David
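
The two conflict rules described in the message (take the per-folder maximum of copy counts; AND the tags listed in new.tags and OR all other tags) can be sketched roughly as follows. This is only an illustration of the rules as stated, not the synchronizer's actual code, which the message does not show. The function names `merge_tags` and `merge_copies`, the dict-of-folder-counts representation, and the sample tag names are all assumptions made for the example.

```python
def merge_tags(tags_a, tags_b, new_tags):
    """Merge two conflicting tag sets for one message.

    Hypothetical helper: tags listed in new_tags (e.g. notmuch's
    new.tags setting) survive only if BOTH replicas still have them;
    every other tag survives if EITHER replica has it.
    """
    a, b, new = set(tags_a), set(tags_b), set(new_tags)
    return ((a & b) & new) | ((a | b) - new)


def merge_copies(copies_a, copies_b):
    """Merge per-folder copy counts for one message.

    Hypothetical helper: each argument maps a folder name to the number
    of copies of the message that replica has there. Each folder ends
    up with the larger of the two counts.
    """
    merged = {}
    for folder in set(copies_a) | set(copies_b):
        count = max(copies_a.get(folder, 0), copies_b.get(folder, 0))
        if count > 0:
            merged[folder] = count
    return merged


# Replica A read the message and tagged it "work"; replica B still has
# it unread and tagged it "todo". Assumed new.tags: inbox, unread.
print(sorted(merge_tags({"inbox", "work"},
                        {"inbox", "unread", "todo"},
                        {"inbox", "unread"})))
# -> ['inbox', 'todo', 'work']

# Replica A moved the message to IMPORTANT, replica B moved it to SPAM.
print(sorted(merge_copies({"IMPORTANT": 1}, {"SPAM": 1}).items()))
# -> [('IMPORTANT', 1), ('SPAM', 1)]
```

Under this reading, a tag in new.tags such as "unread" is dropped as soon as one replica removes it, while user-applied tags are never lost in a conflict, which matches the delete-versus-move example in the message where the surviving copy wins.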