From a0d57a111b718770173f80ca87e1622ea6fb2a15 Mon Sep 17 00:00:00 2001 From: dm-list-email-notmuch Date: Mon, 14 Apr 2014 12:52:47 +1700 Subject: [PATCH] Re: Synchronization success stories? --- a6/f3563ee67d3c848c25986f646fc7ed1135faac | 150 ++++++++++++++++++++++ 1 file changed, 150 insertions(+) create mode 100644 a6/f3563ee67d3c848c25986f646fc7ed1135faac diff --git a/a6/f3563ee67d3c848c25986f646fc7ed1135faac b/a6/f3563ee67d3c848c25986f646fc7ed1135faac new file mode 100644 index 000000000..0ed3b6e56 --- /dev/null +++ b/a6/f3563ee67d3c848c25986f646fc7ed1135faac @@ -0,0 +1,150 @@ +Return-Path: + +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 15EA4431FBD + for ; Sun, 13 Apr 2014 12:52:59 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -2.3 +X-Spam-Level: +X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id 3hkX2x0OY0lL for ; + Sun, 13 Apr 2014 12:52:50 -0700 (PDT) +Received: from market.scs.stanford.edu (market.scs.stanford.edu [171.66.3.10]) + (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id 27553431FBC + for ; Sun, 13 Apr 2014 12:52:50 -0700 (PDT) +Received: from market.scs.stanford.edu (localhost.scs.stanford.edu + [127.0.0.1]) by market.scs.stanford.edu (8.14.7/8.14.7) with ESMTP id + s3DJqmB5016262; Sun, 13 Apr 2014 12:52:48 -0700 (PDT) +Received: (from dm@localhost) + by market.scs.stanford.edu (8.14.7/8.14.7/Submit) id s3DJqlUW024376; + Sun, 13 Apr 2014 12:52:47 -0700 (PDT) +X-Authentication-Warning: market.scs.stanford.edu: dm set sender to + 
return-axb7s9swz7zfq6nbxfhh8ym7p2@ta.scs.stanford.edu using -f +From: dm-list-email-notmuch@scs.stanford.edu +To: Tilmann Singer , Brian Sniffen , + notmuch@notmuchmail.org +Subject: Re: Synchronization success stories? +In-Reply-To: <8738hhvygu.fsf@tils.net> +References: + <87ppklwin6.fsf@tils.net> <87siphl0cl.fsf@ta.scs.stanford.edu> + <8738hhvygu.fsf@tils.net> +Date: Sun, 13 Apr 2014 12:52:47 -0700 +Message-ID: <87ppklknw0.fsf@ta.scs.stanford.edu> +MIME-Version: 1.0 +Content-Type: text/plain +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +Reply-To: David Mazieres expires 2014-07-12 PDT + +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sun, 13 Apr 2014 19:52:59 -0000 + +Tilmann Singer writes: + +> David Mazieres writes: +>> What happens if you get a message that's been stuck in a queue for a few +>> days and has an old Date: header? +> +> It would be missed. I have set the timespan to look backwards for new +> mail to one month to be a bit safer against the stuck-in-queue cases, +> but mails with older Date: headers would definitely get missed. +> +> The current output of notmuch count "*" is the same on both the client +> and the server, so it seems I didn't run into this problem yet (maybe I +> was just lucky). + +I've been playing around with reorganizing my maildir, and found a +couple of messages (on mailing lists) with clearly invalid dates years +in the past. But checking with notmuch count is a good idea. Then you +can always fall back to the slow path in the unlikely event that your +counts don't match up. Well, except that A) count only reflects unique +message-IDs, not individual message files, and B) when synchronizing in both directions +you could still miss something. You have to assume that the invalid +dates are only ever going to occur at one end of a synchronization +event.
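The two blind spots described above can be modeled in a few lines of Python. This is a hypothetical sketch, not Tilmann's actual script and not a notmuch API: the mailbox dictionaries, the function names, and the 30-day default (loosely mirroring the one-month window mentioned above) are all illustrative assumptions. It shows a date-window sync skipping a freshly delivered message whose Date: header is old, and a unique Message-ID count that matches on both sides even though one side holds an extra file reusing an existing ID.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical model: a mailbox maps a file name to (Message-ID, Date: header).


def window_sync_candidates(mailbox, now, days=30):
    """Return the files a date-window sync would copy: those whose Date:
    header is no older than `days` days before `now`. A message that sat
    in a queue, or carries a bogus old Date:, is silently skipped."""
    cutoff = now - timedelta(days=days)
    return {
        name
        for name, (_mid, date_header) in mailbox.items()
        if parsedate_to_datetime(date_header) >= cutoff
    }


def unique_id_count(mailbox):
    """Count distinct Message-IDs, roughly what comparing notmuch count
    output between two machines checks, rather than individual files."""
    return len({mid for mid, _date in mailbox.values()})


if __name__ == "__main__":
    now = datetime(2014, 4, 13, 20, 0, tzinfo=timezone.utc)
    server = {
        "recent": ("<a@example>", "Sun, 13 Apr 2014 12:52:47 -0700"),
        # Delivered today, but its Date: header is years in the past.
        "stale": ("<b@example>", "Fri, 01 Jan 2010 00:00:00 +0000"),
        # A second file that reuses an existing Message-ID.
        "collision": ("<a@example>", "Sun, 13 Apr 2014 13:00:00 -0700"),
    }
    # The date window leaves out "stale".
    print(sorted(window_sync_candidates(server, now)))
    # Suppose the client has "recent" and "stale" but never saw "collision":
    # the unique-ID counts still agree, so a count check reports no problem.
    client = {name: server[name] for name in ("recent", "stale")}
    print(unique_id_count(client) == unique_id_count(server), len(client), len(server))
```

The point of the model is only that both checks key on metadata (a date, a Message-ID) rather than on the set of files actually present, which is why a full comparison remains the fallback.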
+ +>> Or if you get new messages that have +>> the same Message-ID as old ones? +> +> Is that even possible? I thought that notmuch guarantees the uniqueness +> of indexed message ids. The only reference I could find without trying +> to read the code was this thread id:87mwyz3s9d.fsf@star.eba from 2012, +> which supports the assumption. + +Sadly, yes, it is quite possible, and it even opens up a slight security +issue. Suppose I know you are on a mailing list, and some message +appears on that mailing list that I don't want you to see. I can send +you an innocuous-looking message that just happens to have the same +message-id, and you may never see the original mailing list message. +Even better, depending on how your spam filtering is set up, if I include +the GTUBE string in my message you may never see mine or the original. + +That's why, with muchsync, I replicate actual mail messages rather than +message-IDs. Then you can always periodically check for message-IDs +that appear in more than one file. (In fact, though I haven't +published an interface for this, the SQL database kept by muchsync makes +it trivial to check for this and detect certain attacks.) + +I understand why notmuch went with message IDs. For instance, you have +sent this reply both directly to me and to a mailing list I am +subscribed to. So I will get two slightly different copies of the +message (one will have the standard notmuch mailing list signature, the +other won't). And this way, once I've marked it read, the message stays +read even after the second copy comes in. But personally I'd rather +see the occasional duplicate message than risk not seeing messages. In +particular, if the goal is to see fewer unread messages, some sort of +feature that proactively skips all future messages in a thread or +subthread would be more useful...
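The periodic check mentioned above (looking for message-IDs that appear in more than one file) can be sketched for a plain maildir in Python. This is a hypothetical helper, not part of muchsync or notmuch, and it does not use muchsync's SQL database: it walks the tree, parses each file's headers, and reports every Message-ID carried by more than one file. It will also flag the benign case, such as the two copies of a cross-posted reply described above, so its output is a list of candidates to inspect rather than proof of an attack.

```python
import os
from collections import defaultdict
from email.parser import BytesHeaderParser


def find_duplicate_message_ids(maildir_root):
    """Walk a maildir tree and return {Message-ID: [paths]} for every
    Message-ID that appears in more than one file."""
    parser = BytesHeaderParser()
    files_by_id = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(maildir_root):
        # Skip maildir tmp/ directories: they hold deliveries still in progress.
        dirnames[:] = [d for d in dirnames if d != "tmp"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                # Parses the header block; the body is not MIME-parsed.
                headers = parser.parse(fh)
            message_id = headers.get("Message-ID")
            if message_id:
                files_by_id[message_id.strip()].append(path)
    return {mid: sorted(paths) for mid, paths in files_by_id.items() if len(paths) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python find_dups.py ~/Maildir
    for mid, paths in sorted(find_duplicate_message_ids(sys.argv[1]).items()):
        print(mid)
        for path in paths:
            print("   ", path)
```

Comparing the reported files (for example, their From: and Received: headers) is what separates an ordinary list-plus-direct copy from a suspicious collision.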
+ +> Here is how long they take (on a machine with an SSD, which certainly +> helps): +> +> $ time notmuch dump --format=batch-tag | sort > /tmp/notmuch.dump +> real 0m3.643s +> user 0m3.593s +> sys 0m0.140s +> $ time notmuch restore < /tmp/notmuch.dump +> real 0m3.719s +> user 0m3.357s +> sys 0m0.357s +> $ notmuch count +> 117118 + +That's crazy. I'm jealous. Then again, this is how fast muchsync runs +(including a full database scan to detect changed messages and tags) +when there is no new mail: + +$ time ./muchsync -v +[notmuch] No new mail. +synchronizing muchsync database with Xapian... 0.038506 (+0.038506) +starting scan of Xapian database... 0.039069 (+0.000563) +opened Xapian... 0.040851 (+0.001782) +scanned message IDs... 0.137647 (+0.096796) +scanned tags... 0.170404 (+0.032757) +scanned directories in xapian... 0.172100 (+0.001696) +scanned filenames in xapian... 0.172376 (+0.000276) +adjusted link counts... 0.199461 (+0.027085) +finished synchronizing muchsync database with Xapian... 0.212965 (+0.013505) + +real 0m0.220s +user 0m0.173s +sys 0m0.023s + +David -- 2.26.2