From: David Mazieres
Date: Sun, 23 Aug 2015 05:41:59 +0000 (+1700)
Subject: Re: muchsync files renames
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=39c8524234865d5ae94cf3502263b9196a366d9b;p=notmuch-archives.git

Re: muchsync files renames
---
diff --git a/7d/1424312f61bc33f2654eeec1ea42b13939cf1d b/7d/1424312f61bc33f2654eeec1ea42b13939cf1d
new file mode 100644
index 000000000..e26dc614c
--- /dev/null
+++ b/7d/1424312f61bc33f2654eeec1ea42b13939cf1d
@@ -0,0 +1,122 @@
+Return-Path: 
+
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+	by arlo.cworth.org (Postfix) with ESMTP id 344406DE1B1B
+	for ; Sat, 22 Aug 2015 22:42:12 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at cworth.org
+X-Spam-Flag: NO
+X-Spam-Score: -2.208
+X-Spam-Level: 
+X-Spam-Status: No, score=-2.208 tagged_above=-999 required=5 tests=[AWL=0.643,
+	RCVD_IN_DNSWL_MED=-2.3, RP_MATCHES_RCVD=-0.55, SPF_PASS=-0.001]
+	autolearn=disabled
+Received: from arlo.cworth.org ([127.0.0.1])
+	by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
+	with ESMTP id cd5ew9ssLEEv for ;
+	Sat, 22 Aug 2015 22:42:09 -0700 (PDT)
+Received: from market.scs.stanford.edu (market.scs.stanford.edu [171.66.3.10])
+	by arlo.cworth.org (Postfix) with ESMTPS id 6C6FD6DE18EF
+	for ; Sat, 22 Aug 2015 22:42:09 -0700 (PDT)
+Received: from market.scs.stanford.edu (localhost.scs.stanford.edu
+	[127.0.0.1]) by market.scs.stanford.edu (8.14.7/8.14.7) with ESMTP id
+	t7N5g1Cj019244; Sat, 22 Aug 2015 22:42:01 -0700 (PDT)
+Received: (from dm@localhost)
+	by market.scs.stanford.edu (8.14.7/8.14.7/Submit) id t7N5g0Dj012017;
+	Sat, 22 Aug 2015 22:42:00 -0700 (PDT)
+X-Authentication-Warning: market.scs.stanford.edu: dm set sender to
+	return-tscnjiupa5jk2z8akbff4tt9se@ta.scs.stanford.edu using -f
+From: David Mazieres
+To: Amadeusz =?utf-8?B?xbtvxYJub3dza2k=?= ,
+	notmuch@notmuchmail.org
+Subject: Re: muchsync files renames
+In-Reply-To: 
+	<878u93ujdo.fsf@freja.aidecoe.name>
+References: <878u93ujdo.fsf@freja.aidecoe.name>
+Date: Sat, 22 Aug 2015 22:41:59 -0700
+Message-ID: <876146o920.fsf@ta.scs.stanford.edu>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=utf-8
+Content-Transfer-Encoding: quoted-printable
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.18
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive: 
+List-Post: 
+List-Help: 
+List-Subscribe: ,
+
+X-List-Received-Date: Sun, 23 Aug 2015 05:42:12 -0000
+
+Amadeusz Żołnowski writes:
+
+> Hi,
+>
+> I am testing muchsync-2 and it looks to me that files names across
+> machines are different. Moreover when syncing again after
+> initialization it seems muchsync is working on something. I have
+> canceled this and rerun muchsync. notmuch reported lots of files
+> renames on server. What and why it happens?
+
+What muchsync specifically synchronizes for messages is the mapping:
+
+   (directory, SHA-1-hash, link-count)
+
+So if a directory contains two copies of a file on one machine, it will
+end up with two copies on the other machine. However, the file names
+themselves are not the same, but rather are created in accordance with
+the maildir spec. (Note that SHA-1 wouldn't be my first choice of hash
+function, but notmuch already uses it for messages with long message
+IDs, so I figured I'd just be consistent with existing practice.)
+
+In terms of what muchsync is working on, you can run it with "-vvvv" on
+both sides to get an idea, as in "muchsync -vvvv server -vvvv". Better
+yet, you can just run it on one side with "muchsync -vvvv". You'll get
+a lot of output, so maybe run it inside the "script" command to save the
+output.
+If you have enabled maildir.synchronize_flags, it could be that notmuch is
+initially renaming all of your files, in which case muchsync needs to
+re-hash them to make sure they haven't changed.
+
+How did you cancel muchsync? If you send it a single SIGINT or SIGTERM,
+it attempts to clean up after itself. However, if it receives more than
+one signal, or any other signal, it exits immediately. Muchsync is
+conservative about updating the database, to avoid missing tags or files
+that have been changed. It always updates the notmuch database first,
+then its own sqlite database with a version number. That means that if
+you kill muchsync between those two steps, some files may get picked up
+as changed again even though they were really just copied from a peer.
+
+To mitigate this problem, the muchsync client syncs the database every
+10 seconds, so that in theory you should only get 10 seconds of extra
+work from killing the client. However, the server does not sync
+periodically, on the assumption that it is more likely to read an EOF
+than get killed, although currently it doesn't appear to commit any
+pending transactions to the sqlite database upon EOF, which may be an
+oversight.
+
+So to summarize:
+
+ * File names are not the same across machines, only file contents and
+   directory structure.
+
+ * Give muchsync lots of "-v" options to see what it is doing.
+
+ * Try to avoid killing muchsync. Doing so is safe, but likely to
+   generate extra work in the form of phantom renames or tag changes
+   that get synchronized even though they don't need to be.
+
+ * Possibly the server should handle EOF more gracefully and commit any
+   pending transactions, or the client should periodically send a
+   commit command to the server.
+
+If you think something is wrong, I can help you figure it out, but I
+need to know what maildir.synchronize_flags is set to on each replica,
+what you mean by "canceled", and roughly what was happening when you
+canceled (uploading or downloading).
+
+David
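
The (directory, SHA-1-hash, link-count) mapping described in the message, and why a flag-only rename (as done with maildir.synchronize_flags) leaves it unchanged, can be sketched roughly as follows. This is a hypothetical Python model, not muchsync's actual code: the in-memory `files` dict stands in for a maildir tree, and the helper names `content_mapping` and `maildir_set_flags` are invented for illustration. The `:2,FLAGS` suffix follows the maildir spec, which keeps flags in ASCII order.

```python
import hashlib
import os
from collections import Counter


def content_mapping(files):
    """Return the set of (directory, sha1_hex, link_count) triples.

    `files` is a hypothetical in-memory stand-in for a maildir tree:
    it maps a relative path to the message bytes stored there.
    File names are ignored; only the directory and contents matter.
    """
    counts = Counter(
        (os.path.dirname(path), hashlib.sha1(data).hexdigest())
        for path, data in files.items()
    )
    return {(d, digest, n) for (d, digest), n in counts.items()}


def maildir_set_flags(name, flags):
    """Replace the maildir info suffix with ':2,<flags>' in ASCII order.

    This only renames the file; the message contents, and therefore
    the SHA-1 hash, are untouched.
    """
    base = name.split(":", 1)[0]
    return base + ":2," + "".join(sorted(set(flags)))


# Two replicas hold the same message twice in INBOX/cur, under
# different (hypothetical) unique file names.
msg = b"Subject: hello\n\nbody\n"
replica_a = {
    "INBOX/cur/1440308519.1234_1.hostA:2,": msg,
    "INBOX/cur/1440308519.1234_2.hostA:2,": msg,
}
replica_b = {
    "INBOX/cur/1440308600.5678_1.hostB:2,": msg,
    "INBOX/cur/1440308600.5678_2.hostB:2,": msg,
}
assert content_mapping(replica_a) == content_mapping(replica_b)

# Marking one copy as seen (S) renames it but leaves the mapping alone.
old = "INBOX/cur/1440308519.1234_1.hostA:2,"
replica_a[maildir_set_flags(old, "S")] = replica_a.pop(old)
assert content_mapping(replica_a) == content_mapping(replica_b)
```

In this model a re-hash after a rename finds the same triple, which is why such renames cost time but need not transfer any message data.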
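
The client's periodic 10-second database sync, combined with the commit-on-EOF behavior suggested for the server in the summary, might be modeled as below. This is a hypothetical Python sketch using the standard sqlite3 module, not muchsync's actual code; the `synced` table, the one-update-per-line `path version` input format, the function name `apply_updates`, and the injectable clock are all assumptions made so the example is self-contained and testable.

```python
import io
import sqlite3
import time

COMMIT_INTERVAL = 10.0  # seconds, the client interval mentioned in the message


def apply_updates(conn, stream, clock=time.monotonic):
    """Apply hypothetical 'path version' lines inside a transaction.

    Commits whenever COMMIT_INTERVAL seconds have passed since the last
    commit, and commits once more when the stream reaches EOF, so that
    roughly one interval's worth of work is redone after a kill.
    Returns the number of commits performed.
    """
    commits = 0
    last_commit = clock()
    for line in stream:  # iteration stops at EOF
        path, version = line.split()
        # Python's sqlite3 implicitly opens a transaction before this write.
        conn.execute(
            "INSERT OR REPLACE INTO synced (path, version) VALUES (?, ?)",
            (path, int(version)),
        )
        now = clock()
        if now - last_commit >= COMMIT_INTERVAL:
            conn.commit()
            commits += 1
            last_commit = now
    conn.commit()  # on EOF, commit pending work instead of leaving it open
    commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synced (path TEXT PRIMARY KEY, version INTEGER)")
ticks = iter([0.0, 5.0, 11.0, 15.0])  # fake clock readings, in seconds
n = apply_updates(conn, io.StringIO("a 1\nb 2\nc 3\n"), clock=lambda: next(ticks))
```

With these fake clock readings, one commit happens mid-stream (at 11 s) and one at EOF, and no transaction is left open afterward.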