Date: Tue, 20 Dec 2011 10:05:03 -0500
From: Austin Clements
To: David Edmondson
Cc: notmuch@notmuchmail.org
Subject: Re: [PATCH 0/5] Store message modification times in the DB
Message-ID: <20111220150503.GE10376@mit.edu>
References: <1323796305-28789-1-git-send-email-schnouki@schnouki.net>
 <20111219194821.GA10376@mit.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.5.21 (2010-09-15)
List-Id: "Use and development of the notmuch mail system."

Quoth David Edmondson on Dec 20 at 8:32 am:
> > == Two-way "merge" from host R to host L ==
> >
> > Per-host state:
> > - last_mtime: Map from remote hosts to last sync mtime
>
> With the proposed changes it seems that the state required on each host
> would live within the Xapian database (to be extracted with 'dump').

It certainly could. I haven't thought about how any of this would
integrate with dump, or if it necessarily should.

A related question is how bootstrap should work.
For example, if you add another host, what's the best way to bring it
up to speed without, say, overwriting your tags everywhere with your
initial tags? In general, when a new message arrives, how do you get
the hosts to agree on its tags and what happens if one host tags it
before another host sees it?

> > new_mtime = last_mtime[R]
> > For msgid, mtime, tags in messages on host R with mtime >= last_mtime[R]:
> >   If mtime > local mtime of msgid:
> >     Set local tags of msgid to tags
> >   new_mtime = max(new_mtime, mtime)
> > last_mtime[R] = new_mtime
> >
> > This has the advantage of keeping very little state, but the
> > synchronization is also quite primitive. If two hosts change a
> > message's tags in different ways between synchronizations, the more
> > recent of the two will override the full set of tags on that message.
> > This does not strictly require tombstones, though if you make a tag
> > change and then delete the message before a sync, the tag change will
> > be lost without some record of that state.
>
> Does this matter? If the tag on a deleted message is changed, does
> anyone care?

That depends on what sort of synchronization model you're expecting.
If you're expecting git-style synchronization where all that matters
is the state and not the order things happened in, then this is
exactly what you'd expect. If you're expecting something more nuanced
that knows about the order you did things in across hosts between
synchronizations (which I think can only lead to more unintuitive
corner-cases, but some people seem to expect), then this could be
surprising.

> > Also, this obviously depends heavily on synchronized clocks.
> >
> >
> > == Three-way merge from host R to host L ==
> >
> > Per-host state:
> > - last_mtime: Map from remote hosts to last sync mtime
> > - last_sync: Map from remote hosts to the tag database as of the last sync
>
> Any ideas where this state might be kept?
It could also be stored in Xapian (in user keys or as additional
message metadata). That would certainly be simplest and would avoid
hairy atomicity issues. OTOH, it's not the end of the world if
last_sync doesn't get updated atomically, especially if we can at
least guarantee last_sync is fully updated and on disk before we
update last_mtime.

> > new_mtime = last_mtime[R]
> > for msgid, mtime, r_tags in messages on host R with mtime >= last_mtime[R]:
> >   my_tags = local tags of msgid
> >   last_tags = last_sync[R][msgid]
> >   for each tag that differs between my_tags and r_tags:
> >     if tag is in last_tags: remove tag locally
> >     else: add tag locally
> >   last_sync[R][msgid] = tags
> >   new_mtime = max(new_mtime, mtime)
> > Delete stale messages from last_sync[R] (using tombstones or something)
> > last_mtime[R] = new_mtime
> >
> > This protocol requires significantly more state, but can also
> > reconstruct per-tag changes. Conflict resolution is equivalent to
> > what git would do and is based solely on the current local and remote
> > state and the common ancestor state. This can lead to unintuitive
> > results if a tag on a message has gone through multiple changes on
> > both hosts since the last sync (though, I argue, there are no
> > intuitive results in such situations). Tombstones are only required
> > to garbage collect sync state (and other techniques could be used for
> > that). This also does not depend on time synchronization (though,
> > like any mtime solution, it does depend on mtime monotonicity). The
> > algorithm would work equally well with sequence numbers.
> >
> > I tried coming up with a third algorithm that used mtimes to resolve
> > tagging conflicts, but without per-tag mtimes it degenerated into the
> > first algorithm.
>
> dme.
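
For concreteness, here is a rough Python sketch of the two merge
procedures quoted above, run on the same conflict so the difference is
visible. It is only an illustration under assumptions, not notmuch
code: the in-memory TagDB model, the function names, and the host name
"R" are all made up here. It also assumes that messages unknown
locally are skipped, that the "tags" stored into last_sync are the
remote tags (r_tags), and it leaves out the stale-entry cleanup step.

```python
from typing import Dict, FrozenSet, Tuple

Tags = FrozenSet[str]
# Hypothetical in-memory model of a host's tag state: msgid -> (mtime, tags).
TagDB = Dict[str, Tuple[float, Tags]]


def two_way_merge(local: TagDB, remote: TagDB,
                  last_mtime: Dict[str, float], host: str) -> None:
    """Two-way "merge": the newer mtime replaces the whole tag set."""
    since = last_mtime.get(host, 0.0)
    new_mtime = since
    for msgid, (mtime, tags) in remote.items():
        if mtime < since or msgid not in local:
            continue  # assumption: messages unknown locally are skipped
        if mtime > local[msgid][0]:
            # Assumption: the local copy also adopts the remote mtime.
            local[msgid] = (mtime, tags)
        new_mtime = max(new_mtime, mtime)
    last_mtime[host] = new_mtime


def three_way_merge(local: TagDB, remote: TagDB,
                    last_mtime: Dict[str, float],
                    last_sync: Dict[str, Dict[str, Tags]],
                    host: str) -> None:
    """Three-way merge: per-tag resolution against the last synced state."""
    since = last_mtime.get(host, 0.0)
    new_mtime = since
    ancestor = last_sync.setdefault(host, {})
    for msgid, (mtime, r_tags) in remote.items():
        if mtime < since or msgid not in local:
            continue  # assumption: messages unknown locally are skipped
        local_mtime, my_tags = local[msgid]
        last_tags = ancestor.get(msgid, frozenset())
        merged = set(my_tags)
        for tag in my_tags ^ r_tags:   # tags on exactly one side
            if tag in last_tags:
                merged.discard(tag)    # one side removed it since last sync
            else:
                merged.add(tag)        # one side added it since last sync
        # Assumption: the local mtime is left unchanged by the merge.
        local[msgid] = (local_mtime, frozenset(merged))
        # Assumption: "tags" in the quoted pseudocode means r_tags.
        ancestor[msgid] = r_tags
        new_mtime = max(new_mtime, mtime)
    # The "delete stale messages from last_sync" step is omitted here.
    last_mtime[host] = new_mtime


if __name__ == "__main__":
    # Last synced state was {inbox, unread}. Since then the local host
    # added "todo" (mtime 10) and the remote host removed "unread" (mtime 20).
    remote = {"m1": (20.0, frozenset({"inbox"}))}

    local_a = {"m1": (10.0, frozenset({"inbox", "unread", "todo"}))}
    two_way_merge(local_a, remote, {}, "R")
    print(sorted(local_a["m1"][1]))   # ['inbox']

    local_b = {"m1": (10.0, frozenset({"inbox", "unread", "todo"}))}
    sync_state = {"R": {"m1": frozenset({"inbox", "unread"})}}
    three_way_merge(local_b, remote, {}, sync_state, "R")
    print(sorted(local_b["m1"][1]))   # ['inbox', 'todo']
```

In the two-way case the later remote change overrides the whole tag
set, so the local "todo" is lost. In the three-way case the removal of
"unread" and the addition of "todo" both survive, because each
differing tag is resolved against the last synced state.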