Return-Path:
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id C544F431FBC for ; Tue, 12 Jan 2010 21:39:17 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: 0.295
X-Spam-Level:
X-Spam-Status: No, score=0.295 tagged_above=-999 required=5 tests=[AWL=2.894, BAYES_00=-2.599] autolearn=ham
Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Bt5MCGplHliv for ; Tue, 12 Jan 2010 21:39:16 -0800 (PST)
X-Greylist: delayed 91652 seconds by postgrey-1.32 at olra; Tue, 12 Jan 2010 21:39:16 PST
Received: from homiemail-a5.g.dreamhost.com (caiajhbdccac.dreamhost.com [208.97.132.202]) by olra.theworths.org (Postfix) with ESMTP id C7EB2431FAE for ; Tue, 12 Jan 2010 21:39:16 -0800 (PST)
Received: from [192.168.2.199] (modemcable049.81-81-70.mc.videotron.ca [70.81.81.49]) by homiemail-a5.g.dreamhost.com (Postfix) with ESMTP id 8AA24BC9D5; Tue, 12 Jan 2010 21:39:15 -0800 (PST)
References: <20100111221909.GA30299@lapse.rw.madduck.net> <20100113012404.GA570@lapse.rw.madduck.net>
In-Reply-To: <20100113012404.GA570@lapse.rw.madduck.net>
Mime-Version: 1.0 (Apple Message framework v1077)
Content-Type: text/plain; charset=windows-1252
Message-Id:
Content-Transfer-Encoding: quoted-printable
From: Scott Morrison
Date: Wed, 13 Jan 2010 00:39:14 -0500
To: mailtags discussion list
X-Mailer: Apple Mail (2.1077)
Cc: notmuch discussion list
Subject: Re: [notmuch] Idea for storing tags
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 13 Jan 2010 05:39:17 -0000

On 2010-01-12, at 8:24 PM, martin f krafft wrote:

> also sprach Scott Morrison [2010.01.12.1711 +1300]:
>> 1. synchronization of tag data with emails -- if they are in
>> a subfolder then it presents the issue of maintaining this
>> subfolder when managing emails (moving, deleting, duplicating etc)
>> and any .tag folder unaware clients are likely to cause breakage
>> in tagdata/message association. One way of doing this is to have
>> a global .tag folder.
>
> A global .tag folder indexed by e.g. message ID, as you state later,
> would probably allow for this. Or a file-per-tag design. We'd have
> to think carefully about pros and cons for each.
>
> When thinking about this, I always have to remind myself that we are
> targeting this at a design that has indexed search. If that weren't
> the case, searches would be incredibly expensive.
>
> Maybe a better approach would be content addressing (see below).

Content hashing -- good idea (and not something that had occurred to me
before) -- better than Message-Id, as I believe there are still some
MUAs/MTAs that allow messages without message IDs. The only potential
issue is that it then becomes critical to preserve the message source
against encoding changes, though that shouldn't be too hard to avoid.

>> 2. what happens if that message is archived or moved to an
>> exclusively local cache -- eg. Mail.app on OS X can easily move
>> IMAP messages to a folder resident on the computer?
>
> Well, if the target can store tags, then ideally the MUA should know
> how to transfer them along.
>
> Maybe the right thing to do would be to use extended attributes
> (which are stored in the inode!), even if they may not be
> universally supported yet. If our solution scales, then this might
> lead to a significant increase in xattr adoption.

The problem with anything that is not universally supported is that, for
a package meant to appeal to a wide userbase, most users don't know and
don't care about the particulars of this IMAP server vs. that IMAP
server. All they know is that for some reason it doesn't work with
account X -- which leads to support headaches.

>> 3. what happens with duplicates of emails -- I would assume that
>> the message id would be the key to match the tag data to the
>> message. In this system a duplicate of a message could not have
>> a different set of tags from the original (not that this would
>> necessarily be desirable.)
>
> Duplicates need folders, and tags and folders are somewhat at odds
> with each other. I mean, you can represent a folder hierarchy with
> tags (and more), and if you have tags and folders, you are
> potentially introducing a level of confusion/ambiguity that we don't
> want in the first place. Maybe the ideal solution doesn't need
> folders anymore (and IMAP-compatible (Maildir) subfolders have
> always been a hack anyway).
>
> There are also two types of duplicates: copies and links. The former
> can diverge, the latter can't. I don't really see a reason for
> either. It's not like you need to copy a mail before you edit it,
> and I don't see a real reason for linking, assuming that the primary
> means of browsing will be tag-searches anyway.
>
> Duplicates always make me think of content addressing, like Git's
> object cache. We could store the content hash of a message in its
> filename, and also use the hash to index into the tag database.
> I think that would be much cleaner than message IDs, and would make
> handling true duplicates (links) much easier, while copies (diverged
> ex-duplicates) would also be taken care of automatically.

I agree that conceptually duplicates should be buried, but end users do
have "peculiar" organization systems.
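
To make the content-addressing idea quoted above a bit more concrete, here is a minimal Python sketch. It is not how notmuch or MailTags store tags; the `content_key` helper, the `TagStore` class, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def content_key(raw_message: bytes) -> str:
    """Hypothetical helper: hex SHA-256 digest of the raw message bytes."""
    return hashlib.sha256(raw_message).hexdigest()


class TagStore:
    """Illustrative in-memory tag database indexed by content hash."""

    def __init__(self) -> None:
        # Maps content hash -> set of tag names.
        self._tags: dict[str, set[str]] = {}

    def add_tag(self, raw_message: bytes, tag: str) -> None:
        self._tags.setdefault(content_key(raw_message), set()).add(tag)

    def tags_for(self, raw_message: bytes) -> set[str]:
        # Return a copy so callers cannot mutate the store by accident.
        return set(self._tags.get(content_key(raw_message), set()))


if __name__ == "__main__":
    store = TagStore()
    original = b"Subject: hello\r\n\r\nbody text\r\n"
    store.add_tag(original, "todo")

    # A byte-identical duplicate resolves to the same tags.
    print(store.tags_for(bytes(original)))

    # A re-encoded copy (CRLF -> LF) hashes differently and loses its tags.
    print(store.tags_for(original.replace(b"\r\n", b"\n")))
```

Byte-identical duplicates share one key and therefore one tag set, while an edited copy gets a new key on its own. The second lookup shows the encoding concern raised earlier: any rewrite of the message source, even a line-ending change, breaks the association.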

>
> -snip-
>> The performance issue is very real -- because it means that
>> somehow messages have to be rewritten to the IMAP server -- IMAP
>> doesn't have a mechanism AFAIK for updates.
>
> Not even UIDPLUS?
> http://wiki.dovecot.org/FeatUIDPLUS

From my reading, UIDPLUS doesn't allow a delta modification of a
message on a server -- just to write a portion of a message back -- you
still have to write the whole thing back, and that can mean real
bandwidth issues for some messages.

>> Additionally, IMAP doesn't have a mechanism for simply replacing
>> one message data with another -- a new message must be written and
>> the old message must be deleted and the message IMAP UID will
>> change, and the client will have to deal with this especially if
>> it is caching the messages.
>
> Yes, I am experiencing this pain regularly, since I currently use
> a lot of message rewriting as part of my workflow -- one of the
> reasons why I'd like to find an alternative.
>
>> Also GMAIL IMAP is an issue-
>
> Yeah, I bet. Is there anyone who doesn't think that that's Google's
> problem, not ours, though?
>

Call it Google's problem if you like -- but when I have a product that
doesn't work with GMAIL IMAP, there are a lot of potential users who
don't care about server peculiarities and would rather just have it
work.
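
For reference, the append-then-delete cycle discussed above (write the whole new message, delete the old one, end up with a new UID) might look roughly like this with Python's standard `imaplib`. The `replace_message` function is a hypothetical helper, error handling is minimal, and the sketch assumes the mailbox has already been selected on the connection.

```python
import imaplib


def replace_message(conn: imaplib.IMAP4, mailbox: str,
                    old_uid: str, new_raw: bytes) -> None:
    """Hypothetical helper: replace a message by rewriting it in full.

    Assumes `mailbox` is the currently selected mailbox on `conn`.
    """
    # 1. Upload the complete new message; IMAP has no in-place update,
    #    so the full body travels over the wire again.
    typ, _ = conn.append(mailbox, None, None, new_raw)
    if typ != "OK":
        raise RuntimeError("APPEND failed")

    # 2. Flag the old copy as deleted, addressing it by UID.
    typ, _ = conn.uid("STORE", old_uid, "+FLAGS", r"(\Deleted)")
    if typ != "OK":
        raise RuntimeError("UID STORE failed")

    # 3. Expunge. Note: this removes every message flagged \Deleted in
    #    the selected mailbox, not only the one flagged above.
    conn.expunge()
```

The new copy gets a different UID from the old one, so a caching client has to notice the change and re-associate any local state itself.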