References: <20100111221909.GA30299@lapse.rw.madduck.net>
	<F3C2A2F4-515E-4919-9163-6958C2FAA2C5@indev.ca>
	<20100113012404.GA570@lapse.rw.madduck.net>
In-Reply-To: <20100113012404.GA570@lapse.rw.madduck.net>
Mime-Version: 1.0 (Apple Message framework v1077)
Content-Type: text/plain; charset=windows-1252
Message-Id: <BF51FFFB-41D1-4105-84AA-43C3FEB8365D@indev.ca>
Content-Transfer-Encoding: quoted-printable
From: Scott Morrison <smorr@indev.ca>
Date: Wed, 13 Jan 2010 00:39:14 -0500
To: mailtags discussion list <mailtags@lists.madduck.net>
Cc: notmuch discussion list <notmuch@notmuchmail.org>
Subject: Re: [notmuch] Idea for storing tags
Precedence: list


On 2010-01-12, at 8:24 PM, martin f krafft wrote:

> also sprach Scott Morrison <smorr@indev.ca> [2010.01.12.1711 +1300]:
>> 1.  synchronization of tag data with emails -- if they are in
>> a subfolder then it presents the issue of maintaining this
>> subfolder when managing emails (moving, deleting, duplicating etc)
>> and any .tag-folder-unaware clients are likely to cause a breakage
>> in tagdata/message association.  One way of doing this is to have
>> a global .tag folder.
>
> A global .tag folder indexed by e.g. message ID, as you state later,
> would probably allow for this. Or a file-per-tag design. We'd have
> to think carefully about pros and cons for each.
>
> When thinking about this, I always have to remind myself that we are
> targeting this at a design that has indexed search. If that weren't
> the case, searches would be incredibly expensive.
>
> Maybe a better approach would be content addressing (see below).


Content hashing -- good idea (and not something that had occurred to me
before) -- better than Message-Id, as I believe there are still some
MUAs/MTAs that allow messages without message IDs.  The only potential
issue is that it then becomes critical to preserve the message source
against encoding changes, though that shouldn't be too hard to avoid.
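
To make the encoding concern concrete, here is a rough sketch of what a
content-derived key might look like.  The function name and the choice
of SHA-1 (borrowed from the Git analogy) are my assumptions, not
anything notmuch or MailTags actually does.  It only shows that the key
is stable for identical bytes and changes as soon as a client re-encodes
the message.

```python
import hashlib


def content_key(raw_message: bytes) -> str:
    """Return a hex digest of the exact bytes of the message as stored.

    Hypothetical helper: SHA-1 is an assumption, any stable hash would do.
    """
    return hashlib.sha1(raw_message).hexdigest()


original = b"Subject: caf=E9\r\n\r\nhello\r\n"

# The same message after a hypothetical client re-encodes the header and
# normalises line endings: the same to a human reader, different bytes.
reencoded = b"Subject: =?iso-8859-1?Q?caf=E9?=\n\nhello\n"

same_key = content_key(original) == content_key(original)
survives_reencoding = content_key(original) == content_key(reencoded)
```

Here `same_key` is true and `survives_reencoding` is false, so any tag
data stored under the first key would be orphaned by the re-encoding.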

>
>> 2. what happens if that message is archived or moved to an
>> exclusively local cache -- eg. Mail.app on OS X can easily move
>> IMAP messages to a folder resident on the user's computer?
>
> Well, if the target can store tags, then ideally the MUA should know
> how to transfer them along.
>
> Maybe the right thing to do would be to use extended attributes
> (which are stored in the inode!), even if they may not be
> universally supported yet. If our solution scales, then this might
> lead to a significant increase in xattr adoption.
The problem with anything that is not universally supported is that, for
a package meant to appeal to a wide user base, most users don't know and
don't care about the particulars of this IMAP server vs. that IMAP
server.  All they know is that for some reason it doesn't work with
account X -- which leads to support headaches.
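
As an illustration of why partial support hurts, here is a minimal
sketch of storing tags in an extended attribute with a fallback when
that is not possible.  The attribute name `user.mailtags`, the `.tags`
sidecar suffix and both function names are made up for this example.
`os.setxattr` and `os.getxattr` only exist in Python on Linux, which is
rather the point.

```python
import json
import os

XATTR_NAME = "user.mailtags"  # hypothetical attribute name


def store_tags(path, tags):
    """Try an extended attribute first; fall back to a sidecar file.

    Returns "xattr" or "sidecar" so the caller can see which was used.
    """
    payload = json.dumps(sorted(tags)).encode("utf-8")
    if hasattr(os, "setxattr"):  # only present on Linux builds of Python
        try:
            os.setxattr(path, XATTR_NAME, payload)
            return "xattr"
        except OSError:
            pass  # filesystem or mount options do not allow user xattrs
    with open(path + ".tags", "wb") as sidecar:
        sidecar.write(payload)
    return "sidecar"


def load_tags(path):
    """Read tags back from whichever place store_tags used."""
    if hasattr(os, "getxattr"):
        try:
            return json.loads(os.getxattr(path, XATTR_NAME))
        except OSError:
            pass  # no such attribute, or xattrs unsupported here
    try:
        with open(path + ".tags", "rb") as sidecar:
            return json.loads(sidecar.read())
    except FileNotFoundError:
        return []
```

Every client that moves or copies the message would need the same
fallback logic, or the tags silently stay behind.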

>
>> 3. what happens with duplicates of emails -- I would assume that
>> the message id would be the key to match the tag data to the
>> message.  In this system a duplicate of a message could not have
>> a different set of tags from the original (not that this would
>> necessarily be desirable.)
>
> Duplicates need folders, and tags and folders are somewhat at odds
> with each other. I mean, you can represent a folder hierarchy with
> tags (and more), and if you have tags and folders, you are
> potentially introducing a level of confusion/ambiguity that we don't
> want in the first place. Maybe the ideal solution doesn't need
> folders anymore (and IMAP-compatible (Maildir) subfolders have
> always been a hack anyway).
>
> There are also two types of duplicates: copies and links. The former
> can diverge, the latter can't. I don't really see a reason for
> either. It's not like you need to copy a mail before you edit it,
> and I don't see a real reason for linking, assuming that the primary
> means of browsing will be tag-searches anyway.
>
> Duplicates always make me think of content addressing, like Git's
> object cache. We could store the content hash of a message in its
> filename, and also use the hash to index into the tag database.
> I think that would be much cleaner than message IDs, and would make
> handling true duplicates (links) much easier, while copies (diverged
> ex-duplicates) would also be taken care of automatically.

I agree that conceptually duplicates should be buried, but end users do
have "peculiar" organization systems.
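
A small sketch of what keying tags by content hash would mean for such
users.  `TagIndex` is a hypothetical in-memory stand-in for a global tag
store, and SHA-1 is again an assumption.  A true duplicate necessarily
shares its tags with the original, while a diverged copy gets its own
entry.

```python
import hashlib
from collections import defaultdict


class TagIndex:
    """In-memory stand-in for a global tag store keyed by content hash."""

    def __init__(self):
        self._tags = defaultdict(set)

    @staticmethod
    def key(raw_message: bytes) -> str:
        # Assumption: the key is a SHA-1 over the raw message bytes.
        return hashlib.sha1(raw_message).hexdigest()

    def tag(self, raw_message: bytes, *tags: str) -> None:
        self._tags[self.key(raw_message)].update(tags)

    def tags_for(self, raw_message: bytes) -> set:
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._tags.get(self.key(raw_message), set()))


index = TagIndex()
msg = b"Message-Id: <1@example.org>\r\n\r\nbody\r\n"
link = msg                  # a true duplicate: byte-for-byte identical
copy = msg + b"edited\r\n"  # a copy that has since diverged

index.tag(msg, "inbox", "todo")
index.tag(copy, "archive")
```

With this layout `index.tags_for(link)` is the same set as for `msg`, so
a user who files the same mail in two places and expects different tags
on each cannot be accommodated.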

>
> -snip-

>> The performance issue is very real -- because it means that
>> somehow messages have to be rewritten to the IMAP server -- IMAP
>> doesn't have a mechanism AFAIK for updates.
>
> Not even UIDPLUS?
> http://wiki.dovecot.org/FeatUIDPLUS
From my reading, UIDPLUS doesn't allow a delta modification of a message
on the server (that is, writing just a portion of a message back) -- you
still have to write the whole thing back, and that can mean real
bandwidth issues for some messages.
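
For reference, the write-new-then-delete-old dance looks roughly like
this with Python's standard `imaplib`.  `replace_message` is a
hypothetical helper, not part of any existing tool.  It assumes the
caller has already logged in and selected the mailbox, and it makes no
attempt to discover the UID of the new copy.

```python
import imaplib


def replace_message(conn, mailbox, old_uid, new_raw):
    """Replace a message by uploading a full new copy and deleting the old.

    conn is an imaplib.IMAP4 (or IMAP4_SSL) connection with `mailbox`
    already selected.  Returns the number of bytes uploaded, which is the
    size of the whole message even if only one header changed.
    """
    # 1. Upload the complete rewritten message; there is no partial update.
    typ, data = conn.append(mailbox, None, None, new_raw)
    if typ != "OK":
        raise imaplib.IMAP4.error("APPEND failed: %r" % (data,))

    # 2. Flag the old copy as deleted, addressing it by UID.
    typ, data = conn.uid("STORE", old_uid, "+FLAGS", r"(\Deleted)")
    if typ != "OK":
        raise imaplib.IMAP4.error("STORE failed: %r" % (data,))

    # 3. Expunge. Note this removes every \Deleted message in the mailbox.
    conn.expunge()
    return len(new_raw)
```

The new copy gets a fresh UID from the server, so any client-side cache
keyed on the old UID has to be refreshed afterwards.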

>
>> Additionally, IMAP doesn't have a mechanism for simply replacing
>> one message's data with another -- a new message must be written and
>> the old message must be deleted and the message IMAP UID will
>> change, and the client will have to deal with this especially if
>> it is caching the messages.
>
> Yes, I am experiencing this pain regularly, since I currently use
> a lot of message rewriting as part of my workflow -- one of the
> reasons why I'd like to find an alternative.
>
>> Also GMAIL IMAP is an issue-
>
> Yeah, I bet. Is there anyone who doesn't think that that's Google's
> problem, not ours, though?
>
Call it Google's problem if you like -- but when I have a product that
doesn't work with Gmail IMAP, there are a lot of potential users who
don't care about server peculiarities and would rather just have it
work.