MIME-Version: 1.0
In-Reply-To: <87k3yrmahu.fsf@qmul.ac.uk>
References: <1340656899-5644-1-git-send-email-ethan@betacantrips.com>
	<877gutnmf1.fsf@qmul.ac.uk>
	<CAOJ+Ob0Kw0Kkhh9C27Xv9gvqtNowzQiNqrLAtvti7fL8NND2+w@mail.gmail.com>
	<87k3yrmahu.fsf@qmul.ac.uk>
Date: Sun, 1 Jul 2012 12:02:08 -0400
Message-ID:
 <CAOJ+Ob0MSOez2MvD2fCgF7t32kFPk4g2+xCud88QmBLt_b5pOA@mail.gmail.com>
Subject: Re: [RFC PATCH 00/14] modular mail stores based on URIs
From: Ethan <ethan.glasser.camp@gmail.com>
To: Mark Walters <markwalters1009@gmail.com>
Content-Type: multipart/alternative; boundary=bcaec5040a4ca93e8104c3c6cd04
Cc: notmuch@notmuchmail.org
Precedence: list

--bcaec5040a4ca93e8104c3c6cd04
Content-Type: text/plain; charset=ISO-8859-1

Thanks for going through it. I know there's a lot to go through.

On Thu, Jun 28, 2012 at 4:45 PM, Mark Walters <markwalters1009@gmail.com> wrote:

> I was thinking of just having one mail root and inside that there could
> be maildirs and mboxes. Everything would still be relative to the root.
>

I'm hesitant to have directories that contain maildirs and mboxes. It
should be possible to unambiguously distinguish between a maildir file and
an mbox file (mboxes always start with "From ", no colon) but it sounds
kind of fragile.
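
To make that concrete, here is a rough sketch of the check I have in mind
(Python for brevity, not notmuch code; the function names are made up for
illustration). It only looks at the first five bytes, which is exactly why
it feels fragile: a single-message file that happens to carry a leading
"From " line would be misclassified as an mbox.

```python
def first_line_is_mbox_separator(data: bytes) -> bool:
    """Return True if the data begins with an mbox "From " separator.

    The separator is "From " followed by a space-delimited envelope line.
    A header such as "From: someone" has a colon in the fifth position,
    so it does not match.
    """
    return data.startswith(b"From ")


def looks_like_mbox(path: str) -> bool:
    """Sniff the first five bytes of a file (hypothetical helper)."""
    with open(path, "rb") as f:
        return first_line_is_mbox_separator(f.read(5))
```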

> >  1. Are URIs the way to specify individual messages, despite bremner's
> >  concerns about too much of the API being strings? Is adding another
> >  library the easiest way to parse URIs?
>
> In my opinion the nice thing about using strings is that it does not
> require any changes to the Xapian database to store them. I think using
> URIs may not be best though as they seem to be annoying to parse (as
> filenames can contain the same characters) and you seem to need to work
> around the parser in some cases.
>

I think that's more the fault of the parser than of the URIs. If glib came
with a parser, that would be great. There aren't a lot of options for
pure-C URI parsing. Besides uriparser, there's also some code in the W3C
sample code library, but it looked like integrating it would be a pain so I
let it go.
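
On the "filenames can contain the same characters" point: that is the
problem percent-encoding exists to solve. Here is a small sketch (Python
for brevity, not notmuch code). The URI layout, mbox://<absolute
path>#<offset>+<length>, and both function names are assumptions for
illustration only, not necessarily what the patch series does.

```python
from urllib.parse import quote, unquote, urlsplit


def make_mbox_uri(path: str, offset: int, length: int) -> str:
    """Build a URI in the assumed layout mbox://<path>#<offset>+<length>."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    # quote() leaves "/" alone but escapes "#", "+", "%" and friends,
    # so they cannot be mistaken for URI delimiters.
    return "mbox://" + quote(path) + "#" + f"{offset}+{length}"


def parse_mbox_uri(uri: str):
    """Recover (path, offset, length) from the assumed layout."""
    parts = urlsplit(uri)
    offset, length = parts.fragment.split("+")
    return unquote(parts.path), int(offset), int(length)
```

A filename such as odd#name+1.mbox round-trips intact, because its "#" and
"+" are escaped before the real fragment is appended.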

> I wonder if the following would be practical: use // as the field
> separator:
>
> e.g. mbox://filename//start_of_message+length
>
> I think 2 consecutive slashes // is about the only thing we can assume
> is not in the path or filename. Since it is not in the filename I think
> parsing should be trivial (thus avoiding the extra library).
>

Can you explain what you mean when you say that two consecutive slashes
can't appear in the path or filename? Ordinary filesystem paths can contain
them, and so can file: URLs. (I just opened file:///home/ethan///////tmp in
Firefox and it handled that OK.) I've sometimes seen machine-generated
filenames with double slashes, because that way the generating code doesn't
have to check whether the incoming directory name already ends in a slash
before adding another level.
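
Here is a sketch of the ambiguity I'm worried about (Python for brevity,
not notmuch code; the helper names and the example spec are made up, and
the layout is the mbox://filename//start_of_message+length form quoted
above).

```python
import posixpath


def join_naively(directory: str, name: str) -> str:
    """Join without checking whether directory already ends in a slash."""
    return directory + "/" + name


def collapse(path: str) -> str:
    """Collapse repeated slashes the way the filesystem treats them."""
    return posixpath.normpath(path)


def split_at_first(spec: str):
    """Treat the first // after the scheme as the field separator."""
    rest = spec.split("://", 1)[1]
    filename, span = rest.split("//", 1)
    return filename, span


def split_at_last(spec: str):
    """Treat the last // as the field separator instead."""
    rest = spec.split("://", 1)[1]
    filename, span = rest.rsplit("//", 1)
    return filename, span
```

With a spec like mbox:///home/ethan/mail//inbox//1024+512, the two
splitting functions disagree about where the filename ends, so the parser
would have to pick a rule (or normalize paths first) rather than rely on
// never occurring.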


> Secondly, I would prefer to keep maildirs as just the bare file name: so
> the existence of // can be the signal that there is some other
> scheme. This is asymmetric, but is rather more backwardly compatible.
>

Based on your and Jani's reasoning, I did this. Revised patch series
follows.

Ethan

--bcaec5040a4ca93e8104c3c6cd04--