From: Mark Walters
To: Ethan
Cc: notmuch@notmuchmail.org
Subject: Re: [RFC PATCH 00/14] modular mail stores based on URIs
Date: Sun, 01 Jul 2012 17:48:36 +0100
Message-ID: <87lij32zrv.fsf@qmul.ac.uk>
References: <1340656899-5644-1-git-send-email-ethan@betacantrips.com> <877gutnmf1.fsf@qmul.ac.uk> <87k3yrmahu.fsf@qmul.ac.uk>

On Sun, 01 Jul 2012, Ethan wrote:
> Thanks for going through it, I know there's a lot to go through..
>
> On Thu, Jun 28, 2012 at 4:45 PM, Mark Walters wrote:
>
>> I was thinking of just having one mail root and inside that there could
>> be maildirs and mboxes. Everything would still be relative to the root.
>>
>
> I'm hesitant to have directories that contain maildirs and mboxes. It
> should be possible to unambiguously distinguish between a maildir file and
> an mbox file (mboxes always start with "From ", no colon) but it sounds
> kind of fragile.

Well I was thinking you would still need to add specific sub-directories
of db_path that might contain mboxes.

>> > 1. Are URIs the way to specify individual messages, despite bremner's
>> > concerns about too much of the API being strings? Is adding another
>> > library is the easiest way to parse URIs?
>>
>> In my opinion the nice thing about using strings is that it does not
>> require any changes to the Xapian database to store them. I think using
>> URIs may not be best though as they seem to be annoying to parse (as
>> filenames can contain the same characters) and you seem to need to work
>> around the parser in some cases.
>>
>
> I think that's more the fault of the parser than of the URIs. If glib came
> with a parser, that would be great. There aren't a lot of options for
> pure-C URI parsing. Besides uriparser, there's also some code in the W3C
> sample code library, but it looked like integrating it would be a pain so I
> let it go.
>
>> I wonder if the following would be practical: use // as the field
>> separator:
>>
>> e.g. mbox://filename//start_of_message+length
>>
>> I think 2 consecutive slashes // is about the only thing we can assume
>> is not in the path or filename. Since it is not in the filename I think
>> parsing should be trivial (thus avoiding the extra library).
>>
>
> Can you explain what you mean when you say that two consecutive slashes
> can't appear in a URL? Ordinary filesystem paths can contain them, and so
> can file: URLs. (I just looked up file:///home/ethan///////tmp and Firefox
> handled that OK.) I've sometimes seen machine-generated filenames with
> double slashes because that way you don't have to make sure the incoming
> filename was correctly terminated before adding another level.

Nothing outside notmuch (i.e., other applications creating arbitrary
filenames, etc.) can make notmuch store a // as part of a path, so if we
ever do store one in the database it's our own fault. In particular,
notmuch can avoid them easily because a // cannot occur within a single
filename.

>> Secondly, I would prefer to keep maildirs as just the bare file name: so
>> the existence of // can be the signal that there is some other
>> scheme. This is asymmetric, but is rather more backwardly compatible.
>>
>
> Based on your and Jani's reasoning, I did this. Revised patch series
> follows.

I will try and look at that now.

Best wishes

Mark
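
For illustration only, here is a minimal Python sketch of the // convention
discussed in this thread: a bare filename is treated as a maildir message,
and anything containing // is read as scheme://path//start+length. The names
Location and parse_location are hypothetical and are not taken from the
patch series or from notmuch itself; the sketch also assumes that stored
paths are relative to the mail root and never contain // themselves.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    """Hypothetical decoded form of a stored message location."""
    scheme: str            # "maildir" for bare filenames, else the given scheme
    path: str              # path relative to the mail root (assumption)
    offset: Optional[int]  # start of the message within the file, if any
    length: Optional[int]  # length of the message in bytes, if any


def parse_location(stored: str) -> Location:
    # No "//" at all: by the convention above this is a plain maildir file.
    if "//" not in stored:
        return Location("maildir", stored, None, None)

    # First "://" ends the scheme; the scheme must be non-empty.
    scheme, sep, rest = stored.partition("://")
    if not sep or not scheme:
        raise ValueError(f"missing scheme in {stored!r}")

    # Last "//" separates the path from the start+length field.
    path, sep, span = rest.rpartition("//")
    if not sep or not path:
        raise ValueError(f"missing //start+length field in {stored!r}")

    start, plus, length = span.partition("+")
    if not plus:
        raise ValueError(f"expected start+length, got {span!r}")

    # int() raises ValueError on non-numeric fields, which is what we want.
    return Location(scheme, path, int(start), int(length))
```

Because the path is assumed never to contain //, two plain string splits are
enough and no URI library is needed, which is the point made above about
parsing being trivial.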