From: Mark Walters
To: Ethan Glasser-Camp, notmuch@notmuchmail.org
Subject: Re: [RFC PATCH 00/14] modular mail stores based on URIs
Date: Wed, 27 Jun 2012 10:17:54 +0100
Message-ID: <877gutnmf1.fsf@qmul.ac.uk>
In-Reply-To: <1340656899-5644-1-git-send-email-ethan@betacantrips.com>
Hi

On Mon, 25 Jun 2012, Ethan Glasser-Camp wrote:
> Hi guys,
>
> Sorry for dropping off the mailing list after I sent my last patch
> series (http://notmuchmail.org/pipermail/notmuch/2012/009470.html). I
> haven't had the time or a stable enough email address to really follow
> notmuch development :)
>
> I signed onto #notmuch a week or two ago and asked what I would need
> to do to get a feature like this one into mainline. j4ni told me that
> he agreed with the feedback to my original patch series, and suggested
> that I follow mjw1009's advice of having filenames encode all
> information about mail storage transparently, and that this would
> solve the problem with the original patch series of sprinkling mail
> storage parameters all over the place. bremner suggested that he had
> been thinking about how to support mbox or other multiple-message
> archives, and also commented that he wasn't crazy about so much of the
> API being in strings.
>
> Based on this advice, I decided to revise my approach to this
> patchset, one that is based around the stated desire to work with mbox
> formats. This approach, in contrast to the mailstore approach that
> Michal Sojka proposed and I revised, encodes all mail access
> information as URIs. These URIs are stored in Xapian the way that
> relative paths are right now. Examples might be:
>
> maildir:///home/ethan/Mail/folder/cur/filename:2,S
> mbox:///home/ethan/Mail/folder/file.mbox#byte-offset+length
> couchdb://ethan:password@localhost:8080/some-doc-id

First, thank you for resubmitting this: it is definitely a feature I
would like to see in notmuch. And at first glance I like the series
(and I will try to review it over the next few days).

> Personally, this isn't my favorite approach, for the following reasons:
>
> 1. Notmuch, at some point in its history, chose to store file paths
> relative to a "mail database", with the intent that if this mail
> database was moved, filenames would not change and everything would
> Just Work (tm). The above scheme completely reverses this design
> decision, and in general completely breaks this relocatability. I
> don't see any easy way to handle this problem. This isn't just a
> wishlist feature; at least two things in the test suite (caching of
> corpus.mail, and the atomicity tests) rely on this behavior.

Why can't the URI just store a relative path, at least for the
maildir:// and mbox:// schemes? It is purely internal to notmuch, so it
doesn't need to be very standard.

> 2. Mail access information, i.e. open connections, etc. can only be
> stored in variables global to the mailstore code, and cannot be stored
> as private members of a mailstore object. This is more an aesthetic
> concern than a functional one.
>
> Anyhow, the following (enormous) patch series implement this design. I
> used uriparser as an external library to parse URIs. The API for this
> library is a little idiosyncratic.
> uriparser supports parsing Unicode URIs (strings of wchar_t), but I
> just used ASCII filenames because I think that's what comes out of
> Xapian.

Why use a library? Isn't it just a question of whether the string
contains // and, if so, splitting on it? I guess that // is a nice
separator, as I think we can assume that a true path does not contain
it (since a filename cannot contain /).

> Patch 11 is borrowed directly from the last patch series.
>
> The last four or five patches add mbox support, including a few
> tests. That part of the series is still very first-draft: I added a
> new config option to specify URIs to scan, and ">From " lines still
> need to be unescaped. However, we support scanning mbox files whether
> messages have content-length or not.

I have an idea that mbox byte-locations change when messages are marked
as read (amongst other things). It might be worth saying that this
initial implementation only works for unchanging mboxes (rather than
the append-only condition that you currently state). But I have not got
as far as applying/testing the series yet.

> I will try to receive feedback on this series more gratefully than the
> last one. :)

Just to say that all of the above are genuine questions (not requests
for you to rewrite stuff!) to try to understand the series.

Best wishes

Mark