From: Ciprian Dorin Craciun
To: Vladimir.Marek@oracle.com, Stewart Smith, notmuch@notmuchmail.org
Date: Tue, 14 Aug 2012 20:05:11 +0300
Subject: Re: Alternative (raw) message store (i.e. instead of maildir)
In-Reply-To: <20120814165044.GP28321@pub.cz.oracle.com>

On Tue, Aug 14, 2012 at 7:50 PM, Vladimir Marek wrote:
>> On the other hand I strongly sustain having a more optimized
>> backend for emails, especially for such cases. For example a
>> BerkeleyDB would perfectly fit such a use case, especially if we store
>> the body and the headers in separate databases.
>>
>> Just a small experiment, below are the R `summary(emails)` of the
>> sizes of my 700k emails:
>> ~~~~
>>    Min. 1st Qu.  Median    Mean 3rd Qu.     Max.
>>       8    4364    5374   11510    7042 31090000
>> ~~~~
>>
>> As seen 75% of the emails are below 7k, and this without any compression...
>>
>> Moreover we could organize the keys so that in a B-Tree structure
>> the emails in the same thread are closer together...
>
> Now I'm not sure if you talk about some berkeley-db fuse filesystem or
> direct support in notmuch.

    No tricks. :) I proposed -- better said queried if possible or at
least wanted -- to have an internal interface (SPI) that any mail store
would have to implement in order to be indexed and used by notmuch.
I guess the interface would be quite lightweight, and would need just
the following:
  * open store;
  * create a cursor iterating through all the emails, yielding only
the keys;
  * read the envelope (as a byte blob) of a particular key; (used
only for displaying thread lists, etc.;)
  * read the body (as a byte blob) of a particular key;
  * maybe create a cursor iterating over all those emails that have
changed since a particular timestamp;

> I don't have enough cycles to modify notmuch,
> so I started to look at simpler (codewise) solution ...
>
> To summarize, what I personally want from the mail storage

    We need to make a distinction between current storage (like
maildir) and archival storage (like the Zip or my proposal).

> - ability to read and write mails

    It could be done through a small CLI over the proposed API.

> - should work with mutt (or mutt-kz)

    This would eliminate any proposal not involving a FUSE wrapper...

> - simple backup to windows drive (files can't contain double colon ':')

    This could be done via a dump-like facility. (BerkeleyDB supports
this natively through its `db_dump` tool.)
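
    To make the store interface listed above a bit more concrete, here
is a rough sketch of how it might look. It is written in Python only
for brevity; notmuch itself is C, and nothing like this exists in
notmuch today. Every name in it (`MailStore`, `InMemoryStore`, `keys`,
`read_envelope`, `read_body`, `changed_since`, `put`) is hypothetical,
and the in-memory backend only stands in for a real one such as
BerkeleyDB.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Tuple


class MailStore(ABC):
    """Hypothetical store interface (SPI); every name here is an assumption."""

    @abstractmethod
    def keys(self) -> Iterator[bytes]:
        """Cursor over all emails, yielding only their keys."""

    @abstractmethod
    def read_envelope(self, key: bytes) -> bytes:
        """Envelope of one email as a byte blob (for thread lists, etc.)."""

    @abstractmethod
    def read_body(self, key: bytes) -> bytes:
        """Body of one email as a byte blob."""

    @abstractmethod
    def changed_since(self, timestamp: float) -> Iterator[bytes]:
        """Cursor over the keys of emails changed after `timestamp`."""


class InMemoryStore(MailStore):
    """Toy backend used only to exercise the interface."""

    def __init__(self) -> None:
        # "Open store": a real backend would open its database files here.
        # Maps key -> (envelope, body, modification time).
        self._emails: Dict[bytes, Tuple[bytes, bytes, float]] = {}

    def put(self, key: bytes, envelope: bytes, body: bytes, mtime: float) -> None:
        # Not part of the proposed read-only interface; added for the demo.
        self._emails[key] = (envelope, body, mtime)

    def keys(self) -> Iterator[bytes]:
        # Sorted order mimics how a B-Tree would hand keys back.
        return iter(sorted(self._emails))

    def read_envelope(self, key: bytes) -> bytes:
        return self._emails[key][0]

    def read_body(self, key: bytes) -> bytes:
        return self._emails[key][1]

    def changed_since(self, timestamp: float) -> Iterator[bytes]:
        return (k for k in sorted(self._emails)
                if self._emails[k][2] > timestamp)
```

    An indexer would then need only `keys()` plus the two read calls
for a full pass, and `changed_since()` for incremental runs; it would
never have to know whether the blobs come from a maildir, a Zip
archive or a database.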