Return-Path: <Vladimir.Marek@Oracle.COM>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
    by olra.theworths.org (Postfix) with ESMTP id BE42C431FC2
    for <notmuch@notmuchmail.org>; Tue, 14 Aug 2012 09:52:08 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=-4.999 tagged_above=-999 required=5
    tests=[RCVD_IN_DNSWL_HI=-5, UNPARSEABLE_RELAY=0.001]
Received: from olra.theworths.org ([127.0.0.1])
    by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id 0veuzWsXCWt6 for <notmuch@notmuchmail.org>;
    Tue, 14 Aug 2012 09:52:08 -0700 (PDT)
Received: from rcsinet15.oracle.com (rcsinet15.oracle.com [148.87.113.117])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by olra.theworths.org (Postfix) with ESMTPS id 2F756431FAE
    for <notmuch@notmuchmail.org>; Tue, 14 Aug 2012 09:52:08 -0700 (PDT)
Received: from acsinet22.oracle.com (acsinet22.oracle.com [141.146.126.238])
    by rcsinet15.oracle.com (Sentrion-MTA-4.2.2/Sentrion-MTA-4.2.2) with
    ESMTP id q7EGq4T4021055
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
    Tue, 14 Aug 2012 16:52:05 GMT
Received: from acsmt357.oracle.com (acsmt357.oracle.com [141.146.40.157])
    by acsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
    Tue, 14 Aug 2012 16:52:04 GMT
Received: from abhmt104.oracle.com (abhmt104.oracle.com [141.146.116.56])
    by acsmt357.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id
    q7EGq3jt001878; Tue, 14 Aug 2012 11:52:03 -0500
Received: from pub.cz.oracle.com (/10.163.20.32)
    by default (Oracle Beehive Gateway v4.0)
    with ESMTP ; Tue, 14 Aug 2012 09:52:03 -0700
Date: Tue, 14 Aug 2012 18:50:44 +0200
From: Vladimir Marek <Vladimir.Marek@Oracle.COM>
To: Ciprian Dorin Craciun <ciprian.craciun@gmail.com>
Subject: Re: Alternative (raw) message store (i.e. instead of maildir)
Message-ID: <20120814165044.GP28321@pub.cz.oracle.com>
Mail-Followup-To: Ciprian Dorin Craciun <ciprian.craciun@gmail.com>,
    Stewart Smith <stewart@flamingspork.com>, notmuch@notmuchmail.org
    <CA+Tk8fwq2thNeKHgfG-EX0hgR7uyqrSce0ZMOhEJBsz1RVtRqg@mail.gmail.com>
    <20120811094635.GY28321@pub.cz.oracle.com> <874no613ms.fsf@flamingspork.com>
    <20120814160442.GO28321@pub.cz.oracle.com>
    <CA+Tk8fwVwWewTS-AVaaapQpLNU6a698acp-_ZmnktJ5ynRrx1A@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
    <CA+Tk8fwVwWewTS-AVaaapQpLNU6a698acp-_ZmnktJ5ynRrx1A@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Source-IP: acsinet22.oracle.com [141.146.126.238]
Cc: notmuch@notmuchmail.org
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
    <notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Aug 2012 16:52:08 -0000
\r
> >> > - fuse zip stores all changes in memory until unmounted
> >> > - fuse zip (and libzip for that matter) creates a new temporary file when
> >> >   updating the archive, which takes considerable time when the archive is

> >> This isn't much of a hassle if you have maildir per time period and
> >> archive off. Maybe if you sync flags it may be...

> > That might be an interesting solution, maildir per time period.

> Although using a zip file through FUSE as a maildir store is not
> much better in my opinion.

> This is because it still doesn't solve the syscall overhead. For
> example just going through the list of files to find those that
> changed requires the following syscalls:
> * reading the next directory entry (which is amortized as it reads
>   them in a batch, but the batch size is limited, should we say 1
>   syscall per 10 files?);
> * stat-ing the file;

> Now by adding FUSE we add an extra context switch for each syscall...

> Although this issue would be problematic only for reindexing, but still...
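
A minimal sketch of the scan described in the quoted text, to make the
per-file cost concrete. It assumes a flat directory with one file per
message; the function name `changed_since` and the `cutoff` timestamp are
illustrative only and are not taken from notmuch.

```python
import os


def changed_since(directory, cutoff):
    """Return sorted names of regular files modified after `cutoff`.

    `cutoff` is a Unix timestamp in seconds (hypothetical parameter).
    """
    changed = []
    # os.scandir reads directory entries in batches, so the cost of
    # listing is amortized over many files.
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            # On Linux the modification time is not part of the directory
            # entry, so this needs one stat-family call per file.
            if entry.stat(follow_symlinks=False).st_mtime > cutoff:
                changed.append(entry.name)
    return sorted(changed)
```

With a FUSE mount underneath, each of those calls would additionally go
through the userspace handler, which is the extra context switch mentioned
above.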
\r
That's a price I would be willing to pay to have single file instead of

> > fuse zip caches all the data until unmounted. So even with just reading
> > it keeps growing (I hope I'm not accusing fuse zip here, but this is my
> > understanding from the code). This could be simply alleviated by having
> > it periodically unmounted and mounted again (perhaps from cron).

> I think there is an option for FUSE mount to specify if the data
> should be cached by the kernel or not, as such this shouldn't be a
> problem for FUSE itself, except if the Zip FUSE handler does some

To my understanding it's the handler itself.

> >> > Of course this solution would have some disadvantages too, but for me
> >> > the advantages would win. At the moment I'm not sure if I want to
> >> > continue working on that. Maybe if there would be more interested guys

> >> I'm *really* tempted to investigate making this work for archived
> >> mail. Of course, the list of mounted file systems could get insane
> >> depending on granularity I guess...

> > Well, if your granularity will be one archive per year of mail, it
> > should not be that bad ...
\r
> On the other hand I strongly support having a more optimized
> backend for emails, especially for such cases. For example a
> BerkeleyDB would perfectly fit such a use case, especially if we store
> the body and the headers in separate databases.

> Just a small experiment, below is the R `summary(emails)` of the
> sizes of my 700k emails:

>     Min.  1st Qu.   Median     Mean  3rd Qu.      Max.
>        8     4364     5374    11510     7042  31090000

> As seen 75% of the emails are below 7k, and this without any compression...
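
For anyone wanting to repeat that measurement without R, here is a rough
Python equivalent. It is only a sketch: it assumes one message per file
under some root directory, and `message_sizes` and `size_summary` are
made-up names. The `method="inclusive"` setting is chosen because, as far
as I know, it matches the interpolation R uses by default.

```python
import os
import statistics


def message_sizes(root):
    """Collect the size in bytes of every file below `root`."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    return sizes


def size_summary(sizes):
    """Return min, quartiles, mean and max, like R's summary()."""
    q1, median, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
    return {
        "min": min(sizes),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(sizes),
        "q3": q3,
        "max": max(sizes),
    }
```

Calling `size_summary(message_sizes(path))` on a mail directory gives the
same six figures as the table above; the third quartile is the value that
75% of the messages fall below.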
\r
> Moreover we could organize the keys so that in a B-Tree structure
> the emails in the same thread are closer together...
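
One possible reading of that key layout, sketched in Python. This is not
BerkeleyDB: `SortedStore` is a hypothetical in-memory stand-in that keeps
keys in sorted order the way a B-tree would, and the numeric thread id and
the `make_key` format are assumptions made for illustration. The point is
only that prefixing each key with the thread id makes a thread one
contiguous key range.

```python
import bisect


def make_key(thread_id, message_id):
    """Build a key that sorts by thread first, then by message id."""
    # Fixed-width hex keeps byte order identical to numeric order.
    return f"{thread_id:016x}/{message_id}".encode("utf-8")


class SortedStore:
    """In-memory stand-in for an ordered key/value store."""

    def __init__(self):
        self._keys = []      # kept sorted at all times
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def thread(self, thread_id):
        """Return all values of one thread with a single range scan."""
        prefix = f"{thread_id:016x}/".encode("utf-8")
        start = bisect.bisect_left(self._keys, prefix)
        found = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            found.append(self._values[key])
        return found
```

Reading a whole thread then touches neighbouring keys only, which in an
on-disk B-tree would usually mean neighbouring pages.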
\r
Now I'm not sure whether you are talking about some berkeley-db fuse
filesystem or direct support in notmuch. I don't have enough cycles to
modify notmuch, so I started to look at a simpler (codewise) solution ...

To summarize, what I personally want from the mail storage

- ability to read and write mails
- should work with mutt (or mutt-kz)
- simple backup to a Windows drive (file names can't contain a colon ':')
\r