From: Tomi Ollila
Date: Mon, 14 Mar 2016 07:23:21 +0000 (+0200)
Subject: Re: [PATCH] Don't bother checking for mbox files
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=459c151b65278cf92eb072fdb1db2c305ebc5bb6;p=notmuch-archives.git

Re: [PATCH] Don't bother checking for mbox files
---

diff --git a/6f/de2f35119394ca594c29e7ec908d8d39eff207 b/6f/de2f35119394ca594c29e7ec908d8d39eff207
new file mode 100644
index 000000000..12c6b8877
--- /dev/null
+++ b/6f/de2f35119394ca594c29e7ec908d8d39eff207
@@ -0,0 +1,134 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by arlo.cworth.org (Postfix) with ESMTP id 4EB256DE1862
+ for ; Mon, 14 Mar 2016 00:23:28 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at cworth.org
+X-Spam-Flag: NO
+X-Spam-Score: 0.634
+X-Spam-Level:
+X-Spam-Status: No, score=0.634 tagged_above=-999 required=5 tests=[AWL=-0.018,
+ SPF_NEUTRAL=0.652] autolearn=disabled
+Received: from arlo.cworth.org ([127.0.0.1])
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id ljyua7JZvy8y for ;
+ Mon, 14 Mar 2016 00:23:25 -0700 (PDT)
+Received: from guru.guru-group.fi (guru.guru-group.fi [46.183.73.34])
+ by arlo.cworth.org (Postfix) with ESMTP id E3CA66DE185F
+ for ; Mon, 14 Mar 2016 00:23:24 -0700 (PDT)
+Received: from guru.guru-group.fi (localhost [IPv6:::1])
+ by guru.guru-group.fi (Postfix) with ESMTP id 97075100063;
+ Mon, 14 Mar 2016 09:23:21 +0200 (EET)
+From: Tomi Ollila
+To: Jani Nikula , Edward Betts ,
+ notmuch@notmuchmail.org
+Subject: Re: [PATCH] Don't bother checking for mbox files
+In-Reply-To: <87mvq2pmqe.fsf@nikula.org>
+References: <86io0v9oum.fsf@hiro.keithp.com>
+ <20160313105742.GA9173@4angle.com> <87mvq2pmqe.fsf@nikula.org>
+User-Agent: Notmuch/0.21+81~g4743a61 (http://notmuchmail.org) Emacs/24.3.1
+ (x86_64-unknown-linux-gnu)
+X-Face: HhBM'cA~
+MIME-Version: 1.0
+Content-Type: text/plain
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.20
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Mon, 14 Mar 2016 07:23:28 -0000
+
+On Sun, Mar 13 2016, Jani Nikula wrote:
+
+> [ text/plain ]
+> On Sun, 13 Mar 2016, Edward Betts wrote:
+>> Keith Packard wrote:
+>>> Postfix adds mbox-style From lines when used in combination with
+>>> maildrop or .forward files. If they have another line starting with
+>>> 'From ' in them, notmuch complains about them not being mail files.
+>>>
+>>> If we assume the user hasn't screwed up and misconfigured their mail
+>>> system, then we can safely ignore whether the file started with an
+>>> mbox header and just parse it as a single-message file.
+>>
+>> I think it is fine to go ahead with this change. At the same time the
+>> behaviour of Postfix should be corrected so it doesn't add mbox-style From
+>> lines to mails in maildir format.
+>
+> I disagree with making the change (as-is, at least).
+>
+> In general, Notmuch does not support mboxes. We expect maildir style one
+> message per file mail storage. We support single-message mboxes as a
+> special case, in part because, as you note, there's plenty of other
+> software that adds the mbox "From " line even though delivering to
+> maildir.
+>
+> I think it's misleading and confusing to the users to accept and index
+> the first message of mboxes, and silently ignore the rest (or worse,
+> index all of the mbox and associate the text with the first message). I
+> think we should reject multi-message mboxes, because we have no code to
+> handle them. This patch throws away that check.
+>
+> Now, IIUC, the problem here is not that the files actually are
+> multi-message mboxes. We could use a sample message (even a crafted one)
+> that exhibits the problem, so we could add a test case, and fix Notmuch
+> to deal with it gracefully (if we decide catering to potentially broken
+> other software is the way to go), while retaining the code to reject
+> multi-message mboxes. With the test case, we'd also avoid accidentally
+> breaking this in the future.
+
+I agree with Jani; a user may accidentally index one mbox with multiple
+messages as a single message if this were merged...
+
+We currently have a very simple check: just a line starting with 'From ' to
+separate messages (and the first line starts with 'From '). After a quick check
+of these 'mbox*' "specs", this may just be within the "standard".
+
+In mboxviewfs I checked whether there is at least one empty line before
+'^From' (might not be required by the standard, but whatever ;/) and that
+there is at least a 'Date:' header following (needed for file "time")... but
+even this "heuristic" may not be enough if we wanted to go deep into
+this (e.g. there are emails which quote the beginning of an mbox file; ok, no
+heuristic can match this unless there is human-level AI working on it ;)
+
+OTOH, presumably
+
+https://github.com/GNOME/gmime/blob/master/tests/data/mbox/input/substring.mbox
+
+contains 3 messages (or what??!!11)
+
+...
+
+Perhaps the simplest is to give users the possibility to use a 'footgun' option
+in notmuch new (notmuch insert probably doesn't need it ???) which can be
+used to skip the 'mbox' check (I was going to suggest a configuration option,
+but as we don't support that in bindings, ...). But of course some of the
+simplicity is gone when one forgets to give the --footgun option -- the next
+notmuch new with the footgun probably will not pick up the mail file again
+(or we have to hold off updating the directory mtime indefinitely -- or
+do other changes (i.e. more complicated ones which no-one reviews(*) anyway >;/))
+
+
+> BR,
+> Jani.
+
+Tomi
+
+(*) Although when someone sends less-trivial-than-usual patches which
+provide significant progress in functionality, those are reviewed
+promptly by a relatively good number of reviewers...
+
+One 'other change' could be, e.g., to keep a list of files that have been failing
+due to this and retry those if this footgun option is given.
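
The stricter separator check described in the message (an empty line before the 'From ' line, and a 'Date:' header following it) could be sketched roughly as below. This is only an illustration in Python, not the code from mboxviewfs or notmuch; the function names count_mbox_messages and looks_like_multi_message_mbox are hypothetical, and the exact rules (case-insensitive 'Date:' match, the header block ending at the first empty line) are assumptions.

```python
def count_mbox_messages(text: str) -> int:
    """Count lines that look like mbox message separators.

    Assumed heuristic (hypothetical, for illustration only):
    a separator starts with 'From ', is either the first line or
    preceded by an empty line, and is followed by a 'Date:' header
    before the empty line that ends the header block.
    """
    lines = text.splitlines()
    count = 0
    for i, line in enumerate(lines):
        if not line.startswith("From "):
            continue
        # Must be the very first line or come right after an empty line.
        if i > 0 and lines[i - 1] != "":
            continue
        # Scan the header block that follows for a 'Date:' header.
        has_date = False
        for header in lines[i + 1:]:
            if header == "":
                break  # end of headers, no 'Date:' seen
            if header.lower().startswith("date:"):
                has_date = True
                break
        if has_date:
            count += 1
    return count


def looks_like_multi_message_mbox(text: str) -> bool:
    """True if more than one likely message separator is found."""
    return count_mbox_messages(text) > 1
```

A body paragraph that merely begins with 'From ' is not counted under these rules, but, as the message notes, an email that quotes the beginning of an mbox file would still be misdetected.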