To: notmuch@notmuchmail.org
From: Mueen Nawaz
Subject: Re: Questions about importing mail (mbox)
Date: Mon, 21 Mar 2011 19:02:45 -0700
Message-ID: <87hbavlxoa.fsf@fester.com>
References: <87bp15m9oz.fsf@fester.com> <87zkooo88x.fsf@A7GMS.i-did-not-set--mail-host-address--so-tickle-me>

Pieter Praet writes:

> It would've been a no-brainer if you'd been using Maildir all along
> (mbox is evil incarnate), but...

Sure, but mbox is too convenient.

> I'd suggest keeping your original mbox file safe in git [1], and
> consistently committing every step of the way, so even if messages were
> to get lost in translation, you still have a way to get them back, with
> negligible storage overhead (just remember to "git gc --aggressive
> --prune=now" when you're finished).

I think you misunderstood me, probably because I didn't explain myself
clearly.

I'm experimenting with notmuch, and if I can translate everything I
currently do in mutt to notmuch, then I'll just dump mutt. The set of
mboxes I have will remain archived, but for all future incoming email,
I'll switch to MH or Maildir.

So I don't actually need to put my old mboxes under revision control - I
just need to save them somewhere.

> For the actual conversion to Maildir (and any type of mail fetching in
> general), I'd suggest using FDM [2], you'll never look back.

Thanks - will take a look.

> Regarding the significant discrepancy between processed and added files
> in Notmuch: Could be dupes (e.g. mail to/cc/bcc yourself or mailing
> lists, ending up in both Inbox and Sent), which are automatically
> suppressed by Notmuch.

It definitely was dupes. I didn't realize that notmuch does not count
dupes as separate messages.

So I wrote a Python script to go through the mboxes and count only the
unique messages. Problem? I have over 1000 emails that don't have a
Message-ID header (case-insensitive search). I could go over why that
is, but suffice it to say that I hate Microsoft. Once I remove all
dupes, I get to within 300-400 of the count that notmuch provides.
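
For reference, here is a minimal sketch of that kind of counting script. It is not the actual script I used, and the function name count_unique_messages is just made up for illustration. It assumes the standard library mailbox module and treats two messages as dupes only when their Message-ID values match exactly (after trimming whitespace). Messages without a Message-ID are tallied separately, since there is nothing to compare them by.

```python
import mailbox


def count_unique_messages(mbox_paths):
    """Return (total, unique_ids, without_id) across the given mbox files.

    Hypothetical helper: dupes are detected purely by Message-ID.
    """
    seen_ids = set()
    total = 0
    without_id = 0
    for path in mbox_paths:
        # create=False: fail loudly instead of creating an empty mbox.
        box = mailbox.mbox(path, create=False)
        try:
            for message in box:
                total += 1
                # Header lookup is case-insensitive, so "message-id"
                # and "Message-ID" are found alike.
                message_id = message.get("Message-ID")
                if message_id is None or not str(message_id).strip():
                    without_id += 1
                else:
                    seen_ids.add(str(message_id).strip())
        finally:
            box.close()
    return total, len(seen_ids), without_id
```

Called as count_unique_messages(["old.mbox", "older.mbox"]) (file names are placeholders), the second number is the count of distinct Message-IDs and the third is how many messages could not be checked at all.
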
The remaining 1000+ emails do contain some dupes, and I can't find a
convenient way to get an accurate count of unique emails from them, but
at least now I'm in the ballpark, and a lot more confident.

Incidentally, one reason I didn't realize dupes were the cause is that I
searched for a word from one email I had and notmuch did not find it -
so I assumed it had not been indexed. Later on, I realized I had typed
only part of the word, and discovered that notmuch does find it if I
type the full word.

What am I doing wrong? Can't notmuch handle partial word matches? Do I
need to specify an option to get that to work?

Anyway, thanks for the help - I'll investigate further.