Received: (from too@localhost) by taco2.nixu.fi (8.14.3/8.14.3/Submit) id p7A8fswU014712; Wed, 10 Aug 2011 11:41:54 +0300
X-Authentication-Warning: taco2.nixu.fi: too set sender to tomi.ollila@nixu.com using -f
From: Tomi Ollila
To: notmuch@notmuchmail.org
Subject: Re: Added messages / total files count difference.
User-Agent: Gnus/5.110014 (No Gnus v0.14) Emacs/22.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "Use and development of the notmuch mail system."
X-List-Received-Date: Wed, 10 Aug 2011 08:43:20 -0000

On Tue 09 Aug 2011 14:02, Tomi Ollila writes:

> Hi
>
> I get this output:
>
> $ notmuch new --verbose
> Found 15559 total files (that's not much mail).
> Processed 15559 total files in 5m 53s (43 files/sec.).
> Added 15546 new messages to the database.
>
> $ find * -type f | wc
>   15559   15559  529027
>
> How can I determine which 13 files were dropped. All of those
> 15559 files should be mails. I tried to check through mail files that
> have no 'Subject:' header but those were (at least one) indexed. Could
> it be about duplicate Message-ID: or something ?
>
> $ notmuch --version
> notmuch 0.7-7-g68e8560

It is about duplicate Message-IDs.

It would be nice if 'notmuch new' printed information about this when
it happens (as I recall it does when a newly found file is not
considered a mail file).

The steps I took to figure this out are at the end of this email
(not all iterations with and without 'wc' are shown).

>
> Tomi

Tomi

--8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--

$ find ~/mail/mails/* -type f | sort >! filenames-fs
$ wc filenames-fs
   15559   15559  855766 filenames-fs

$ cd /path/to/notmuch-git/bindings/python
$ cat > foo.py
import notmuch
db = notmuch.Database()
msgs = notmuch.Query(db,'').search_messages()
for f in msgs:
    print f.get_filename()
$ PYTHONPATH=/path/to/python-json:`pwd` python foo.py | sort > filenames-db
$ wc filenames-db
   15546   15546  855037 filenames-db
$ diff filenames-db filenames-fs | grep mails | wc
      13      26     755

$ cd ~/mail
$ cat >midcheck.pl
use strict;
use warnings;

my %msgids;

foreach () {
    my $fn = $_;
    my $mid;
    open I, '<', $fn or die $!;
    while (<I>) {
        $mid = $1, next if /^Message-ID:\s*(.*)/i;
        last if /^$/;
    }
    close I;
    unless ($mid) {
        print "$fn: no Message-ID (in same line with header tag?)\n";
        next;
    }
    my $fn0 = $msgids{$mid};
    if (defined $fn0) {
        print "Files '$fn0' and '$fn' have same msg id: $mid\n";
    }
    else {
        $msgids{$mid} = $fn;
    }
}
$ perl midcheck.pl | wc
      13     117    2098
$ perl midcheck.pl | grep \^Files | wc
      13     117    2098
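
For anyone who prefers Python, roughly the same duplicate check as
midcheck.pl can be sketched as below. This is only a sketch in Python 3
(foo.py above uses Python 2 syntax) and was not part of the steps I
actually ran. The names message_id and find_duplicates are made up for
illustration, and taking the file list from a file named on the command
line (for example the filenames-fs list) is an assumption. Like
midcheck.pl, it only sees a Message-ID whose value is on the same line
as the header tag.

```python
import re
import sys

# Same pattern as midcheck.pl: the value must follow the tag on one line.
MID_RE = re.compile(r'^Message-ID:\s*(.*)', re.IGNORECASE)


def message_id(path):
    """Return the Message-ID header value of a mail file, or None."""
    mid = None
    with open(path, 'r', errors='replace') as fh:
        for line in fh:
            line = line.rstrip('\r\n')
            if line == '':
                # A blank line ends the header block; stop reading.
                break
            m = MID_RE.match(line)
            if m:
                mid = m.group(1)
    return mid or None


def find_duplicates(paths):
    """Return (duplicates, missing) for the given mail file paths.

    duplicates: list of (first_path, later_path, message_id) tuples
    missing:    list of paths where no Message-ID was found
    """
    seen = {}          # message id -> first file that carried it
    duplicates = []
    missing = []
    for path in paths:
        mid = message_id(path)
        if mid is None:
            missing.append(path)
        elif mid in seen:
            duplicates.append((seen[mid], path, mid))
        else:
            seen[mid] = path
    return duplicates, missing


if __name__ == '__main__' and len(sys.argv) > 1:
    # Assumption: argv[1] is a text file with one mail file name per line.
    with open(sys.argv[1]) as listing:
        paths = [line.rstrip('\n') for line in listing if line.strip()]
    dups, missing = find_duplicates(paths)
    for path in missing:
        print("%s: no Message-ID (in same line with header tag?)" % path)
    for first, later, mid in dups:
        print("Files '%s' and '%s' have same msg id: %s" % (first, later, mid))
```

Saved under a hypothetical name such as midcheck.py, it could be run as
"python3 midcheck.py filenames-fs". It prints the same two kinds of
lines as the Perl script, so the "grep \^Files | wc" check works on its
output as well.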