Return-Path: <tomi.ollila@nixu.com>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 3B715431FD0
	for <notmuch@notmuchmail.org>; Wed, 10 Aug 2011 01:43:20 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=0 tagged_above=-999 required=5 tests=[none]
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id E6L1ep4Nqywb for <notmuch@notmuchmail.org>;
	Wed, 10 Aug 2011 01:43:18 -0700 (PDT)
Received: from taco2.nixu.fi (taco2.nixu.fi [194.197.118.31])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by olra.theworths.org (Postfix) with ESMTPS id 9ABDC431FB6
	for <notmuch@notmuchmail.org>; Wed, 10 Aug 2011 01:43:18 -0700 (PDT)
Received: from taco2.nixu.fi (localhost [127.0.0.1])
	by taco2.nixu.fi (8.14.3/8.14.3/Debian-5+lenny1) with ESMTP id
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT)
	for <notmuch@notmuchmail.org>; Wed, 10 Aug 2011 11:41:54 +0300
Received: (from too@localhost)
	by taco2.nixu.fi (8.14.3/8.14.3/Submit) id p7A8fswU014712;
	Wed, 10 Aug 2011 11:41:54 +0300
X-Authentication-Warning: taco2.nixu.fi: too set sender to
	tomi.ollila@nixu.com using -f
From: Tomi Ollila <tomi.ollila@nixu.com>
To: notmuch@notmuchmail.org
Subject: Re: Added messages / total files count difference.
References: <yf639han8zz.fsf@taco2.nixu.fi>
X-Face: HhBM'cA~<r"^Xv\KRN0P{vn'Y"Kd;zg_y3S[4)KSN~s?O\"QPoL
	$[Xv_BD:i/F$WiEWax}R(MPS`^UaptOGD`*/=@\1lKoVa9tnrg0TW?"r7aRtgk[F
	!)g;OY^,BjTbr)Np:%c_o'jj,Z
Date: Wed, 10 Aug 2011 11:41:54 +0300
In-Reply-To: <yf639han8zz.fsf@taco2.nixu.fi> (Tomi Ollila's message of "Tue,
	09 Aug 2011 14:02:08 +0300")
Message-ID: <yf6hb5plktp.fsf@taco2.nixu.fi>
User-Agent: Gnus/5.110014 (No Gnus v0.14) Emacs/22.2 (gnu/linux)
Content-Type: text/plain
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
	<notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Wed, 10 Aug 2011 08:43:20 -0000

On Tue 09 Aug 2011 14:02, Tomi Ollila <tomi.ollila@nixu.com> writes:

> I get this output:
>
> $ notmuch new --verbose
> Found 15559 total files (that's not much mail).
> Processed 15559 total files in 5m 53s (43 files/sec.).
> Added 15546 new messages to the database.
>
> $ find * -type f | wc
>   15559   15559  529027
>
> How can I determine which 13 files were dropped? All of those
> 15559 files should be mails. I tried to check through mail files that
> have no 'Subject:' header but those were (at least one) indexed. Could
> it be about duplicate Message-ID: or something?
>
> $ notmuch --version
> notmuch 0.7-7-g68e8560

It is about duplicate Message-IDs.
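
Why the counts differ: notmuch keys messages by Message-ID, so a file whose Message-ID is already in the database is recorded as another filename of the existing message, not as a new message, and "Added N new messages" counts distinct IDs rather than files. A minimal Python sketch of that arithmetic, assuming one Message-ID per file; the function names and sample data are hypothetical, not taken from notmuch or from my mail store:

```python
from collections import Counter


def collapsed_file_count(message_ids):
    """Number of files that add no new message: total files minus
    distinct Message-IDs (message_ids holds one entry per file)."""
    return len(message_ids) - len(set(message_ids))


def duplicated_ids(message_ids):
    """Sorted list of Message-IDs that appear in more than one file."""
    counts = Counter(message_ids)
    return sorted(mid for mid, n in counts.items() if n > 1)


# Hypothetical sample: four files, two of which share a Message-ID.
sample = ["<a@example>", "<b@example>", "<a@example>", "<c@example>"]
print(collapsed_file_count(sample))  # 1
print(duplicated_ids(sample))        # ['<a@example>']
```

With the 'notmuch new' figures quoted above, 15559 files minus 15546 messages leaves 13 files whose Message-ID was already taken.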

It would be nice if 'notmuch new' printed information about this
when it happens (as I recall it does when a newly found file
is not (considered as) a mail file).

The steps I took to figure this out are at the end of this email
(not all iterations, with and without 'wc', are shown).
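
Comparing filenames-fs with filenames-db using diff, as in the transcript at the end of this email, amounts to a set difference: the paths on disk that the database listing lacks. A minimal Python sketch, assuming both files hold one path per line; the function name and example paths are hypothetical:

```python
def files_missing_from_db(fs_lines, db_lines):
    """Return, sorted, the paths listed in fs_lines but not in db_lines.

    Each argument is an iterable of lines with one path per line, such as
    an open file object for filenames-fs or filenames-db.
    """
    on_disk = {line.rstrip("\r\n") for line in fs_lines if line.strip()}
    in_db = {line.rstrip("\r\n") for line in db_lines if line.strip()}
    return sorted(on_disk - in_db)


# Hypothetical example data; with the real lists one would call
# files_missing_from_db(open("filenames-fs"), open("filenames-db")).
fs = ["mails/cur/1\n", "mails/cur/2\n", "mails/cur/3\n"]
db = ["mails/cur/1\n", "mails/cur/3\n"]
print(files_missing_from_db(fs, db))  # ['mails/cur/2']
```

Since foo.py prints a single filename per message via get_filename(), a file that shares its Message-ID with another file is expected to show up in this difference even though notmuch did record it as an extra filename.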

--8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--

$ find ~/mail/mails/* -type f | sort >! filenames-fs
$ wc filenames-fs
15559 15559 855766 filenames-fs

$ cd /path/to/notmuch-git/bindings/python

# foo.py: print one filename per message in the database
import notmuch
db = notmuch.Database()
msgs = notmuch.Query(db, '').search_messages()
for f in msgs:
    print(f.get_filename())

$ PYTHONPATH=/path/to/python-json:`pwd` python foo.py | sort > filenames-db
$ wc filenames-db
15546 15546 855037 filenames-db

$ diff filenames-db filenames-fs | grep mails | wc

# midcheck.pl
foreach (<mails/*/*>) {
    open I, '<', $fn or die $!;
        $mid = $1, next if /^Message-ID:\s*(.*)/i;
        print "$fn: no Message-ID (in same line with header tag?)\n";
    my $fn0 = $msgids{$mid};
    if (defined $fn0) {
        print "Files '$fn0' and '$fn' have same msg id: $mid\n";
    $msgids{$mid} = $fn;

$ perl midcheck.pl | wc

$ perl midcheck.pl | grep \^Files | wc
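
midcheck.pl is only partly preserved in this transcript. A Python sketch of the same idea (read each file's headers, take its Message-ID, report files that share one or lack one) follows; it is not the script I ran. It swaps the single-line regex for the standard library's email header parser, which also copes with a Message-ID folded onto the next line. The function name is hypothetical and the glob is assumed to match the Perl one:

```python
import glob
import os
from email.parser import BytesParser


def find_duplicate_message_ids(paths):
    """Scan mail files and return (duplicates, missing).

    duplicates: list of (first_file, later_file, message_id) tuples for
                every file whose Message-ID was already seen.
    missing:    list of files that have no Message-ID header at all.
    """
    parser = BytesParser()
    first_seen = {}  # Message-ID -> first file that carried it
    duplicates = []
    missing = []
    for path in paths:
        with open(path, "rb") as fh:
            # headersonly=True keeps the body as an unparsed payload.
            headers = parser.parse(fh, headersonly=True)
        mid = headers.get("Message-ID")  # lookup is case-insensitive
        if mid is None:
            missing.append(path)
            continue
        # strip() also drops the line break left by a folded header,
        # e.g. "Message-ID:\n <id@host>".
        mid = str(mid).strip()
        if mid in first_seen:
            duplicates.append((first_seen[mid], path, mid))
        else:
            first_seen[mid] = path
    return duplicates, missing


if __name__ == "__main__":
    # Same glob as midcheck.pl; run from the directory that holds mails/.
    paths = sorted(p for p in glob.glob("mails/*/*") if os.path.isfile(p))
    dups, missing = find_duplicate_message_ids(paths)
    for path in missing:
        print("%s: no Message-ID" % path)
    for first, later, mid in dups:
        print("Files '%s' and '%s' have same msg id: %s" % (first, later, mid))
```

Piping its output through grep ^Files | wc gives the duplicate count in the same way as the Perl runs.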