--- /dev/null
+Return-Path: <mpn@google.com>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id 2F826431FB6\r
+ for <notmuch@notmuchmail.org>; Tue, 4 Sep 2012 13:26:27 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+ tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_LOW=-0.7]\r
+ autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id MfAPHEtIqKwR for <notmuch@notmuchmail.org>;\r
+ Tue, 4 Sep 2012 13:26:26 -0700 (PDT)\r
+Received: from mail-ee0-f53.google.com (mail-ee0-f53.google.com\r
+ [74.125.83.53]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client\r
+ certificate requested) by olra.theworths.org (Postfix) with ESMTPS id\r
+ 10D48431FAF for <notmuch@notmuchmail.org>; Tue, 4 Sep 2012 13:26:25 -0700\r
+ (PDT)\r
+Received: by eekb47 with SMTP id b47so2986891eek.26\r
+ for <notmuch@notmuchmail.org>; Tue, 04 Sep 2012 13:26:24 -0700 (PDT)\r
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;\r
+ s=20120113; h=sender:from:to:subject:in-reply-to:organization:references\r
+ :user-agent:x-face:face:x-pgp:x-pgp-fp:date:message-id:mime-version\r
+ :content-type; bh=sNVDy4WyUKME6jjouDXoNP1/3NnfrybALBbSdrMLZO4=;\r
+ b=YvPAp7JMv6nOnGQUBdhbqBHE/jyyursJMQ9i7pabRiF8klWSG2zzY8fwQNhBMr9PeC\r
+ HnUorZOBkMDeMaPEqO/o1JbKLzBIuJbNOEq/1mvYf2tecXzfoutdzAxq1DEJU6gDUWVc\r
+ +0W93MfUIqqVvGJBGKKuUnUpfj0ONasYnTj/W2UMN4X+9DOjiyYOpVzTemWCIPzayEYa\r
+ 7w9zhLDCoppXoQSUElkNABg66wxFvqfvkE8DnyLhYeZsjcH9OBtfR0qrE2veA5Vj4C+S\r
+ VnpidoOM+PmwOAG+07n4oxzDKzu0uhjRm2Fwaf1tBje5fQUwu2BD9GjXX/mGor2puO+u GgjQ==\r
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
+ d=google.com; s=20120113;\r
+ h=sender:from:to:subject:in-reply-to:organization:references\r
+ :user-agent:x-face:face:x-pgp:x-pgp-fp:date:message-id:mime-version\r
+ :content-type:x-gm-message-state;\r
+ bh=sNVDy4WyUKME6jjouDXoNP1/3NnfrybALBbSdrMLZO4=;\r
+ b=hHyT9HKQ4bIjJjIaPTFNtr7YKxB1i9XJ/L5/YD2eZopmcE8OevRTmHlJ1hpM1lICZQ\r
+ 2FBAGqrHrMcRqIFsVwDshc/T7BcKQKyUugl2WKPYxEdb/24p6y1Y2A0urL2LgFnrZaDm\r
+ DsZMnc/whEyljoD+DDsXP3OD0uGl5ClzJtRgR6MRt4hXhSiFBfSusnu2Qhr3IPA6SRQU\r
+ m27lTrLMAl1OttXlx1pGKfXuJmKdzRTKS+3iA3tQObLblxIxe7kjvB6mOkOh7o0yxU4Z\r
+ daQhwTn+Ai+bbOP4mog3wjRiFRMdNBuuJLqBPiUVm9qRke4r7EYZ1EOFBqL8sxztf3G8\r
+ yfYQ==\r
+Received: by 10.14.218.134 with SMTP id k6mr27948267eep.14.1346790384901;\r
+ Tue, 04 Sep 2012 13:26:24 -0700 (PDT)\r
+Received: by 10.14.218.134 with SMTP id k6mr27948245eep.14.1346790384652;\r
+ Tue, 04 Sep 2012 13:26:24 -0700 (PDT)\r
+Received: from mpn-glaptop ([2620:0:105f:5:f2de:f1ff:fe35:1a72])\r
+ by mx.google.com with ESMTPS id 45sm48181447eeb.8.2012.09.04.13.26.22\r
+ (version=TLSv1/SSLv3 cipher=OTHER);\r
+ Tue, 04 Sep 2012 13:26:23 -0700 (PDT)\r
+Sender: Michal Nazarewicz <mpn@google.com>\r
+From: Michal Nazarewicz <mina86@mina86.com>\r
+To: Dmitry Kurochkin <dmitry.kurochkin@gmail.com>, notmuch@notmuchmail.org\r
+Subject: Re: [PATCH] Add notmuch-remove-duplicates.py script to contrib.\r
+In-Reply-To: <87d321sg20.fsf@gmail.com>\r
+Organization: http://mina86.com/\r
+References: <1346784785-19746-1-git-send-email-dmitry.kurochkin@gmail.com>\r
+ <xa1tligpk1za.fsf@mina86.com> <87d321sg20.fsf@gmail.com>\r
+User-Agent: Notmuch/0.14+2~g416b120 (http://notmuchmail.org) Emacs/24.2.50.1\r
+ (x86_64-unknown-linux-gnu)\r
+X-Face: PbkBB1w#)bOqd`iCe"Ds{e+!C7`pkC9a|f)Qo^BMQvy\q5x3?vDQJeN(DS?|-^$uMti[3D*#^_Ts"pU$jBQLq~Ud6iNwAw_r_o_4]|JO?]}P_}Nc&"p#D(ZgUb4uCNPe7~a[DbPG0T~!&c.y$Ur,=N4RT>]dNpd; KFrfMCylc}gc??'U2j,!8%xdD\r
+Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACQElEQVQ4jW3TMWvbQBQHcBk1xE6WyALX1069oZBMlq+ouUwpEQQ6uRjttkWP4CmBgGM0BQLBdPFZYPsyFUo6uEtKDQ7oy/U96XR2Ux8ehH/89Z6enqxBcS7Lg81jmSuujrfCZcLI/TYYvbGj+jbgFpHJ/bqQAUISj8iLyu4LuFHJTosxsucO4jSDNE0Hq3hwK/ceQ5sx97b8LcUDsILfk+ovHkOIsMbBfg43VuQ5Ln9YAGCkUdKJoXR9EclFBhixy3EGVz1K6eEkhxCAkeMMnqoAhAKwhoUJkDrCqvbecaYINlFKSRS1i12VKH1XpUd4qxL876EkMcDvHj3s5RBajHHMlA5iK32e0C7VgG0RlzFPvoYHZLRmAC0BmNcBruhkE0KsMsbEc62ZwUJDxWUdMsMhVqovoT96i/DnX/ASvz/6hbCabELLk/6FF/8PNpPCGqcZTGFcBhhAaZZDbQPaAB3+KrWWy2XgbYDNIinkdWAFcCpraDE/knwe5DBqGmgzESl1p2E4MWAz0VUPgYYzmfWb9yS4vCvgsxJriNTHoIBz5YteBvg+VGISQWUqhMiByPIPpygeDBE6elD973xWwKkEiHZAHKjhuPsFnBuArrzxtakRcISv+XMIPl4aGBUJm8Emk7qBYU8IlgNEIpiJhk/No24jHwkKTFHDWfPniR4iw5vJaw2nzSjfq2zffcE/GDjRC2dn0J0XwPAbDL84TvaFCJEU4Oml9pRyEUhR3Cl2t01AoEjRbs0sYugp14/4X5n4pU4EHHnMAAAAAElFTkSuQmCC\r
+X-PGP: 50751FF4\r
+X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4\r
+Date: Tue, 04 Sep 2012 22:26:16 +0200\r
+Message-ID: <xa1tipbtk00n.fsf@mina86.com>\r
+MIME-Version: 1.0\r
+Content-Type: multipart/mixed; boundary="=-=-="\r
+X-Gm-Message-State: ALoCoQk/xF9cupH17t4530QVOx1nvqEv5KEURZzPKFAr1FZehQJKvp2ihr10O2mg2NbAwjnv2j2jVTXYK7QNsO59WJVum/5rjfAvScIG+LE185k5oCmc3wu2Q4aJsKsBiicacCBawGDsYp9gj1eufS3q0tjCSvYThLzm2Bv2tFSVR05AqY6fD5NtrLOVZq/vW8cRUxIvYBO6zxN7mjXb+BaDxgBBIvrVKA==\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Tue, 04 Sep 2012 20:26:27 -0000\r
+\r
+--=-=-=\r
+Content-Type: text/plain; charset=utf-8\r
+Content-Transfer-Encoding: quoted-printable\r
+\r
+>> On Tue, Sep 04 2012, Dmitry Kurochkin wrote:\r
+>>> +class MailComparator:\r
+>>> + """Checks if mail files are duplicates."""\r
+>>> + def __init__(self, filename):\r
+>>> + self.filename =3D filename\r
+>>> + self.mail =3D self.readFile(self.filename)\r
+>>> +\r
+>>> + def isDuplicate(self, filename):\r
+>>> + return self.mail =3D=3D self.readFile(filename)\r
+>>> +\r
+>>> + @staticmethod\r
+>>> + def readFile(filename):\r
+>>> + with open(filename) as f:\r
+>>> + data =3D ""\r
+>>> + while True:\r
+>>> + line =3D f.readline()\r
+>>> + for header in IGNORED_HEADERS:\r
+>>> + if line.startswith(header):\r
+\r
+> Michal Nazarewicz <mina86@mina86.com> writes:\r
+>> Case of headers should be ignored, but this does not ignore it.\r
+\r
+On Tue, Sep 04 2012, Dmitry Kurochkin wrote:\r
+> It does.\r
+\r
+Wait, how? If the line is =E2=80=9Creceived:=E2=80=9D, how does it start =\r
+with =E2=80=9CReceived:=E2=80=9D?\r
+\r
+>>> + if os.path.realpath(comparator.filename) =3D=3D os.path.re=\r
+alpath(filename):\r
+>>> + print "Message '%s' has filenames pointing to the\r
+>>> same file: '%s' '%s'" % (msg.get_message_id(), comparator.filename,\r
+>>> filename)\r
+>>\r
+>> So why aren't those removed?\r
+>>\r
+>\r
+> Because it is the same file indexed twice (probably because of\r
+> symlinks). We do not want to remove the only message file.\r
+\r
+Ah, right, with symlinks this is troublesome, but then again, we can\r
+check if there is at least one non-symlink. If there is, delete\r
+everything else; if there is not, delete all but one arbitrarily chosen\r
+symlink.\r
+\r
+>>> + elif comparator.isDuplicate(filename):\r
+>>> + os.remove(filename)\r
+>>> + duplicates_count +=3D 1\r
+>>> + else:\r
+>>> + #print "Potential duplicates: %s" % msg.get_message_id=\r
+()\r
+>>> + suspected_duplicates_count +=3D 1\r
+>>> +\r
+>>> + new_timestamp =3D time.time()\r
+>>> + if new_timestamp - timestamp > 1:\r
+>>> + timestamp =3D new_timestamp\r
+>>> + sys.stdout.write("\rProcessed %s messages, removed %s duplicat=\r
+es..." % (msg_count, duplicates_count))\r
+>>> + sys.stdout.flush()\r
+>>> +\r
+>>> +print "\rFinished. Processed %s messages, removed %s duplicates." % (m=\r
+sg_count, duplicates_count)\r
+>>> +if duplicates_count > 0:\r
+>>> + print "You might want to run 'notmuch new' now."\r
+>>> +\r
+>>> +if suspected_duplicates_count > 0:\r
+>>> + print\r
+>>> + print "Found %s messages with duplicate IDs but different content.=\r
+" % suspected_duplicates_count\r
+>>> + print "Perhaps we should ignore more headers."\r
+>>\r
+>> Please consider the following instead (not tested):\r
+\r
+> Thanks for reviewing my poor python code :) I am afraid I do not have\r
+> enough interest in improving it. I just implemented a simple solution\r
+> for my problem. Though it looks like you already took time to rewrite\r
+> the script. Would be great if you send it as a proper patch obsoleting\r
+> this one.\r
+\r
+Bah, I probably won't have time to properly test it.\r
+\r
+--=20\r
+Best regards, _ _\r
+.o. | Liege of Serenely Enlightened Majesty of o' \,=3D./ `o\r
+..o | Computer Science, Micha=C5=82 =E2=80=9Cmina86=E2=80=9D Nazarewicz =\r
+ (o o)\r
+ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--\r
+--=-=-=\r
+Content-Type: multipart/signed; boundary="==-=-=";\r
+ micalg=pgp-sha1; protocol="application/pgp-signature"\r
+\r
+--==-=-=\r
+Content-Type: text/plain\r
+\r
+\r
+--==-=-=\r
+Content-Type: application/pgp-signature\r
+\r
+-----BEGIN PGP SIGNATURE-----\r
+Version: GnuPG v1.4.10 (GNU/Linux)\r
+\r
+iQIbBAEBAgAGBQJQRmPoAAoJECBgQBJQdR/0kLEP+KCPbNE7PTqoYiHjOEc8QpFD\r
+LiKIHYNFdtx41eYbBuOMovNyBE4CS7F1WyFnDXSoXY2ajRgHFUjEwQxncakCGyD+\r
+OxJGUGsVWUo8Vq0Sb+cp5+a5Giz6iDU57XvUyXrqgdRZsGPpSPJVUtGpXCXSGJkX\r
+UA9X/Q/uUiUbZGRsLgwwRLI7NBkNMbHR8WHJBBEt2cIUPnGttRUNfhO5IVAZhr7q\r
+VUK06VXW6+dMWoaH4oOkkDzGOuDH41NEKXFxjtpCsKXUU0H5FG6XT5ertqGX6msB\r
+HMZpkSE6LYcuXMNHj4gqOtAUS7K6vao2LtLRQ0J/r8tvHCOyFeTdwcccoWZl3i8V\r
+sr5ZVGBWWTB3TAuRxD/ViTxH20f5EnbyoaJs1DNBQV8Df5TlqrmWl0f6WOMCs5GO\r
+TDN/93gF+KK1aHAVAXmsTOnkKRDYdk8NvjV8o/aoGvpvbhCVliWkARiYQFRA1X/h\r
+1MoHlcGDZUbJmCbhmlTun3rB8oXHfeQmqeIdmYRp5i/LwVW15TiEyw/Joa59exCi\r
+s3raOx7HU4Tke65S0JQ4tpTuWyBFMetmHoFH+ainb6FjGop5u6Obnl47NcxgtC5j\r
+yTeHT6iIgC3Y6sDnqjs7/UVH+FtDHm8nvhlBVqTacARUEsDkrScDLKuigcwQkT4E\r
++5qIEIK1Qqjcl2zNCNg=\r
+=o9Ln\r
+-----END PGP SIGNATURE-----\r
+--==-=-=--\r
+\r
+--=-=-=--\r