Re: [PATCH v3 15/16] added notmuch_message_reindex
author Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Wed, 10 Feb 2016 17:21:24 +0000 (12:21 -0500)
committer W. Trevor King <wking@tremily.us>
Sat, 20 Aug 2016 23:21:05 +0000 (16:21 -0700)
b7/0dc879377b862c138acb14c4e7cade79a7566f [new file with mode: 0644]

diff --git a/b7/0dc879377b862c138acb14c4e7cade79a7566f b/b7/0dc879377b862c138acb14c4e7cade79a7566f
new file mode 100644
index 0000000..6a453a2
--- /dev/null
@@ -0,0 +1,158 @@
+Return-Path: <dkg@fifthhorseman.net>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by arlo.cworth.org (Postfix) with ESMTP id 7DA916DE179D\r
+ for <notmuch@notmuchmail.org>; Wed, 10 Feb 2016 09:21:34 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at cworth.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.017\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.017 tagged_above=-999 required=5\r
+ tests=[AWL=-0.017] autolearn=disabled\r
+Received: from arlo.cworth.org ([127.0.0.1])\r
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id Bl9KNKXl6Bxs for <notmuch@notmuchmail.org>;\r
+ Wed, 10 Feb 2016 09:21:32 -0800 (PST)\r
+Received: from che.mayfirst.org (che.mayfirst.org [209.234.253.108])\r
+ by arlo.cworth.org (Postfix) with ESMTP id 4569E6DE13DA\r
+ for <notmuch@notmuchmail.org>; Wed, 10 Feb 2016 09:21:32 -0800 (PST)\r
+Received: from fifthhorseman.net (unknown [38.109.115.130])\r
+ by che.mayfirst.org (Postfix) with ESMTPSA id 769BCF997;\r
+ Wed, 10 Feb 2016 12:21:28 -0500 (EST)\r
+Received: by fifthhorseman.net (Postfix, from userid 1000)\r
+ id E61661FF75; Wed, 10 Feb 2016 12:21:27 -0500 (EST)\r
+From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>\r
+To: Jameson Graef Rollins <jrollins@finestructure.net>,\r
+ Notmuch Mail <notmuch@notmuchmail.org>\r
+Subject: Re: [PATCH v3 15/16] added notmuch_message_reindex\r
+In-Reply-To: <87oabpnzt4.fsf@alice.fifthhorseman.net>\r
+References: <1454272801-23623-1-git-send-email-dkg@fifthhorseman.net>\r
+ <1454272801-23623-16-git-send-email-dkg@fifthhorseman.net>\r
+ <87mvr9s8gy.fsf@servo.finestructure.net>\r
+ <87oabpnzt4.fsf@alice.fifthhorseman.net>\r
+User-Agent: Notmuch/0.21+72~gd8c4f1c (http://notmuchmail.org) Emacs/24.5.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Wed, 10 Feb 2016 12:21:24 -0500\r
+Message-ID: <871t8ko50r.fsf@alice.fifthhorseman.net>\r
+MIME-Version: 1.0\r
+Content-Type: multipart/signed; boundary="==-=-=";\r
+ micalg=pgp-sha512; protocol="application/pgp-signature"\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.20\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch/>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Wed, 10 Feb 2016 17:21:34 -0000\r
+\r
+--==-=-=\r
+Content-Type: multipart/mixed; boundary="=-=-="\r
+\r
+--=-=-=\r
+Content-Type: text/plain\r
+Content-Transfer-Encoding: quoted-printable\r
+\r
+On Tue 2016-02-09 20:01:43 -0500, Daniel Kahn Gillmor wrote:\r
+>> I just wanted to mention that I think there's a problem with the reindex\r
+>> functionality introduced in this patch (or in 16/16).  It looks like\r
+>> this function irrevocably busts apart threads.  dkg and I are\r
+>> investigating.\r
+>\r
+> it doesn't appear to be irrevocable to me, but it is definitely doing\r
+> something weird with threading.\r
+\r
+OK, this is definitely tickling some problems with threading, but those\r
+are problems that are present already in existing versions of notmuch,\r
+unrelated to this series.\r
+\r
+When removing a message from the database, its earlier presence doesn't\r
+become a ghost message, and as a result anything that points to it\r
+doesn't get assembled into the prior thread properly.\r
+\r
+The attached tarball has a python test showing this behavior with a\r
+simple thread of two messages:\r
+\r
+0 dkg@frigg:~/src/notmuch/threading-test$ ./run-test=20\r
+Found 2 total files (that's not much mail).\r
+Processed 2 total files in almost no time.\r
+Added 2 new messages to the database.\r
+Threads: 1\r
+removing and re-adding a@example.com\r
+Threads: 2\r
+removing and re-adding b@example.com\r
+Threads: 1\r
+0 dkg@frigg:~/src/notmuch/threading-test$=20\r
+\r
+the relevant python function is:\r
+\r
+\r
+def remove_and_readd(db, mid):\r
+    print('removing and re-adding', mid)\r
+    m =3D db.find_message(mid)\r
+    f =3D m.get_filename()\r
+    db.remove_message(f)\r
+    db.add_message(f)\r
+\r
+\r
+\r
+I think when a message is removed from the database, we need to know\r
+whether anything else (in its same thread?) refers to it.  If so, we\r
+should keep it around as a ghost message instead of fully removing it.\r
+\r
+does this sound like the right approach?\r
+\r
+     --dkg\r
+\r
+\r
+--=-=-=\r
+Content-Type: application/x-gtar-compressed\r
+Content-Disposition: attachment; filename=threading-test.tgz\r
+Content-Transfer-Encoding: base64\r
+\r
+H4sIAAAAAAACA+1YW2/TMBTus3+FGUhtEUlz71YxBNoF+jAmjSEeIyc+aYOaOHOcbX3ht3OaZhO9\r
+DARsGRd/L0l8Lo5z/H12XLDrcDITEZuFU2AcZOf+YSGCIKiviPUrGr2O7QQ++gydwMZ22/NdtzPp\r
+tICqVExil1II9T2/H9nXB/eXwHdoLLIMcrVvc8+y+S5YfGjx3Thme47LbAhc5rnRrmvbzl7sscAi\r
+HY1/BmoqkfZpPjEUlGrwIH0s+DAc+nfzH+/X+W85dsfX/G+7/uYkVekkFxLuXf+9u+vvbeh/gDOm\r
+Y+n6PzhyobIqnhqxyJN0Qp5/IRmUJZtAOTAbm9b7/0f/b4v/yPrvB66l9f8R6x9XcvCI9Q88vf4/\r
+ev1ZS+u/42yu/37g6PW/BZwsC26M+Yiy13DNsmIGJv4Skg9V9Blihc2MkUOmYEQ/AX9BbYseQ0Qd\r
+LBLej+zhyN6jhuVbFjmWIkP/WRrDSqpzMaKRiFbayDuQQNOSqinQJJWlos3k0/uNP4T/UUv8d+1g\r
+C/99zf+W+R9t538URT/F/3WmL9i/qQlnkKAA5DGUI/pyRXlekXFunEExmxuL0HXjinCUgD8uXCvH\r
+ffFfgrHaYhbzB9//DYMN/g+HruZ/G3j6ZFCVKPVpPoD8khZzNRW5S0iaFUIqenME0DyKkhAOCZWQ\r
+iUsIWc7DxWThPR69oFnK+yNCEYVMc9Xr1l44kSj6YYiBjvjUXXrWjhndpzwykxQTNRzu3RoTNGbm\r
+BFSYpDPIWQa9pQEjmhe4iUluDdjHt62ER5ilGYWJKsYiVmIfgsP+eqt5cnp4ZJ4dvTkMP52Nz4/6\r
+5GL5ejGOUUF4UYGc97q1lHUxdTPK85ov5QjHdYEKVeUqXFKo7PX7ZOuX6q5IGub6jVTRr6S6m/9V\r
+Xt+0ev7roW39/993A83/dvi/4D7O/ykhcF2z/P3p+cnHg3fhwen74/Hb/We94or3B2snhURm1JAJ\r
+3WnMG8eGO2hazbRDiBJo2WJoYugyOS7rivIbWhZMTTe62RpSlSDNhVDQxRyuH+90Q55kTM5DyFg6\r
+q/1viSQkDu8mLocrQsytC6PebmhoaGhoaGhoaPxt+AoTIjhIACgAAA==\r
+--=-=-=--\r
+\r
+--==-=-=\r
+Content-Type: application/pgp-signature; name="signature.asc"\r
+\r
+-----BEGIN PGP SIGNATURE-----\r
+Version: GnuPG v2\r
+\r
+iQJ8BAEBCgBmBQJWu3GUXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w\r
+ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRFREIyRTc0RjU2RkNGMkI2NzI5N0I3MzUy\r
+NEVDRkY1QUZGNjgzNzBBAAoJECTs/1r/aDcKJm8QAIdV/RVM0hjmNJHIGJ+zkg+t\r
+u4ZXt+OBZBkIVWDZ5Ksu9RrqxwG/jrfipVRp2U+GWGQh2wvsaxY6h1+rt942SEIj\r
+dYOxfEGsOC4Zr5YwmBZVFYzT1Ndp1gt9urKLfKzwrbKq9yW060/AOOoc02lobOIX\r
+rKazE9wl+scJfHDaSfpEzd+Ts5awWlXgkWd1hQTJ2z/8qndFoA+HfdA/DnwW1iI+\r
+advd9w8c+ZCXx/dRAGh9H3aQBD9tPShh9ceF7Szii9SzJ8SAtzO8pVF8ndMEqZ6c\r
+pxR/hilFhTB1FuLqf8feKflUBAGSBa0pI1ceBDqr+7mbaxS+88ZJcpyIJ3b2Hexc\r
+/yTPaxJPeWgQTSFyHp0WsuEU4FeZTh+tJOKL2yRLLqVKfhZ8oDcPSjzLBvbDpst9\r
+ytzHOTM/GpwxP+bEFr14zi4wAJANmEmJdfmFxUYJbjI4UEn7R7d6qSvnIjKpJbLa\r
+8tt6NobX8UWtycL6PxdYUVFwL6pAe4tmHp6b4b252Us8jR/OkMP8tYroyha0PICQ\r
+gfbfCvEQ9so1URDcTf6zzZ1Wkg0DG0sL10n7Ujwo7omTmLaMvHhCFxtagEOPmTgq\r
+mcEB/6c5ylVHDicHXTYWx/0XMvgea/NAWDud3DIXyu5dg+tUCk74vFOupusBR0ik\r
+L34eAydJ09C2ZQkele84\r
+=/P4A\r
+-----END PGP SIGNATURE-----\r
+--==-=-=--\r