--- /dev/null
+Return-Path: <cworth@cworth.org>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id D9F65431FBD\r
+ for <notmuch@notmuchmail.org>; Fri, 5 Feb 2010 10:59:20 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -1.963\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-1.963 tagged_above=-999 required=5\r
+ tests=[ALL_TRUSTED=-1.8, AWL=0.022, BAYES_40=-0.185] autolearn=ham\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id aZa3+UAiozaW; Fri, 5 Feb 2010 10:59:19 -0800 (PST)\r
+Received: from yoom.home.cworth.org (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id D166B431FAE;\r
+ Fri, 5 Feb 2010 10:59:19 -0800 (PST)\r
+Received: by yoom.home.cworth.org (Postfix, from userid 1000)\r
+ id 87C20254181; Sat, 6 Feb 2010 07:59:19 +1300 (NZDT)\r
+From: Carl Worth <cworth@cworth.org>\r
+To: Dominik Epple <dominik.epple@googlemail.com>\r
+In-Reply-To: <87vdgxqepm.fsf@yoom.home.cworth.org>\r
+References: <123554aa0911200056h73def158pb0db64a2a78ed687@mail.gmail.com>\r
+ <87skc8oqyn.fsf@yoom.home.cworth.org>\r
+ <123554aa0911230826o11e54d5ckc90e5ae8dab6ffd3@mail.gmail.com>\r
+ <123554aa0911250139l907c4efs60d704dae962c473@mail.gmail.com>\r
+ <87y6ltqg2p.fsf@yoom.home.cworth.org>\r
+ <87vdgxqepm.fsf@yoom.home.cworth.org>\r
+Date: Fri, 05 Feb 2010 10:59:12 -0800\r
+Message-ID: <87d40jpkzj.fsf@yoom.home.cworth.org>\r
+MIME-Version: 1.0\r
+Content-Type: multipart/signed; boundary="=-=-=";\r
+ micalg=pgp-sha1; protocol="application/pgp-signature"\r
+Cc: notmuch@notmuchmail.org\r
+Subject: Re: [notmuch] notmuch new: Memory problem (with uuencoded content)\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Fri, 05 Feb 2010 18:59:21 -0000\r
+\r
+--=-=-=\r
+\r
+On Thu, 26 Nov 2009 11:16:21 -0800, Carl Worth <cworth@cworth.org> wrote:\r
+> Clearly, some experimenting is needed. Dominik, if you can share the\r
+> large file, (with either me alone or with the whole list), a pointer to\r
+> where we could download it would be appreciated.\r
+\r
+Dominik replied to me privately and described a way for me to create a\r
+file that replicates the bug. Here's a recipe I came up with from his\r
+description:\r
+\r
+ mkdir tmp\r
+ cd tmp/\r
+ echo [database]$'\n'path=mail > notmuch-config\r
+ mkdir mail\r
+ echo From: Me$'\n'To: You$'\n'Subject: uuencode$'\n' > mail/msg\r
+ dd if=/dev/urandom of=blob bs=1024 count=10240\r
+ uuencode blob < blob >> mail/msg\r
+ NOTMUCH_CONFIG=notmuch-config notmuch new\r
+\r
+So that's a 10MB blob of random data which uuencodes to a ~14MB mail\r
+file. And notmuch (before a patch I just pushed) chews on it for quite a\r
+while, consuming several hundred MB of memory and resulting finally in a\r
+76MB Xapian database (with chert).\r
+\r
+I'm not sure whether there is a Xapian bug there, or perhaps a bug in\r
+how notmuch uses Xapian to generate the terms for an email message\r
+this large.\r
+\r
+But the thing that's obvious to me is that indexing encoded data like\r
+this doesn't make any sense at all. So I've just pushed a set of patches\r
+to notmuch to make it detect uuencoded data within a mail message and\r
+ignore it.\r
+\r
+Of course, I also pushed a set of tests to the test suite for this (and\r
+some new "notmuch search" tests while I was at it).\r
+\r
+-Carl\r
+\r
+--=-=-=\r
+Content-Type: application/pgp-signature\r
+\r
+-----BEGIN PGP SIGNATURE-----\r
+Version: GnuPG v1.4.10 (GNU/Linux)\r
+\r
+iD8DBQFLbGqA6JDdNq8qSWgRAmZJAJ9LV4r6PzG1IhZBFIzxTYVu4KxicwCgleGH\r
+kv+WIZSatWQLrDvM2KtFxpQ=\r
+=6BSl\r
+-----END PGP SIGNATURE-----\r
+--=-=-=--\r