test: add known broken test for indexing html
author David Bremner <david@tethera.net>
Wed, 22 Mar 2017 11:23:00 +0000 (08:23 -0300)
committer David Bremner <david@tethera.net>
Thu, 20 Apr 2017 09:59:40 +0000 (06:59 -0300)
commit 77c9ec1fddcbe145facfc3d65eee55b11ad61fb9
tree bd8adc589322454463db36b966a84501858fa4d2
parent e56511817284afc14352f47a13fcf85b2fabd628

'quite' on IRC reported that notmuch new was grinding to a halt during
initial indexing, and we eventually narrowed the problem down to some
html parts with large embedded images. These cause the number of terms
added to the Xapian database to explode (the first 400 messages
generated 4.6M unique terms), and of course the resulting terms are
not much use for searching.
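
To illustrate why an embedded image inflates the term count, here is a
rough sketch in Python. It does not use notmuch or Xapian; the
naive_terms() splitter is a hypothetical stand-in for a real term
generator, and the random bytes stand in for image data. It only shows
that base64 payload text, once split on non-alphanumeric characters,
yields a large number of distinct and meaningless terms.

```python
import base64
import random
import re


def naive_terms(text):
    """Crude stand-in for a term generator: lowercase alphanumeric runs."""
    return {t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)}


def make_html(image_bytes):
    """Build a small HTML body with the image inlined as a data: URI."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    # Wrap at 76 columns, as MIME encoders commonly do.
    wrapped = "\n".join(payload[i:i + 76] for i in range(0, len(payload), 76))
    return (
        "<html><body><p>hello world</p>"
        f'<img src="data:image/png;base64,{wrapped}">'
        "</body></html>"
    )


# Deterministic pseudo-random bytes standing in for a 30 kB image.
rng = random.Random(0)
image = bytes(rng.randrange(256) for _ in range(30000))
html = make_html(image)

# Indexing the raw HTML text picks up every fragment of the payload.
with_image = naive_terms(html)

# Dropping the tags (and with them the attribute text) leaves only the prose.
without_image = naive_terms(re.sub(r"<[^>]*>", " ", html))

print(len(with_image), "terms when the raw HTML is indexed")
print(len(without_image), "terms when tags and attributes are skipped")
```

Skipping attribute text is only one possible approach, shown here for
contrast; it is not necessarily what an eventual fix would do.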

The second test is a sanity check for any "improved" indexing of HTML.
test/T680-html-indexing.sh [new file with mode: 0755]
test/corpora/README
test/corpora/html/attribute-text [new file with mode: 0644]
test/corpora/html/embedded-image [new file with mode: 0644]