Date: Fri, 8 Apr 2016 00:25:38 +0100
From: Olly Betts <olly@survex.com>
To: David Bremner <david@tethera.net>
Cc: notmuch@notmuchmail.org, xapian-discuss@lists.xapian.org
Subject: Re: slowdown in notmuch perf suite with xapian 1.3.5
Message-ID: <20160407232537.GB29434@survex.com>
Reply-To: Xapian Discussion <xapian-discuss@lists.xapian.org>
Mail-Followup-To: David Bremner <david@tethera.net>,
 notmuch@notmuchmail.org, xapian-discuss@lists.xapian.org
References: <87twjd639d.fsf@zancas.localnet>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87twjd639d.fsf@zancas.localnet>
User-Agent: Mutt/1.5.21 (2010-09-15)
Precedence: list

On Thu, Apr 07, 2016 at 08:56:46AM -0300, David Bremner wrote:
> I hadn't noticed any interactive slowdown, but when I got around to
> running the notmuch performance suite, there seems to be some noticeable
> slowdown with the glass backend (default in Xapian 1.3.5) compared to
> chert (using xapian 1.2.22)

Some of this is pretty much expected, though other parts I don't
entirely understand.

One of the big changes in glass is how the position table is structured.
In chert, it is ordered by (document,term) but in glass that has been
changed to (term,document).
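
To make the ordering difference concrete, here is a minimal sketch that
models the position table as a sorted Python list of tuple keys. This is
a simplifying assumption for illustration, not Xapian's real on-disk key
encoding; the toy documents and the helpers `glass_range` and
`chert_indices` are hypothetical. Under (term,document) ordering, all
the entries for one term form a single contiguous range. Under
(document,term) ordering, the same entries are spread across the table.

```python
import bisect

# Hypothetical toy data: document id -> terms it contains.
docs = {1: ["apple", "pie"], 2: ["apple", "tart"], 3: ["cherry", "pie"]}

# chert-style keys are (document, term); glass-style keys are (term, document).
chert_keys = sorted((d, t) for d, terms in docs.items() for t in terms)
glass_keys = sorted((t, d) for d, terms in docs.items() for t in terms)

def glass_range(keys, term):
    """Return the contiguous slice of glass-ordered keys for one term."""
    lo = bisect.bisect_left(keys, (term,))
    hi = bisect.bisect_left(keys, (term, float("inf")))
    return keys[lo:hi]

def chert_indices(keys, term):
    """Return the positions in chert-ordered keys that hold one term."""
    return [i for i, (_, t) in enumerate(keys) if t == term]

print(glass_range(glass_keys, "pie"))    # [('pie', 1), ('pie', 3)]: one range
print(chert_indices(chert_keys, "pie"))  # [1, 5]: scattered
```

A phrase search that needs the positions for every document containing
"pie" can read one contiguous range in the glass-style layout.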

This change makes a huge difference to phrase searches in cases where
a lot of phrase data is needed, but it has an indexing-time cost:
adding a new document can no longer just append a load of entries to
the position table; instead we need to buffer up the changes and
then merge the new entries into the existing table.
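
The same toy model can sketch the indexing-time cost. It again assumes
the table is a sorted list of tuple keys. The helpers `chert_add` and
`glass_add` are hypothetical, and Xapian's real B-tree update and
buffering code is considerably more involved. A new document gets the
highest docid, so its (document,term) keys all sort after the existing
ones and can simply be appended. Its (term,document) keys interleave
with existing entries, so they are buffered, sorted, and merged in.

```python
import bisect
import heapq

def chert_add(table, docid, terms):
    """(document, term) order: a new, highest docid sorts last, so append."""
    new_keys = sorted((docid, t) for t in terms)
    # Appending is only valid because docid exceeds every existing docid.
    if table and new_keys and table[-1] >= new_keys[0]:
        raise ValueError("docid must exceed all existing docids")
    table.extend(new_keys)
    return table

def glass_add(table, docid, terms):
    """(term, document) order: buffer the new keys, then merge them in."""
    buffered = sorted((t, docid) for t in terms)
    # Where each buffered key lands among the existing entries.
    points = [bisect.bisect_left(table, k) for k in buffered]
    merged = list(heapq.merge(table, buffered))
    return merged, points

chert = [(1, "apple"), (1, "pie"), (2, "apple"), (2, "tart"), (3, "cherry"), (3, "pie")]
glass = sorted((t, d) for d, t in chert)

chert = chert_add(chert, 4, ["apple", "pie", "zest"])
glass, points = glass_add(glass, 4, ["apple", "pie", "zest"])
print(chert[-3:])  # [(4, 'apple'), (4, 'pie'), (4, 'zest')]
print(points)      # [2, 5, 6]: spread through the existing table
```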

The trade-off isn't ideal for everyone, but the cases of slow phrase
searches were a real pain point that needed addressing.  The plan is
to optimise indexing speed in other ways to regain this loss.  Some
of that has been done, but there's still a lot more to do.

So the T00-new.sh numbers make sense: there's more work to do, and
we need to read more of the existing positional data in order to insert
the new entries, which accounts for the increased reads and writes.
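
A rough way to see why reads and writes go up is to split the toy
table into fixed-size blocks and count how many distinct blocks one new
document's entries land in. The block size, vocabulary, and
`blocks_touched` helper are all hypothetical. Real B-tree blocks hold
variable numbers of entries, so this only illustrates the trend and
does not predict the T00-new.sh figures.

```python
import bisect

BLOCK_SIZE = 4  # hypothetical number of entries per block

def blocks_touched(table, new_keys, block_size=BLOCK_SIZE):
    """Count distinct blocks an insert must read and rewrite, taking each
    new key's block to be the one holding its insertion point."""
    return len({bisect.bisect_left(table, k) // block_size for k in new_keys})

terms = [f"t{i:02d}" for i in range(20)]  # hypothetical vocabulary
existing = range(1, 11)                   # documents 1..10 each contain every term
chert = sorted((d, t) for d in existing for t in terms)
glass = sorted((t, d) for d in existing for t in terms)

# Add document 11, which also contains every term.
print(blocks_touched(chert, [(11, t) for t in terms]))  # 1
print(blocks_touched(glass, [(t, 11) for t in terms]))  # 20
```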

But guessing at what the other two tests do, I wouldn't expect them to
be affected by this.

I'm also a bit puzzled by how glass can manage not to read any data
for "dump *", and several tests seem not to read or write anything
for either backend.  What exactly are the "In/Out" numbers?

Cheers,
    Olly