Date: Fri, 8 Apr 2016 01:57:25 +0100
From: Olly Betts <olly@survex.com>
To: David Bremner <david@tethera.net>
Cc: notmuch@notmuchmail.org, xapian-discuss@lists.xapian.org
Subject: Re: slowdown in notmuch perf suite with xapian 1.3.5
Message-ID: <20160408005725.GA3037@survex.com>
Reply-To: Xapian Discussion <xapian-discuss@lists.xapian.org>
Mail-Followup-To: David Bremner <david@tethera.net>,
 notmuch@notmuchmail.org, xapian-discuss@lists.xapian.org
References: <87twjd639d.fsf@zancas.localnet>
 <20160407232537.GB29434@survex.com>
 <87h9fd53vo.fsf@zancas.localnet>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87h9fd53vo.fsf@zancas.localnet>
User-Agent: Mutt/1.5.21 (2010-09-15)
Precedence: list

On Thu, Apr 07, 2016 at 09:40:59PM -0300, David Bremner wrote:
> Olly Betts <olly@survex.com> writes:
> 
> >
> > So the T00-new.sh numbers make sense - there's more work to do, and
> > we need to read existing positional data more to insert the new stuff,
> > so the increased reads and writes make sense.
> >
> > But guessing at what the other two tests do, I wouldn't expect them to
> > be affected by this.
> 
> The non-optimized-away cases of T02-tag just adding and deleting terms
> to each document with term Tmail

That should short-cut to changing only the data for Tmail.  Perhaps that's
not working correctly - I'll take a look at this, but probably after 1.4.0 is
out.

> > I'm also a bit puzzled by how glass can manage not to read any data
> > for "dump *", and several tests seem to not read or write anything
> > for either backend.  What exactly are the "In/Out" numbers?
> 
> that's just the output from /usr/bin/time -f '%e\t%U\t%S\t%M\t%I/%O'
> 
> The manual describes them as "number of file system
> inputs/outputs". From looking at the source, they correspond to
> ru_inblock and ru_oublock fields from the getrusage call. AFAIU, that
> means the number of non-cached read/writes.

Non-cached reads/writes are arguably the most useful sort to measure, but the
reads at least will be sensitive to OS caching, which means a repeat run will
generally show lower numbers of reads, e.g.:

$ /usr/bin/time -f '%I/%O' wc randomfile
  240  2908 96780 randomfile
192/0
$ /usr/bin/time -f '%I/%O' wc randomfile
  240  2908 96780 randomfile
0/0

So those numbers may not be entirely comparable, depending on what order your
tests were run in, and whether you'd run the tests (or cloned the repo, or done
some other operation which read or wrote the files used) recently enough that
their data might still be cached.
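
If it helps to poke at these counters directly, here's a rough sketch of
reading the same ru_inblock and ru_oublock fields for a child process via
getrusage.  It's Python 3 on a Unix-like system, which is purely my
assumption for illustration (neither notmuch nor time works this way), and
block_io is a made-up helper name:

```python
import resource
import subprocess


def block_io(argv):
    """Run argv and return (inputs, outputs) charged to it.

    These are the ru_inblock/ru_oublock deltas for waited-for children,
    i.e. roughly what /usr/bin/time reports as %I/%O.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    # subprocess.run() waits for the child, so its usage is then
    # included in the RUSAGE_CHILDREN totals.
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_inblock - before.ru_inblock,
            after.ru_oublock - before.ru_oublock)
```

Calling block_io(["wc", "randomfile"]) twice in a row should show the same
sort of drop in reads as the example above, provided the file wasn't already
cached before the first call.  On Linux I believe the counts are in 512-byte
units, which would fit the figures above: 96780 bytes rounds up to 24 pages
of 4096 bytes, and that is 192 blocks of 512 bytes.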

Cheers,
    Olly