From: Carl Worth <cworth@cworth.org>
To: Austin Clements <amdragon@MIT.EDU>
Subject: Re: Xapian locking errors with custom query parser
In-Reply-To: <20110311024730.GA31011@mit.edu>
References: <87d3nhe3g9.fsf@steelpick.2x.cz>
	<AANLkTinW_n+zMtLC-fy=naUGsAiFDwdd-mAqSWEDvF=W@mail.gmail.com>
	<AANLkTinPph9Lj8h3UztQ74qMaaBVKkXB0rbiLeTX2GmW@mail.gmail.com>
	<87lj0m8ki5.fsf@yoom.home.cworth.org>
	<20110311024730.GA31011@mit.edu>
User-Agent: Notmuch/0.5 (http://notmuchmail.org) Emacs/23.2.1
	(i486-pc-linux-gnu)
Date: Thu, 10 Mar 2011 21:26:04 -0800
Message-ID: <8762rq8byr.fsf@yoom.home.cworth.org>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
	micalg=pgp-sha1; protocol="application/pgp-signature"
Cc: notmuch@notmuchmail.org
Precedence: list

--=-=-=
Content-Transfer-Encoding: quoted-printable

On Thu, 10 Mar 2011 21:47:30 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> Yes, qparser-3 is ready for you, and has this fix folded in to it (see
> id:20110202050336.GB28537@mit.edu).

Thanks.

I've finally had a chance to start looking at this.

The first thing that caught my eye was this question:

> +/* XXX notmuch currently registers "tag" as an exclusive boolean
> + * prefix, which means queries like "tag:x tag:y" will return messages
> + * with tag x OR tag y.  Is this intentional? */

This isn't "intentional" in the sense that it is desired, no.

Our documentation for the search syntax says:

    In addition to individual terms, multiple terms can  be  combined  with
    Boolean  operators  ( and, or, not , etc.). Each term in the query will
    be implicitly connected by a logical AND if  no  explicit  operator  is
    provided,  (except  that  terms with a common prefix will be implicitly
    combined with OR until we get Xapian defect #402 fixed).

When I originally wrote this code, the add_boolean_prefix function
didn't have the "exclusive" parameter that it has now, so that's
something to fix.
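
To make the two behaviours concrete, here is a minimal sketch of how
filter terms get combined under each setting. It is a toy model in
Python, not notmuch or Xapian code: the function name matches() and the
(prefix, value) representation of terms are assumptions made up for
illustration.

```python
from collections import defaultdict


def matches(message_terms, query_terms, exclusive):
    """Toy model of boolean-prefix filtering (hypothetical helper).

    message_terms: set of (prefix, value) pairs attached to one message.
    query_terms:   list of (prefix, value) pairs from the query.
    exclusive:     True  -> terms sharing a prefix are ORed together,
                            and the per-prefix groups are ANDed;
                   False -> every term is ANDed.
    """
    if not exclusive:
        return all(term in message_terms for term in query_terms)

    # Group the query terms by prefix, then require at least one hit
    # from every group.
    groups = defaultdict(list)
    for prefix, value in query_terms:
        groups[prefix].append((prefix, value))
    return all(
        any(term in message_terms for term in group)
        for group in groups.values()
    )


# A message tagged only "x", searched with "tag:x tag:y".
only_x = {("tag", "x")}
query = [("tag", "x"), ("tag", "y")]

exclusive_hit = matches(only_x, query, exclusive=True)       # OR within "tag"
non_exclusive_hit = matches(only_x, query, exclusive=False)  # AND of all terms
```

With the exclusive setting the message tagged only "x" matches, which
is the behaviour the XXX comment describes. With the non-exclusive
setting it does not, which is what the documentation promises for
terms with no explicit operator.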

The next thing I notice is quite a lot of concern in the testing for
whether things were precisely Xapian compatible or not. I have two
different opinions about this:

1. For "new" search features (ADJ, NEAR, etc.) I do not have a strong
   interest in compatibility with Xapian.

   I was very careful when I wrote the documentation for the notmuch
   search syntax to only document features that I had used and tested,
   and that I was sure I wanted. (I was already thinking forward to
   perhaps writing a custom query parser at some point.)

   So you should really use our existing documentation as the
   guide. Please implement and test what it says.

   Beyond that, if you want to add additional features not mentioned in
   our documentation, then feel free to, and there's no good reason not
   to be Xapian compatible. But I also don't think there's a strong
   reason that we have to be compatible.

   Of course, for any new features here I would also like to see the
   documentation be updated.

2. For term splitting I do have a strong interest in Xapian compatibility.

   The difference here is that we aren't doing our own indexing, but
   instead relying on Xapian to do that for us, and we have also never
   carefully documented how the term splitting happens.

   What I want to happen here is that if a user grabs a chunk of text
   from an email (say, "x#y") and searches for it, notmuch will find
   emails that actually contain that text. If the indexer and the query
   parser disagree about something like this, then notmuch can break
   badly.

   I don't know how well notmuch currently meets that requirement, but
   I've been trusting in consistent term-splitting in the indexer and
   query-parser to help with this. So the frequent comments about
   incompatibility along these lines in your patches make me nervous.

   Can you enlighten me more about the compatibility differences in this
   area, and how things might break here?
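
The failure mode in point 2 can be sketched with a toy index. The
splitting rules below are invented for illustration and are not
Xapian's actual term-splitting rules; all function names are
hypothetical. The sketch also ignores term positions and phrase
matching, so it only shows whether the terms line up at all.

```python
import re


def index_terms(text):
    # Hypothetical indexer rule: every run of letters or digits is a term,
    # so "x#y" is indexed as the two terms "x" and "y".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_terms_consistent(text):
    # A query parser that splits exactly the way the indexer does.
    return index_terms(text)


def query_terms_divergent(text):
    # Hypothetical parser rule that disagrees with the indexer: it splits
    # on whitespace only, so "x#y" stays a single term.
    return set(text.lower().split())


def search(index, terms):
    # Return the ids of documents containing every query term (implicit AND).
    return sorted(
        doc_id for doc_id, doc_terms in index.items() if terms <= doc_terms
    )


index = {
    1: index_terms("the macro x#y expands badly"),
    2: index_terms("an unrelated message"),
}

found_consistent = search(index, query_terms_consistent("x#y"))  # [1]
found_divergent = search(index, query_terms_divergent("x#y"))    # []
```

When both sides split the same way, the pasted text finds the message
it came from. When the parser keeps "x#y" as one term, nothing in the
index can match it, and the search silently returns no results.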

> Interesting.  I could see this being useful for decluttering
> superseded review branches, though that would require renaming
> superseded branches, which always causes a mess.

Deleting any superseded for-cworth branch would never cause me any
problem. If you had other consumers of your branches that wouldn't be as
happy with branch names disappearing, then you might want to just let
them have another name outside the "for-cworth" space.

Anyway, it's just one idea for helping me get some more information from
git.

-Carl

-- 
carl.d.worth@intel.com

--=-=-=
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iD8DBQFNebJs6JDdNq8qSWgRAqzHAJ9b5R9tAFYaoOLg3nNUSzrzsuCfdgCgjDuz
VkPEm9Osy6+wz3mF9T7lv+A=
=2nE4
-----END PGP SIGNATURE-----
--=-=-=--