From: Mark Walters
To: Austin Clements
Subject: Re: [RFC PATCH 2/4] Add NOTMUCH_MESSAGE_FLAG_EXCLUDED flag
Date: Tue, 24 Jan 2012 11:20:26 +0000
Message-ID: <87obttgxdx.fsf@qmul.ac.uk>
In-Reply-To: <20120124024521.GY16740@mit.edu>
References: <20120124011609.GX16740@mit.edu> <1327367923-18228-2-git-send-email-markwalters1009@gmail.com> <20120124024521.GY16740@mit.edu>
Cc: notmuch@notmuchmail.org

On Mon, 23 Jan 2012 21:45:21 -0500, Austin Clements wrote:
> The overall structure of this series looks great.  There's obviously a
> lot of clean up to do, but I'll reply with a few high-level comments.
>
> Quoth Mark Walters on Jan 24 at 1:18 am:
> > Form excluded doc_ids set and use that to exclude messages.
> > Should be no functional change.
> >
> > ---
> >  lib/notmuch-private.h |    1 +
> >  lib/query.cc          |   28 ++++++++++++++++++++++++++--
> >  2 files changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> > index 7bf153e..e791bb0 100644
> > --- a/lib/notmuch-private.h
> > +++ b/lib/notmuch-private.h
> > @@ -401,6 +401,7 @@ typedef struct _notmuch_message_list {
> >   */
> >  struct visible _notmuch_messages {
> >      notmuch_bool_t is_of_list_type;
> > +    notmuch_doc_id_set_t *excluded_doc_ids;
> >      notmuch_message_node_t *iterator;
> >  };
> >  
> > diff --git a/lib/query.cc b/lib/query.cc
> > index c25b301..92fa834 100644
> > --- a/lib/query.cc
> > +++ b/lib/query.cc
> > @@ -57,6 +57,11 @@ struct visible _notmuch_threads {
> >      notmuch_doc_id_set_t match_set;
> >  };
> >  
> > +static notmuch_bool_t
> > +_notmuch_doc_id_set_init (void *ctx,
> > +                          notmuch_doc_id_set_t *doc_ids,
> > +                          GArray *arr);
> > +
> >  notmuch_query_t *
> >  notmuch_query_create (notmuch_database_t *notmuch,
> >                        const char *query_string)
> > @@ -173,6 +178,7 @@ notmuch_query_search_messages (notmuch_query_t *query)
> >                                                     "mail"));
> >          Xapian::Query string_query, final_query, exclude_query;
> >          Xapian::MSet mset;
> > +        Xapian::MSetIterator iterator;
> >          unsigned int flags = (Xapian::QueryParser::FLAG_BOOLEAN |
> >                                Xapian::QueryParser::FLAG_PHRASE |
> >                                Xapian::QueryParser::FLAG_LOVEHATE |
> > @@ -193,8 +199,21 @@ notmuch_query_search_messages (notmuch_query_t *query)
> >  
> >          exclude_query = _notmuch_exclude_tags (query, final_query);
> >  
> > -        final_query = Xapian::Query (Xapian::Query::OP_AND_NOT,
> > -                                     final_query, exclude_query);
> > +        enquire.set_weighting_scheme (Xapian::BoolWeight());
> > +        enquire.set_query (exclude_query);
> > +
> > +        mset = enquire.get_mset (0, notmuch->xapian_db->get_doccount ());
> > +
> > +        GArray *excluded_doc_ids = g_array_new (FALSE, FALSE, sizeof (unsigned int));
> > +
> > +        for (iterator = mset.begin (); iterator != mset.end (); iterator++)
> > +        {
> > +            unsigned int doc_id = *iterator;
> > +            g_array_append_val (excluded_doc_ids, doc_id);
> > +        }
> > +        messages->base.excluded_doc_ids = talloc (query, _notmuch_doc_id_set);
> > +        _notmuch_doc_id_set_init (query, messages->base.excluded_doc_ids,
> > +                                  excluded_doc_ids);
>
> This might be inefficient for message-only queries, since it will
> fetch *all* excluded docids.  This highlights a basic difference
> between message and thread search: thread search can return messages
> that don't match the original query and hence needs to know all
> potentially excluded messages, while message search can only return
> messages that match the original query.
>
> It's entirely possible this doesn't matter because Xapian probably
> still needs to fetch the full posting lists of the excluded terms, but
> it would be worth doing a quick/hacky benchmark to verify this, with
> enough excluded messages to make the cost non-trivial.
>
> If it does matter, you could pass in a flag indicating if the exclude
> query should be limited by the original query or not.  Or you could do
> the limited exclude query in notmuch_query_search_messages and a
> separate open-ended exclude query in notmuch_query_search_threads.

Yes, I will benchmark that; I am currently importing a large archive
into notmuch for testing.

> >          enquire.set_weighting_scheme (Xapian::BoolWeight());
> >  
> > @@ -294,6 +313,11 @@ _notmuch_mset_messages_move_to_next (notmuch_messages_t *messages)
> >      mset_messages = (notmuch_mset_messages_t *) messages;
> >  
> >      mset_messages->iterator++;
> > +
> > +    while ((mset_messages->iterator != mset_messages->iterator_end) &&
> > +           (_notmuch_doc_id_set_contains (messages->excluded_doc_ids,
> > +                                          *mset_messages->iterator)))
> > +        mset_messages->iterator++;
>
> This seemed a little weird, since you remove it in the next patch.  Is
> this just to keep the tests happy?  (If so, it would be worth
> mentioning in the commit message; other reviewers will definitely have
> the same question.)
Essentially, yes, just to keep the tests happy; or rather, to make it
easy for a reviewer to see that this individual patch makes no
functional change.

Best wishes

Mark
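The mechanism this patch relies on, a set of excluded document ids that is built once and then consulted while the message iterator advances, can be modelled in a few lines of plain C. This is a hypothetical, minimal sketch and not notmuch's actual notmuch_doc_id_set_t or its helpers: the names doc_id_set_t, doc_id_set_init, doc_id_set_contains and skip_excluded are illustrative, the set is assumed to be a bitmap indexed by doc id, and a plain array of unsigned doc ids stands in for the Xapian MSet.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of an excluded-doc-id set: one bit per document
 * id; any id >= bound is treated as "not in the set". */
typedef struct {
    unsigned char *bitmap;
    unsigned int bound;
} doc_id_set_t;

/* Build the set from an array of doc ids (in the patch these come from
 * the exclude-query MSet).  Assumes every id is < UINT_MAX so that
 * bound does not overflow.  Returns false on allocation failure. */
static bool
doc_id_set_init (doc_id_set_t *set, const unsigned int *ids, size_t n)
{
    unsigned int max = 0;
    size_t i;

    assert (set != NULL && (ids != NULL || n == 0));

    for (i = 0; i < n; i++)
        if (ids[i] > max)
            max = ids[i];

    set->bound = max + 1;
    set->bitmap = calloc ((set->bound + CHAR_BIT - 1) / CHAR_BIT, 1);
    if (set->bitmap == NULL)
        return false;

    for (i = 0; i < n; i++)
        set->bitmap[ids[i] / CHAR_BIT] |= (unsigned char) (1u << (ids[i] % CHAR_BIT));
    return true;
}

/* Constant-time membership test: look up the id's bit in the bitmap. */
static bool
doc_id_set_contains (const doc_id_set_t *set, unsigned int id)
{
    if (id >= set->bound)
        return false;
    return (set->bitmap[id / CHAR_BIT] >> (id % CHAR_BIT)) & 1u;
}

/* Starting at index pos, skip every match whose doc id is excluded,
 * analogous to the while loop added to
 * _notmuch_mset_messages_move_to_next.  Returns the index of the next
 * non-excluded match, or n if none remains. */
static size_t
skip_excluded (const unsigned int *matches, size_t n, size_t pos,
               const doc_id_set_t *excluded)
{
    while (pos < n && doc_id_set_contains (excluded, matches[pos]))
        pos++;
    return pos;
}
```

For example, with matches {2, 3, 5, 8, 13} and excluded ids {3, 5, 21}, calling skip_excluded from index 0 and then again from pos + 1 after each hit visits 2, 8 and 13. In this sketch each membership test is O(1), but the bitmap's size grows with the largest excluded doc id, so the cost of building the full excluded set (the concern about fetching *all* excluded docids for message-only queries) is paid up front regardless of how many matches are iterated.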