Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 38EF1431FB6 for ; Sun, 20 Mar 2011 23:56:09 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.699 X-Spam-Level: X-Spam-Status: No, score=-0.699 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7] autolearn=unavailable Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 3o7rZmVIYAmr for ; Sun, 20 Mar 2011 23:56:09 -0700 (PDT) Received: from mail-qy0-f174.google.com (mail-qy0-f174.google.com [209.85.216.174]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id DB6BE431FB5 for ; Sun, 20 Mar 2011 23:56:08 -0700 (PDT) Received: by qyk7 with SMTP id 7so2228072qyk.5 for ; Sun, 20 Mar 2011 23:56:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=7RamGmwwsGCyK+eSUskw6piW/pYhxlgZHTXvR7c/0B0=; b=kdtB0hlyuTRyiBZRuYnN9HaJt4j/4qbYAGfuLJW6SCdGUiQQsroEU7Bc+muKLx1Rbi m4TTydHVihyGcBoqtjMKqrHWhnTRVJGjrM9NuBZil7tl5z16rZ3RTwUuY5RqzK0oARUJ JhBi4laqSVSkD6c9IEjZfgZmDgu2P5rJCX7p8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=xJCf3C6pAyEuO+3EWMAM3f+7pmvd9VpgGueGDVXc9nQ4eNmiztVJGhxMMwozfG6igJ k27gtjLJnVqxQVOqE3TtJJfQ+lnNXtiXPEBiAtoMw66jjMbJPxY5cM4vsmNmyOx/NjLu f2urq7ijqrUctbWNA7YMKoUAxOhUwWBiodUrY= MIME-Version: 1.0 Received: by 10.229.102.209 with SMTP id h17mr2945734qco.102.1300690565502; Sun, 20 Mar 2011 23:56:05 -0700 (PDT) Sender: amdragon@gmail.com Received: by 10.229.30.68 with HTTP; Sun, 20 Mar 2011 23:56:05 -0700 (PDT) In-Reply-To: <87hbba8ggp.fsf@yoom.home.cworth.org> References: <1291928396-27937-1-git-send-email-amdragon@mit.edu> <87hbba8ggp.fsf@yoom.home.cworth.org> Date: Mon, 21 Mar 2011 02:56:05 -0400 X-Google-Sender-Auth: GRcovGu3vALGYFkuVRjsY60hFyw Message-ID: Subject: Re: [PATCH 0/5] Fetch all message metadata in a single pass From: Austin Clements To: Carl Worth Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Mar 2011 06:56:09 -0000 Thanks for the thorough review. My updated patches are on the eager-metadata-v4 branch (also, for-review/eager-metadata-v4) at http://awakening.csail.mit.edu/git/notmuch.git Responses inline. On Thu, Mar 10, 2011 at 10:48 PM, Carl Worth wrote: > On Sun, 13 Feb 2011 15:25:48 -0500, Austin Clements wr= ote: >> I've rebased this against current master and fixed a few merge >> conflicts. =A0The rebased patches are on the eager-metadata-v3 branch of >> =A0 http://awakening.csail.mit.edu/git/notmuch.git > > Hi Austin, > > This looks like a great set of optimizations and cleanup. Here is some > (long-overdue) review. > > First, the patch series looks excellent and my review here is quite > nit-picky. I feel bad reviewing this so late and not just immediately > merging it, but I will commit to picking up a refreshed series very > quickly. > > One very minor thing is that the word "metadata" might be confused with > Xapian metadata which we are using already such as: > > =A0 =A0version_string =3D notmuch->xapian_db->get_metadata ("version"); > > So we might want to come up with a better name here. I don't have any > concrete suggestion yet, so if you don't think of anything obvious, then > don't worry about it. Hmm. I think this is okay because I always refer to "message metadata" (and I couldn't think of a better term). >> + =A0 =A0/* Get tags */ >> + =A0 =A0if (!message->tag_list) { >> + =A0 =A0 message->tag_list =3D >> + =A0 =A0 =A0 =A0 _notmuch_get_terms_with_prefix (message, i, end, tag_p= refix); >> + =A0 =A0 _notmuch_string_list_sort (message->tag_list); >> + =A0 =A0} > > Looks like the above case is missing the assert to ensure proper prefix > ordering. Fixed. >> + =A0 =A0if (!message->tag_list) >> + =A0 =A0 _notmuch_message_ensure_metadata (message); >> + >> + =A0 =A0/* We tell the iterator to add a talloc reference to the >> + =A0 =A0 * underlying list, rather than stealing it. =A0As a result, it= 's >> + =A0 =A0 * possible to modify the message tags while iterating because >> + =A0 =A0 * the iterator will keep the current list alive. */ >> + =A0 =A0return _notmuch_tags_create (message, message->tag_list, FALSE)= ; > > The behavior here is great, but don't like Boolean function parameters > being used to change the semantics. The problem with a Boolean argument > is that it's impossible to know the semantic by just reading the calling > code---you must also consult the implementation as well. > > For any case where it seems natural to implement something with a > Boolean argument, I sometimes use an approach something like this: > > =A0 =A0 =A0 =A0static void > =A0 =A0 =A0 =A0_foo_function_internal (foo_t *, ..., bool_t be_different) > =A0 =A0 =A0 =A0{ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0...; > =A0 =A0 =A0 =A0} > > =A0 =A0 =A0 =A0void > =A0 =A0 =A0 =A0foo_function (foo_t *, ...) > =A0 =A0 =A0 =A0{ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return _foo_function_internal (foo, ..., F= ALSE); > =A0 =A0 =A0 =A0} > > =A0 =A0 =A0 =A0void > =A0 =A0 =A0 =A0foo_function_different (foo_t *, ...) > =A0 =A0 =A0 =A0{ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return _foo_function_internal (foo, ..., T= RUE); > =A0 =A0 =A0 =A0} > > That is, I'm willing to accept the Boolean parameter in the case of a > static function defined immediately next to all callers. Here, for any > non-static callers the semantics should be quite clear. (And perhaps the > function calling with the FALSE case will need a clarifying suffix as > well---one might have some_function_foo and some_function_bar for the > two cases). > > Of course, my proscription against Boolean parameter doesn't apply to > something like a function that is setting some Boolean feature to TRUE > or FALSE. The important criterion is that the function semantics should > be evident to someone reading code calling the function. So if the > Boolean argument is obviously tied to some portion of the function name, > then that can be adequate. > > Enough with the generalities, back to _notmuch_tags_create... > > The caller above is the exceptional caller. It's the only one using > passing FALSE, and it also already has a large comment. So it seems that > the right fix here is to have the extra talloc_reference happen here > where there's a comment talking about adding a talloc reference. > > So it would be great to see something here like: > > =A0 =A0 =A0 =A0tags =3D _notmuch_tags_create (message, message->tag_list)= ; > =A0 =A0 =A0 =A0talloc_reference (message, message->tag_list); > =A0 =A0 =A0 =A0return tags; > > Of course, that would mean that _notmuch_tags_create can't have done the > talloc_steal. So perhaps all the other callers should be calling: > > =A0 =A0 =A0 =A0_notmuch_tags_create_with_steal (ctx, tag_list); > > What do you think? Your point about the boolean argument is well taken. In this case there's really no need for two difference versions, so I wound up making _notmuch_tags_create always steal the reference. I debated for a while whether it should add a reference, thus forcing most callers to talloc_unlink, or steal the reference, thus forcing one caller to talloc_reference. I ultimately decided it was much more likely that the caller would forget the talloc_unlink, resulting in a difficult-to-track memory leak (since the list would *eventually* be cleaned up), than that the steal would cause confusion. Also, there is some precedence for internal functions that steal an argument. So I made it steal the reference. This doesn't cause any problems in the one weird case that still needs the reference to the list. After the _notmuch_tags_create, the caller simply adds that reference. >> -notmuch_tags_t * >> -_notmuch_convert_tags (void *ctx, Xapian::TermIterator &i, >> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0Xapian::TermIterator &end); >> +notmuch_string_list_t * >> +_notmuch_get_terms_with_prefix (void *ctx, Xapian::TermIterator &i, >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 Xapian::TermIt= erator &end, >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 const char *pr= efix); > > I don't really like the fact that _notmuch_get_terms_with_prefix doesn't > make a clear indication of what file it's defined in, (the old function > _notmuch_convert_tags had the same problem). It could be named > _notmuch_database_convert_tags since it's in database.cc, but that would > be odd in not actually accepting a notmuch_database_t * as the first > parameter. Any suggestion here? _notmuch_get_terms_with_prefix is weird because it captures a pattern that lies squarely in the Xapian world, being all about term iterators. Hence, I think the "right" solution (note, not the best solution), would be to hide the term iterators and make two copies of the function: one that takes a notmuch_database_t and always considers all database terms, and one private to message.cc that acts as a helper to what's now all bundled in _notmuch_message_ensure_metadata. But that bothers me more than a function that doesn't take a notmuch_database_t *. So I just added "database" to the name. >> index 0000000..d92a0ba >> --- /dev/null >> +++ b/lib/strings.c >> @@ -0,0 +1,94 @@ >> +/* strings.c - Iterator for a list of strings > > Similarly, this file might be better as string-list.c. Done. >> + * You should have received a copy of the GNU General Public License >> + * along with this program. =A0If not, see http://www.gnu.org/licenses/= . >> + * >> + * Author: Carl Worth > > Hey, wait a second, some of this code is mine, but I think the sort > function is yours. Please do start annotating the Author tags on all > files as appropriate. (There are probably lots of files missing your > Author attribution---I just happened to notice this one here since you > happened to have an Author line in the patch.) Heh, fixed. I suppose I hadn't been thinking about it, since none of the source in lib/ has other authors listed. But it raises an interesting question. When is it kosher to add oneself to a file's author list? In this case I wrote about half of the code in that admittedly short file, so that makes sense Changing a few lines presumably isn't enough. Adding a few functions? >> -/* XXX: Should write some talloc-friendly list to avoid the need for >> - * this. */ > > Hurrah! I love patches that get to remove XXX comments. ]:--8) >> + =A0 =A0/* Get thread */ >> + =A0 =A0if (!message->thread_id) >> + =A0 =A0 message->thread_id =3D >> + =A0 =A0 =A0 =A0 _notmuch_message_get_term (message, i, end, thread_pre= fix); >> + >> + =A0 =A0/* Get id */ >> + =A0 =A0assert (strcmp (thread_prefix, id_prefix) < 0); >> + =A0 =A0if (!message->message_id) >> + =A0 =A0 message->message_id =3D >> + =A0 =A0 =A0 =A0 _notmuch_message_get_term (message, i, end, id_prefix)= ; > > Missing asserts on the above two as well? Fixed. (Actually, I think there was only one assert missing in total, but it had propagated through the history.) > -Carl