From d66e5d2b6a4f473ca4d1d4523a8b153501ba6718 Mon Sep 17 00:00:00 2001 From: David Bremner Date: Thu, 2 Jun 2016 20:29:59 +2100 Subject: [PATCH] Re: [RFC2 Patch 5/5] lib: iterator API for message properties --- 35/fe089ae11ed04d7a9ce37527ca25f47f1c4a61 | 131 ++++++++++++++++++++++ 1 file changed, 131 insertions(+) create mode 100644 35/fe089ae11ed04d7a9ce37527ca25f47f1c4a61 diff --git a/35/fe089ae11ed04d7a9ce37527ca25f47f1c4a61 b/35/fe089ae11ed04d7a9ce37527ca25f47f1c4a61 new file mode 100644 index 000000000..cecee6367 --- /dev/null +++ b/35/fe089ae11ed04d7a9ce37527ca25f47f1c4a61 @@ -0,0 +1,131 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by arlo.cworth.org (Postfix) with ESMTP id 696116DE0243 + for ; Wed, 1 Jun 2016 16:30:12 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at cworth.org +X-Spam-Flag: NO +X-Spam-Score: -0.012 +X-Spam-Level: +X-Spam-Status: No, score=-0.012 tagged_above=-999 required=5 + tests=[AWL=-0.001, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] + autolearn=disabled +Received: from arlo.cworth.org ([127.0.0.1]) + by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id 8WuOJfqdlZiw for ; + Wed, 1 Jun 2016 16:30:04 -0700 (PDT) +Received: from fethera.tethera.net (fethera.tethera.net [198.245.60.197]) + by arlo.cworth.org (Postfix) with ESMTPS id 5909E6DE01D0 + for ; Wed, 1 Jun 2016 16:30:04 -0700 (PDT) +Received: from remotemail by fethera.tethera.net with local (Exim 4.84) + (envelope-from ) + id 1b8FaA-00023w-9i; Wed, 01 Jun 2016 19:29:50 -0400 +Received: (nullmailer pid 9631 invoked by uid 1000); + Wed, 01 Jun 2016 23:29:59 -0000 +From: David Bremner +To: Daniel Kahn Gillmor , notmuch@notmuchmail.org +Subject: Re: [RFC2 Patch 5/5] lib: iterator API for message properties +In-Reply-To: <87eg8ht2sb.fsf@alice.fifthhorseman.net> +References: <1463927339-5441-1-git-send-email-david@tethera.net> + <1464608999-14774-1-git-send-email-david@tethera.net> + <1464608999-14774-6-git-send-email-david@tethera.net> + <8760tthfuy.fsf@zancas.localnet> <87pos1u14p.fsf@alice.fifthhorseman.net> + <87eg8ht2sb.fsf@alice.fifthhorseman.net> +User-Agent: Notmuch/0.22+28~gb9bf3f4 (http://notmuchmail.org) Emacs/24.5.1 + (x86_64-pc-linux-gnu) +Date: Wed, 01 Jun 2016 20:29:59 -0300 +Message-ID: <87lh2ofpxk.fsf@zancas.localnet> +MIME-Version: 1.0 +Content-Type: text/plain +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.20 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Wed, 01 Jun 2016 23:30:12 -0000 + +Daniel Kahn Gillmor writes: + +> On Tue 2016-05-31 21:52:06 -0400, Daniel Kahn Gillmor wrote: +> +> do we actually need this abstraction? If we're aiming to build specific +> new features (the two i'm thinking of are cryptographic-session-keys and +> reference-adjustments), couldn't we implement those features explicitly +> in xapian with their own special prefix, rather than treating them as a +> generic "property"? + +Sure, it's certainly possible. + +I guess if you don't care about the possibility of iterating all pairs +with given key prefix (which I admit makes more sense for the config +API), then the code could be simplified to look more like the tag list +handling code. C is pretty crap at generics, but I guess looking at +tags.c, it's really about iterators for notmuch_string_list_t. So it +could probably be generalized to serve here. + +For each such prefix, one would need to roughly duplicate patches 1/5 +and 3/5. It took me a little while to figure 1/5 out, but now that I +know, it would be less trouble. I guess my thinking here was that I +would provide a low level interface that people using the C API or +bindings could use without hacking xapian. + +> We already have a bit of an uncomfortable fit with tags and special +> flags (encrypted, signed, attachment, etc), where some are expected to +> be set and cleared automagically and some are expected to be manipulated +> directly by the user. Are we setting ourselves up for more of the same, +> or is there a principled way that a user can know which properties it's +> kosher for them to set and clear, and which ones they should leave +> alone? + +XPROPERTY is an internal prefix, which means it isn't added to the query +parser. As it happens, I didn't plan on CLI access to these terms +either. Both of those choices are tradeoffs to say that these are +internal metadata, suitable for manipulation by programs. Such programs +could be scripts using python or ruby. + +> If we add new specific features, we could potentially augment the dump +> format explicitly for them, without having the property abstraction. + +We could, but I think should change the dump format quite rarely, since +we risk breaking people's scripts. So if we did it for one prefix, I'd +like to do in an extensible way so that adding new prefixes is somewhat +transparent. It also means some duplication of effort/code in notmuch +dump/restore to dump/restore each new prefix. + +It's probably true that per-prefix dump format would be more compact, +since the keys would be implicit, rather than repeated for every pair. + +> We already have some explicit features for each message (subject, +> from, to, attachment, mimetype, thread id, etc), and most of them are +> derived from the message itself, with the hope that it could be +> re-derived given just the message body. Is there a distinction +> between properties that can be derived from the message body and +> properties that need to be additionally derived from some other data? + +As Tomi always says, naming is the hardest thing; properties is a bit +generic. I'm not sure the distinction you make between the "message" and +the "message body" here. I think most of our derived terms are from the +message header. My intent here is that "properties" are used for things +that cannot be derived from the message (header or body). + +TL;DR: + + - per prefix requires new code in the library and dump/restore + for every prefix + + the dump format might be more compact if done in a per prefix way. + + this code would be simpler than the generic properties code, + mainly because it would not need key value pairs, + - the library and dump/restore are parts of notmuch that have the + potential to "break the world". Not too many people are + comfortable hacking on them. + - changing the dump format is something like an ABI change for + people whose scripts rely on dump / restore. + -- 2.26.2