From: Daniel Kahn Gillmor Date: Thu, 2 Jun 2016 17:33:54 +0000 (+2000) Subject: Re: [RFC2 Patch 5/5] lib: iterator API for message properties X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=4ebd2e0cf1296d2dc111108b17c115552cf161a6;p=notmuch-archives.git Re: [RFC2 Patch 5/5] lib: iterator API for message properties --- diff --git a/af/3f358eb2561d738c346dd8a0f6ca079ee07747 b/af/3f358eb2561d738c346dd8a0f6ca079ee07747 new file mode 100644 index 000000000..8f7f025cd --- /dev/null +++ b/af/3f358eb2561d738c346dd8a0f6ca079ee07747 @@ -0,0 +1,197 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by arlo.cworth.org (Postfix) with ESMTP id 817326DE01D0 + for ; Thu, 2 Jun 2016 10:34:08 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at cworth.org +X-Spam-Flag: NO +X-Spam-Score: -0.02 +X-Spam-Level: +X-Spam-Status: No, score=-0.02 tagged_above=-999 required=5 tests=[AWL=-0.020] + autolearn=disabled +Received: from arlo.cworth.org ([127.0.0.1]) + by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id 8i_EcJL7dRda for ; + Thu, 2 Jun 2016 10:34:00 -0700 (PDT) +Received: from che.mayfirst.org (che.mayfirst.org [162.247.75.118]) + by arlo.cworth.org (Postfix) with ESMTP id 6CEF96DE00DB + for ; Thu, 2 Jun 2016 10:34:00 -0700 (PDT) +Received: from fifthhorseman.net (unknown [38.109.115.130]) + by che.mayfirst.org (Postfix) with ESMTPSA id 5AEBDF98B; + Thu, 2 Jun 2016 13:33:58 -0400 (EDT) +Received: by fifthhorseman.net (Postfix, from userid 1000) + id 3AE6020245; Thu, 2 Jun 2016 13:33:58 -0400 (EDT) +From: Daniel Kahn Gillmor +To: David Bremner , notmuch@notmuchmail.org +Subject: Re: [RFC2 Patch 5/5] lib: iterator API for message properties +In-Reply-To: <87lh2ofpxk.fsf@zancas.localnet> +References: <1463927339-5441-1-git-send-email-david@tethera.net> + <1464608999-14774-1-git-send-email-david@tethera.net> + 
<1464608999-14774-6-git-send-email-david@tethera.net> + <8760tthfuy.fsf@zancas.localnet> <87pos1u14p.fsf@alice.fifthhorseman.net> + <87eg8ht2sb.fsf@alice.fifthhorseman.net> <87lh2ofpxk.fsf@zancas.localnet> +User-Agent: Notmuch/0.22+16~g87b7bd4 (http://notmuchmail.org) Emacs/24.5.1 + (x86_64-pc-linux-gnu) +Date: Thu, 02 Jun 2016 13:33:54 -0400 +Message-ID: <87inxrqyv1.fsf@alice.fifthhorseman.net> +MIME-Version: 1.0 +Content-Type: multipart/signed; boundary="=-=-="; + micalg=pgp-sha512; protocol="application/pgp-signature" +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.20 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Thu, 02 Jun 2016 17:34:08 -0000 + +--=-=-= +Content-Type: text/plain +Content-Transfer-Encoding: quoted-printable + +Hi Bremner-- + +thanks for the response! I didn't mean my post to be a wet-blanket, +just wanted to think through the tradeoffs... + +On Wed 2016-06-01 19:29:59 -0400, David Bremner wrote: +> I guess if you don't care about the possibility of iterating all pairs +> with given key prefix (which I admit makes more sense for the config +> API), then the code could be simplified to look more like the tag list +> handling code. C is pretty crap at generics, but I guess looking at +> tags.c, it's really about iterators for notmuch_string_list_t. So it +> could probably be generalized to serve here. +> +> For each such prefix, one would need to roughly duplicate patches 1/5 +> and 3/5. It took me a little while to figure 1/5 out, but now that I +> know, it would be less trouble. I guess my thinking here was that I +> would provide a low level interface that people using the C API or +> bindings could use without hacking xapian. + [...] +> XPROPERTY is an internal prefix, which means it isn't added to the query +> parser. As it happens, I didn't plan on CLI access to these terms +> either. 
Both of those choices are tradeoffs to say that these are +> internal metadata, suitable for manipulation by programs. Such programs +> could be scripts using python or ruby. + +I think this makes sense, and makes me more comfortable with the overall +idea of this patch series. maybe it'd be useful to clearly document the +intended scope? + +>> If we add new specific features, we could potentially augment the dump +>> format explicitly for them, without having the property abstraction. +> +> We could, but I think should change the dump format quite rarely, since +> we risk breaking people's scripts. So if we did it for one prefix, I'd +> like to do in an extensible way so that adding new prefixes is somewhat +> transparent. It also means some duplication of effort/code in notmuch +> dump/restore to dump/restore each new prefix. +> +> It's probably true that per-prefix dump format would be more compact, +> since the keys would be implicit, rather than repeated for every pair. + +true, though i'm not sure how much compactness is necessary. presumably +people are compressing their dumpfiles, and regularly repeated strings +are the easiest thing to compress. + +>> We already have some explicit features for each message (subject, +>> from, to, attachment, mimetype, thread id, etc), and most of them are +>> derived from the message itself, with the hope that it could be +>> re-derived given just the message body. Is there a distinction +>> between properties that can be derived from the message body and +>> properties that need to be additionally derived from some other data? +> +> As Tomi always says, naming is the hardest thing; properties is a bit +> generic. I'm not sure the distinction you make between the "message" and +> the "message body" here. I think most of our derived terms are from the +> message header. My intent here is that "properties" are used for things +> that cannot be derived from the message (header or body). 
+ +To be clear, i didn't mean to distinguish between "message" and +"message body" -- i don't think of the headers as being significantly +different from the body (and indeed, if we can get memoryhole working, +then some headers might be derived from or influenced by the body). + +maybe it's worth thinking through each of these per-message features, +and where they come from -- are they from the message itself (header, +body, etc), from the message's position(s) in the filesystem, or +somewhere else entirely? + +From the message: + + * message-id + * subject + * mimetype + * attachment + * references + * from + * to + * replyto + +From the filesystem itself: + + * filenames + * folder + +From elsewhere: + + * for messages which have multiple files, which file is actually indexed + * thread-id + * tag + +we're now talking about adding properties, which are in the "elsewhere" +category, right? + +It's worth noticing that the stuff in "elsewhere" is the stuff that +won't propagate across a dump/restore unless it's explicitly in the dump +somehow. We currently fail to restore thread-id and which file is +actually indexed across a dump/restore :/ + +> - per prefix requires new code in the library and dump/restore +> for every prefix +> + the dump format might be more compact if done in a per prefix way. +> + this code would be simpler than the generic properties code, +> mainly because it would not need key value pairs, +> - the library and dump/restore are parts of notmuch that have the +> potential to "break the world". Not too many people are +> comfortable hacking on them. +> - changing the dump format is something like an ABI change for +> people whose scripts rely on dump / restore. + +I think you've convinced me that it's good to go ahead with the +properties, assuming it's scoped as defined above. I still think that +we need a better story for upgrades to the dump format in general, but +maybe this isn't the place to make that particular case. 
+ + --dkg + +--=-=-= +Content-Type: application/pgp-signature; name="signature.asc" + +-----BEGIN PGP SIGNATURE----- +Version: GnuPG v2 + +iQJ8BAEBCgBmBQJXUG4CXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w +ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRFREIyRTc0RjU2RkNGMkI2NzI5N0I3MzUy +NEVDRkY1QUZGNjgzNzBBAAoJECTs/1r/aDcKAYIP/0gctfr4FyAaVpazBvRPjyC0 +BgAhDO7Wv3V1G88m4spGUcH5yVuWFPhZ+bQedcPD0pExloo3ax21dxlaNiS1/qVE +FtTMkxQbUXHVcDqoYeu4XNBKMng1KSbNJuQ2LHq4g/88ytEKXvcCz7qTbNd6tTQ+ +LGlb101PdRJXtbU3MjLn86/Ehomt+AqqxYYDFMaRkDUEgaQPSzWe+H+V5nSWKZJG +xthvfzoElvAXM1sAKNPosBQI2s5k87vn43mXSrKfzNZ0OCTn+Wf4os17ARCEVKA5 +PuecT1eXsfwF/R0rj7LPuPLlU3HvSKkPaL32SQsDTRwUgOqF6Cu1FMnzsL7aPXwf +I3wAzfsn4/x7XEfzD3Mot0LhFiS6Iahu7djEshuoxyfUtfHrOLpZy6qDWs5bLyp2 +WFJx0zWP2hHoA0HqoabU+38riCiyii6Dq8Fo6qps1UGtX+2IsLVzFQq9599B8845 +8J5EU6Upf1zDPcsRCrLoEP4ePtb2eAZzeCmR5ENSFlme+Z9xK66dwEAzxeW3j+3L +4eES+Ft6FXlh7ZMhFv0sssYcXcCdAn9Umvtvt0Q98d7J1/KLEX1HIl4WQJTkTEy8 +Ni5x7Zf9hS/M/f4jAJH0M6i366RgI8hQUN9FL0/9Bfq4+pqV6ZjLTAwEiT1GQiUb +y3aH14NDyvKyhlF8Z4Ty +=uF1c +-----END PGP SIGNATURE----- +--=-=-=--