Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id BF5EB431FB6 for ; Tue, 4 Dec 2012 11:14:43 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Z5S6oMRgMKYK for ; Tue, 4 Dec 2012 11:14:39 -0800 (PST) Received: from dmz-mailsec-scanner-6.mit.edu (DMZ-MAILSEC-SCANNER-6.MIT.EDU [18.7.68.35]) by olra.theworths.org (Postfix) with ESMTP id 323A6431FAE for ; Tue, 4 Dec 2012 11:14:39 -0800 (PST) X-AuditID: 12074423-b7fcb6d000000927-bb-50be4b9e06e8 Received: from mailhub-auth-2.mit.edu ( [18.7.62.36]) by dmz-mailsec-scanner-6.mit.edu (Symantec Messaging Gateway) with SMTP id E0.75.02343.E9B4EB05; Tue, 4 Dec 2012 14:14:38 -0500 (EST) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-2.mit.edu (8.13.8/8.9.2) with ESMTP id qB4JEbE6010409; Tue, 4 Dec 2012 14:14:38 -0500 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id qB4JEZiS007735 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Tue, 4 Dec 2012 14:14:36 -0500 (EST) Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80) (envelope-from ) id 1Tfxwx-0001lW-2i; Tue, 04 Dec 2012 14:14:35 -0500 From: Austin Clements To: Peter Feigl , notmuch@notmuchmail.org Subject: Re: [PATCH v2 1/5] Adding an S-expression structured output printer. In-Reply-To: <1354632382-15609-2-git-send-email-craven@gmx.net> References: <1354632382-15609-1-git-send-email-craven@gmx.net> <1354632382-15609-2-git-send-email-craven@gmx.net> User-Agent: Notmuch/0.14+100~gcb9b0b0 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu) Date: Tue, 04 Dec 2012 14:14:34 -0500 Message-ID: <87ip8hy6hx.fsf@awakening.csail.mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFmpileLIzCtJLcpLzFFi42IRYrdT0Z3nvS/AYOEHTYu9De2MFtdvzmR2 YPJYvGk/m8ezVbeYA5iiuGxSUnMyy1KL9O0SuDL+X53NWrA+omJK51qmBsYrzl2MnBwSAiYS nb83sEHYYhIX7q0Hs4UE9jFKfLkb2MXIBWSvZ5S4cWgmI4RzgUli6r+/LBBVSxgl1sz2AbHZ BPQlVqydxApiiwhYSkz9cglskrCAr8SC77vYQWxOAXuJTz2PmCB6CyVOXrsCVMPBISoQL3F5 qQJImEVAVeLDzHdMIGFeoONmn5EECfMKCEqcnPkEbCuzgLrEn3mXmCFsbYllC18zT2AUnIWk bBaSsllIyhYwMq9ilE3JrdLNTczMKU5N1i1OTszLSy3SNdPLzSzRS00p3cQIDl4X5R2Mfw4q HWIU4GBU4uGVMN4XIMSaWFZcmXuIUZKDSUmU958nUIgvKT+lMiOxOCO+qDQntfgQowQHs5II 7z87oBxvSmJlVWpRPkxKmoNFSZz3WspNfyGB9MSS1OzU1ILUIpisDAeHkgQvixdQo2BRanpq RVpmTglCmomDE2Q4D9DwBpAa3uKCxNzizHSI/ClGXY6Fa9qfMAqx5OXnpUqJ884FKRIAKcoo zYObA0s6rxjFgd4S5l0MUsUDTFhwk14BLWECWvJCaDfIkpJEhJRUA6Og5o/zhaa71qfcfP48 dVNTS5nbH6G4E2K13Z/in/LMmDUjVcHMeGbD7+vMi1vWnX6/e3bTT816fpECjnv9JWXsFWZn 4jz3PJqyluuBkXKelEP57z0mvGf8M+UWaHqnS9+dbBzUo7jJYolD3/zpV05YnK2aY2cU9dF7 geTHeWttFy/wvr6xvESJpTgj0VCLuag4EQDa0ukhFQMAAA== X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Dec 2012 19:14:43 -0000 On Tue, 04 Dec 2012, Peter Feigl wrote: > This commit adds a structured output printer for Lisp > S-Expressions. Later commits will use this printer in notmuch search, > show and reply. > > The structure is the same as json, but: > - arrays are written as lists: ("foo" "bar" "baaz" 1 2 3) > - maps are written as a-lists: ((key "value") (other-key "other-value")) I thought the plan was to use plists. Or are we going to support both? > - true is written as t > - false is written as nil > - null is written as nil > --- > Makefile.local | 1 + > sprinter-sexp.c | 250 ++++++++++++++++++++++++++++++++++++++++++++++++++= ++++++ > sprinter.h | 4 + > 3 files changed, 255 insertions(+) > create mode 100644 sprinter-sexp.c > > diff --git a/Makefile.local b/Makefile.local > index 2b91946..0db1713 100644 > --- a/Makefile.local > +++ b/Makefile.local > @@ -270,6 +270,7 @@ notmuch_client_srcs =3D \ > notmuch-tag.c \ > notmuch-time.c \ > sprinter-json.c \ > + sprinter-sexp.c \ > sprinter-text.c \ > query-string.c \ > mime-node.c \ > diff --git a/sprinter-sexp.c b/sprinter-sexp.c > new file mode 100644 > index 0000000..6d6bbad > --- /dev/null > +++ b/sprinter-sexp.c > @@ -0,0 +1,250 @@ > +/* notmuch - Not much of an email program, (just index and search) > + * > + * Copyright =C2=A9 2012 Carl Worth This should probably be your name. > + * > + * This program is free software: you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation, either version 3 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see http://www.gnu.org/licenses/ . > + * > + * Author: Carl Worth Same here. > + */ > + > +#include > +#include > +#include > +#include "sprinter.h" > + > +struct sprinter_sexp { > + struct sprinter vtable; > + FILE *stream; > + /* Top of the state stack, or NULL if the printer is not currently > + * inside any aggregate types. */ > + struct sexp_state *state; > + > + /* A flag to signify that a separator should be inserted in the > + * output as soon as possible. > + */ > + notmuch_bool_t insert_separator; > +}; > + > +struct sexp_state { > + struct sexp_state *parent; > + > + /* True if nothing has been printed in this aggregate yet. > + * Suppresses the space before a value. */ > + notmuch_bool_t first; > + > + /* True if the state is a map state. > + * Used to add a space between key/value pairs. */ > + notmuch_bool_t in_map; Maybe in_alist? > + > + /* The character that closes the current aggregate. */ > + char close; Given that the close character is always ')', why have this field? > +}; > + > +/* Helper function to set up the stream to print a value. If this > + * value follows another value, prints a space. */ > +static struct sprinter_sexp * > +sexp_begin_value (struct sprinter *sp) > +{ > + struct sprinter_sexp *sps =3D (struct sprinter_sexp *) sp; > + > + if (sps->state) { > + if (! sps->state->first) { > + if (sps->insert_separator) { > + fputc ('\n', sps->stream); > + sps->insert_separator =3D FALSE; > + } else { > + if (! sps->state->in_map) > + fputc (' ', sps->stream); > + } > + } else { > + sps->state->first =3D FALSE; > + } > + } > + return sps; > +} > + > +/* Helper function to begin an aggregate type. Prints the open > + * character and pushes a new state frame. */ > +static void > +sexp_begin_aggregate (struct sprinter *sp, char open, char close) The open and close arguments seem unnecessary here, since they're always '(' and ')'. Perhaps this should instead take in_map as an argument? > +{ > + struct sprinter_sexp *sps =3D sexp_begin_value (sp); > + struct sexp_state *state =3D talloc (sps, struct sexp_state); > + fputc (open, sps->stream); > + state->parent =3D sps->state; > + state->first =3D TRUE; > + state->in_map =3D FALSE; > + state->close =3D close; > + sps->state =3D state; > +} > + > +static void > +sexp_begin_map (struct sprinter *sp) > +{ > + struct sprinter_sexp *sps =3D (struct sprinter_sexp *) sp; > + sexp_begin_aggregate (sp, '(', ')'); > + sps->state->in_map =3D TRUE; > +} > + > +static void > +sexp_begin_list (struct sprinter *sp) > +{ > + sexp_begin_aggregate (sp, '(', ')'); > +} > + > +static void > +sexp_end (struct sprinter *sp) > +{ > + struct sprinter_sexp *sps =3D (struct sprinter_sexp *) sp; > + struct sexp_state *state =3D sps->state; > + > + if (sps->state->in_map) > + fputc (')', sps->stream); > + fputc (sps->state->close, sps->stream); > + sps->state =3D state->parent; > + talloc_free (state); > + if (sps->state =3D=3D NULL) > + fputc ('\n', sps->stream); > +} > + > +static void > +sexp_string_len_internal (struct sprinter *sp, const char *val, size_t l= en, notmuch_bool_t quote) > +{ > + static const char *const escapes[] =3D { > + ['\"'] =3D "\\\"", ['\\'] =3D "\\\\", ['\b'] =3D "\\b", > + ['\f'] =3D "\\f", ['\n'] =3D "\\n", ['\t'] =3D "\\t" It's unfortunate that different Lisps have different string escaping conventions. All of these will work in Elisp. R5RS only specifies \" and \\ (anything else is unspecified, though at least MIT Scheme, Racket R5RS, and Chicken support the others). R6RS specifies all of these. In Common Lisp, \" and \\ work as expected, but \ before anything else will just ignore the \ (so "\n" is the same as "n"). Conveniently, in all of these, no characters other than " and \ actually need escaping, so I'd be inclined to print any other character literally. Consumers will have to be sure to use a UTF-8 encoding when reading. > + }; > + struct sprinter_sexp *sps =3D sexp_begin_value (sp); > + > + if(quote) > + fputc ('"', sps->stream); > + for (; len; ++val, --len) { > + unsigned char ch =3D *val; > + if (ch < ARRAY_SIZE (escapes) && escapes[ch]) > + fputs (escapes[ch], sps->stream); > + else if (ch >=3D 32) > + fputc (ch, sps->stream); > + else > + fprintf (sps->stream, "\\u%04x", ch); If we do have to include numeric character escapes, "\\%03o" would be better. As mentioned above, R5RS and Common Lisp have no means to do this. Even worse, R6RS doesn't specify octal escapes but does specify "\xNN;" (note the semicolon), which isn't compatible with *anything*, including most R5RS implementations. In practice, though, most things seem to accept the octal escape I suggested (confirmed in Elisp, MIT Scheme, Racket R5RS, and Chicken). > + } > + if(quote) > + fputc ('"', sps->stream); > +} > + > +static void > +sexp_string_len (struct sprinter *sp, const char *val, size_t len) > +{ > + sexp_string_len_internal (sp, val, len, TRUE); /* print quoted */ > +} > + > +static void > +sexp_symbol_len (struct sprinter *sp, const char *val, size_t len) > +{ > + sexp_string_len_internal (sp, val, len, FALSE); /* print unquoted */ I don't understand this. The quoting rules for symbols are completely different from the rules for strings. It seems like it would be better to print the symbol literally with fputs than to apply incorrect quoting rules to it. Even better would be to INTERNAL_ERROR if the symbol contains any characters that might require escaping, though if someone does introduce such a symbol, they'll probably find out quickly enough. > +} > + > +static void > +sexp_string (struct sprinter *sp, const char *val) > +{ > + if (val =3D=3D NULL) > + val =3D ""; > + sexp_string_len (sp, val, strlen (val)); > +} > + > +static void > +sexp_symbol (struct sprinter *sp, const char *val) > +{ > + if (val =3D=3D NULL) > + val =3D ""; > + sexp_symbol_len (sp, val, strlen (val)); > +} > + > +static void > +sexp_integer (struct sprinter *sp, int val) > +{ > + struct sprinter_sexp *sps =3D sexp_begin_value (sp); > + > + fprintf (sps->stream, "%d", val); > +} > + > +static void > +sexp_boolean (struct sprinter *sp, notmuch_bool_t val) > +{ > + struct sprinter_sexp *sps =3D sexp_begin_value (sp); > + > + fputs (val ? "t" : "nil", sps->stream); > +} > + > +static void > +sexp_null (struct sprinter *sp) > +{ > + struct sprinter_sexp *sps =3D sexp_begin_value (sp); > + > + fputs ("nil", sps->stream); > +} > + > +static void > +sexp_map_key (struct sprinter *sp, const char *key) > +{ > + struct sprinter_sexp *sps =3D (struct sprinter_sexp *) sp; > + > + if (sps->state->in_map && ! sps->state->first) > + fputs (") ", sps->stream); > + fputc ('(', sps->stream); > + sexp_symbol (sp, key); Since this is the only use of sexp_symbol, perhaps the code should be folded in? At least sexp_symbol and sexp_symbol_len should be combined into sexp_symbol. > + fputc (' ', sps->stream); > +} > + > +static void > +sexp_set_prefix (unused (struct sprinter *sp), unused (const char *name)) > +{ > +} > + > +static void > +sexp_separator (struct sprinter *sp) > +{ > + struct sprinter_sexp *sps =3D (struct sprinter_sexp *) sp; > + > + sps->insert_separator =3D TRUE; > +} > + > +struct sprinter * > +sprinter_sexp_create (const void *ctx, FILE *stream) > +{ > + static const struct sprinter_sexp template =3D { > + .vtable =3D { > + .begin_map =3D sexp_begin_map, > + .begin_list =3D sexp_begin_list, > + .end =3D sexp_end, > + .string =3D sexp_string, > + .string_len =3D sexp_string_len, > + .integer =3D sexp_integer, > + .boolean =3D sexp_boolean, > + .null =3D sexp_null, > + .map_key =3D sexp_map_key, > + .separator =3D sexp_separator, > + .set_prefix =3D sexp_set_prefix, > + .is_text_printer =3D FALSE, > + } > + }; > + struct sprinter_sexp *res; > + > + res =3D talloc (ctx, struct sprinter_sexp); > + if (! res) > + return NULL; > + > + *res =3D template; > + res->stream =3D stream; > + return &res->vtable; > +} > diff --git a/sprinter.h b/sprinter.h > index 912a526..59776a9 100644 > --- a/sprinter.h > +++ b/sprinter.h > @@ -70,4 +70,8 @@ sprinter_text_create (const void *ctx, FILE *stream); > struct sprinter * > sprinter_json_create (const void *ctx, FILE *stream); >=20=20 > +/* Create a new structure printer that emits S-Expressions. */ > +struct sprinter * > +sprinter_sexp_create (const void *ctx, FILE *stream); > + > #endif // NOTMUCH_SPRINTER_H > --=20 > 1.8.0 > > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch