Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 12369431FBF for ; Tue, 10 Jul 2012 12:13:37 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ho7jLnyld7BR for ; Tue, 10 Jul 2012 12:13:35 -0700 (PDT) Received: from dmz-mailsec-scanner-2.mit.edu (DMZ-MAILSEC-SCANNER-2.MIT.EDU [18.9.25.13]) by olra.theworths.org (Postfix) with ESMTP id AA571431FAE for ; Tue, 10 Jul 2012 12:13:35 -0700 (PDT) X-AuditID: 1209190d-b7fd56d000000933-0d-4ffc7edec63f Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) by dmz-mailsec-scanner-2.mit.edu (Symantec Messaging Gateway) with SMTP id 65.B0.02355.EDE7CFF4; Tue, 10 Jul 2012 15:13:34 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id q6AJDXmL009316; Tue, 10 Jul 2012 15:13:34 -0400 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id q6AJDWY5012664 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT); Tue, 10 Jul 2012 15:13:33 -0400 (EDT) Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.77) (envelope-from ) id 1SofsK-0005gp-2V; Tue, 10 Jul 2012 15:13:32 -0400 Date: Tue, 10 Jul 2012 15:13:31 -0400 From: Austin Clements To: craven@gmx.net Subject: Re: [PATCH] v2: Added better support for multiple structured output formats. Message-ID: <20120710191331.GE7332@mit.edu> References: <87liisue70.fsf@nexoid.at> <87fw8zvqmm.fsf@nexoid.at> <87liir944f.fsf@qmul.ac.uk> <87a9z7vj3p.fsf@nexoid.at> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87a9z7vj3p.fsf@nexoid.at> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFprAKsWRmVeSWpSXmKPExsUixCmqrXuv7o+/wa79whZ7G9oZLa7fnMns wOSxeNN+No9nq24xBzBFcdmkpOZklqUW6dslcGW82/KBraA3veLx0yOMDYwtgV2MnBwSAiYS j47MZ4ewxSQu3FvP1sXIxSEksI9R4ufTNiYIZwOjxKddD9khnJNMEk+utEA5Sxglds6/zgzS zyKgKnH201ewWWwCGhLb9i9nBLFFBIQkJn15xQJiMwtIS3z73cwEYgsLhEv8bYXo5RXQljj9 6hFYXEigTmLB7iuMEHFBiZMzn0D1aknc+PcSqIYDbM7yfxwgYU4BdYmP+16AjREVUJGYcnIb 2wRGoVlIumch6Z6F0L2AkXkVo2xKbpVubmJmTnFqsm5xcmJeXmqRrpFebmaJXmpK6SZGcGBL 8u5gfHdQ6RCjAAejEg/vjLQ//kKsiWXFlbmHGCU5mJREeadUAoX4kvJTKjMSizPii0pzUosP MUpwMCuJ8D50BsrxpiRWVqUW5cOkpDlYlMR5r6Tc9BcSSE8sSc1OTS1ILYLJynBwKEnwRgAj WEiwKDU9tSItM6cEIc3EwQkynAdouB5IDW9xQWJucWY6RP4Uoy7HujdHbjAKseTl56VKifNa gRQJgBRllObBzYElpFeM4kBvCfNag1TxAJMZ3KRXQEuYgJa09/wCWVKSiJCSamBcNeu7nHTK wWknWFRDjk1LWSBitcmrhjn0VhbPghqNZ5br3LW2neKfp8W38cl9c8GPvRLZhrxLL38Qm56V amD3vZGnRfDvbhHVxjTP7xlcE573+67WTTQ6piv7/mPYhyz1/e8CN2l8Otb+MNbEIMNvTd3n vLhoywCFAI+VaROr5p5vN0mVn6fEUpyRaKjFXFScCAC+NB5/IwMAAA== Cc: notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jul 2012 19:13:37 -0000 Since it would be great to use the structure printer for show as well, it would make sense to put it in its own source file, where it can easily be shared between commands. There are a few systematic code formatting problems in your patch. To be consistent with other notmuch code, there should be a space between function names and parameter lists (in both declarations and calls) and there should be a space between a keyword and a paren (e.g., "if (" and "for ("). In a function definition, the function name should start on a new line (this makes it easy to grep for a function's definition). Also, in general, try to keep lines under 80 characters wide (we're not very consistent about this one, but it would be nice to not become even less consistent about it). More detailed comments below. Quoth craven@gmx.net on Jul 10 at 3:30 pm: > As discussed in , this patch adds > support for new structured output formats (like s-expressions) by using > stateful structure_printers. An implementation of the JSON structure > printer that passes all tests is included. The output for JSON (and > text) is identical to the current output. S-Expressions will be added in > a later patch. > > A large part of this patch just implements the differentiation between > structured and non-structured output (all the code within > "if(format == unstructured_text_printer)"). > > In a second patch, the structured output code should be isolated in a > separate file, and also used in all other parts of notmuch. > > The interface is a structure structure_printer, which contains the following methods: > > - initial_state: is called to create a state object, that is passed to all invocations. This should be used to keep track of the output file and everything else necessary to correctly format output. > - map: is called when a new map (associative array, dictionary) is started. map_key and the primitives (string, number, bool) are used alternatingly to add key/value pairs. pop is used to close the map (see there). This function must return a nesting level identifier that can be used to close all nested structures (maps and lists), backing out to the returned nesting level. > - list: is called when a new list (array, vector) is started. the primitives (string, number, bool) are used consecutively to add values to the list. pop is used to close the list. This function must return a nesting level identifier that can be used to close all nested structures (maps and lists), backing out to the returned nesting level. > - pop: is called to return to a given nesting level. All lists and maps with a deeper nesting level must be closed. > - number, string, bool: output one element of the specific type. > > All functions should use the state object to insert delimiters etc. automatically when appropriate. > > Example: > int top, one; > top = map(state); > map_key(state, "foo"); > one = list(state); > number(state, 1); > number(state, 2); > number(state, 3); > pop(state, i); > map_key(state, "bar"); > map(state); > map_key(state, "baaz"); > string(state, "hello world"); > pop(state, top); > > would output JSON as follows: > > {"foo": [1, 2, 3], "bar": { "baaz": "hello world"}} > --- > notmuch-search.c | 491 ++++++++++++++++++++++++++++++++++++++++--------------- > 1 file changed, 361 insertions(+), 130 deletions(-) > > diff --git a/notmuch-search.c b/notmuch-search.c > index 3be296d..4127777 100644 > --- a/notmuch-search.c > +++ b/notmuch-search.c > @@ -28,6 +28,210 @@ typedef enum { > OUTPUT_TAGS > } output_t; > > +/* structured formatting, useful for JSON, S-Expressions, ... > + > +- initial_state: is called to create a state object, that is passed to all invocations. This should be used to keep track of the output file and everything else necessary to correctly format output. > +- map: is called when a new map (associative array, dictionary) is started. map_key and the primitives (string, number, bool) are used alternatingly to add key/value pairs. pop is used to close the map (see there). This function must return a nesting level identifier that can be used to close all nested structures (maps and lists), backing out to the returned nesting level. > +- list: is called when a new list (array, vector) is started. the primitives (string, number, bool) are used consecutively to add values to the list. pop is used to close the list. This function must return a nesting level identifier that can be used to close all nested structures (maps and lists), backing out to the returned nesting level. > +- pop: is called to return to a given nesting level. All lists and maps with a deeper nesting level must be closed. > +- number, string, bool: output one element of the specific type. > + > +All functions should use state to insert delimiters etc. automatically when appropriate. > + > +Example: > +int top, one; > +top = map(state); > +map_key(state, "foo"); > +one = list(state); > +number(state, 1); > +number(state, 2); > +number(state, 3); > +pop(state, i); > +map_key(state, "bar"); > +map(state); > +map_key(state, "baaz"); > +string(state, "hello world"); > +pop(state, top); > + > +would output JSON as follows: > + > +{"foo": [1, 2, 3], "bar": { "baaz": "hello world"}} > + > + */ The above comment should follow the style of other notmuch comments: /* Put a star before every line, use full sentences starting with a * capital letter, and wrap the comment at 72 columns. */ For the descriptions of the function pointers, it probably makes sense to comment them inline in the struct itself so the reader doesn't have to match things up. E.g., typedef struct structure_printer { /* (Description of map.) */ int (*map) (void *state); ... } structure_printer_t; > +typedef struct structure_printer { > + int (*map)(void *state); This is a rather less obvious application of notmuch code style, but there should be a space before the parameter list here, too: int (*map) (void *state); Likewise for the other fields, of course. > + int (*list)(void *state); > + void (*pop)(void *state, int level); > + void (*map_key)(void *state, const char *key); > + void (*number)(void *state, int val); > + void (*string)(void *state, const char *val); > + void (*bool)(void *state, notmuch_bool_t val); > + void *(*initial_state)(const struct structure_printer *sp, FILE *output); > +} structure_printer_t; Is there a reason to separate the structure printer's vtable from its state? It seems like this forces the caller to keep track of two things instead of just one. For show this isn't too bad because the structure printer is the formatter, but I think it will be cumbersome for the search code. Keeping them together would also be at least nominally more typesafe. > + > +/* JSON structure printer */ > + > +/* single linked list implementation for keeping track of the array/map nesting state */ Same comment formatting comment here. > +typedef struct json_list { > + int type; > + int first_already_seen; Maybe just "first_seen"? The "already" doesn't add anything. Also, this should probably be a notmuch_bool_t. > + struct json_list *rest; > +} json_list_t; > + > +#define TYPE_JSON_MAP 1 > +#define TYPE_JSON_ARRAY 2 An enum would be preferable here. > + > +typedef struct json_state { > + FILE *output; > + json_list_t *stack; > + int level; > +} json_state_t; > + > +int json_map(void *state); > +int json_list(void *state); > +void json_pop(void *state, int level); > +void json_map_key(void *state, const char *key); > +void json_number(void *state, int val); > +void json_string(void *state, const char *val); > +void json_bool(void *state, notmuch_bool_t val); > +void *json_initial_state(const struct structure_printer *sp, FILE *output); > + > +structure_printer_t json_structure_printer = { > + &json_map, > + &json_list, > + &json_pop, > + &json_map_key, > + &json_number, > + &json_string, > + &json_bool, > + &json_initial_state > +}; > + > +int json_map(void *st) { > + json_state_t *state = (json_state_t*)st; > + FILE *output = state->output; > + if(state->stack != NULL && state->stack->type == TYPE_JSON_ARRAY && state->stack->first_already_seen) { > + fputs(",", output); > + if(state->level == 1) > + fputs("\n", output); > + else > + fputs(" ", output); > + } > + if(state->stack != NULL) { > + state->stack->first_already_seen = TRUE; > + } You repeat the above code (or something very similar) in several places. It would make sense to put it in a separate function, possibly with an additional char argument to control whether to follow the comma with a newline or a space. > + fputs("{", output); > + void *ctx_json_map = talloc_new (0); > + json_list_t *el = talloc(ctx_json_map, json_list_t); You're leaking ctx_json_map here. It's also not necessary: you should use st as the talloc context for el. It still makes sense to free these as you pop, but that way if the caller needs to abort it can free the whole state structure in one swoop. > + el->type = TYPE_JSON_MAP; > + el->first_already_seen = FALSE; > + el->rest = state->stack; > + state->stack = el; > + return state->level++; > +} > + > +int json_list(void *st) { > + json_state_t *state = (json_state_t*)st; > + FILE *output = state->output; > + if(state->stack != NULL && state->stack->type == TYPE_JSON_ARRAY && state->stack->first_already_seen) { > + fputs(",", output); > + if(state->level == 1) > + fputs("\n", output); > + else > + fputs(" ", output); > + } > + if(state->stack != NULL) { > + state->stack->first_already_seen = TRUE; > + } > + fputs("[", output); > + void *ctx_json_map = talloc_new (0); > + json_list_t *el = talloc(ctx_json_map, json_list_t); > + el->type = TYPE_JSON_ARRAY; > + el->first_already_seen = FALSE; > + el->rest = state->stack; > + state->stack = el; > + return state->level++; > +} > + > +void json_pop(void *st, int level) { > + int i; > + json_state_t *state = (json_state_t*)st; > + FILE *output = state->output; > + for(i = state->level; i > level; i--) { Maybe "while (state->level > level)"? > + json_list_t *tos = state->stack; > + if(tos->type == TYPE_JSON_MAP) { > + fputs("}", output); > + } > + if(tos->type == TYPE_JSON_ARRAY) { Maybe "else if"? > + fputs("]", output); > + } > + state->stack = tos->rest; > + state->level--; > + talloc_free(tos); > + } > + if(state->level == 0) > + fputs("\n", output); > +} > + > +void json_map_key(void *st, const char *key) { > + json_state_t *state = (json_state_t*)st; > + FILE *output = state->output; > + if(state->stack != NULL && state->stack->first_already_seen) { > + fputs(",\n", output); > + } > + fputs("\"", output); > + fputs(key, output); The key needs to be escaped like any JSON string. > + fputs("\": ", output); > +} > + > +void json_number(void *st, int val) { > + json_state_t *state = (json_state_t*)st; > + FILE *output = state->output; > + if(state->stack != NULL && state->stack->type == TYPE_JSON_ARRAY && state->stack->first_already_seen) { > + fputs(", ", output); > + } > + state->stack->first_already_seen = TRUE; > + fprintf(output, "%i", val); "%d" is much more common (I had to look up %i!) > +} > + > +void json_string(void *st, const char *val) { > + json_state_t *state = (json_state_t*)st; > + FILE *output = state->output; > + void *ctx = talloc_new(0); > + if(state->stack != NULL && state->stack->type == TYPE_JSON_ARRAY && state->stack->first_already_seen) { > + fputs(",", output); > + if(state->level == 1) > + fputs("\n", output); > + else > + fputs(" ", output); > + } > + > + state->stack->first_already_seen = TRUE; > + fprintf(output, "%s", json_quote_str(ctx, val)); Take a look at the string quoting function in id:"87d34hsdx8.fsf@awakening.csail.mit.edu". It quotes directly to a FILE * without allocating an intermediate buffer. It's also actually correct, unlike json_quote_str. > + talloc_free(ctx); > +} > + > +void json_bool(void *st, notmuch_bool_t val) { > + json_state_t *state = (json_state_t*)st; > + FILE *output = state->output; This needs to insert a comma like the other output functions. > + if(val) > + fputs("true", output); > + else > + fputs("false", output); Maybe fputs(val ? "true" : "false", output)? > +} > + > +void *json_initial_state(const struct structure_printer *sp, FILE *output) { > + (void)sp; > + json_state_t *st = talloc(0, json_state_t); > + st->level = 0; > + st->stack = NULL; > + st->output = output; > + return st; > +} This is as far as I've gotten. I'll switch to your newer version and review the rest when I get a chance.