Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id E29A9431FB6 for ; Sun, 5 Aug 2012 14:43:24 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id sLCmi06MadNQ for ; Sun, 5 Aug 2012 14:43:23 -0700 (PDT) Received: from mail-lb0-f181.google.com (mail-lb0-f181.google.com [209.85.217.181]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id 77BF0431FAE for ; Sun, 5 Aug 2012 14:43:23 -0700 (PDT) Received: by lbbgk1 with SMTP id gk1so1357467lbb.26 for ; Sun, 05 Aug 2012 14:43:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=from:to:subject:in-reply-to:references:user-agent:date:message-id :mime-version:content-type:x-gm-message-state; bh=r6fHY8U3fg5bLh1fkjXmuQpkIxJT9zgLr/x9epW9zkY=; b=IvJclUNJkS4j/OUM8lcBQim4U8lHWboA/LFz3XYwuS2KvKIz5/3HlvbgYnhYg+c6aY a26JC7AY1RFWXnL9HgIpHqh99tK+yAM0UzUkEo8g79CHhWAkUqr/jp7XdLQBtwf5TpEP /4VhQ+TsnwHxbmguyo4EwhwyXfOxdUlbNm6YW/5ZMou1Emaa771yxvu++7x2AKzt8tgy zY2b/Ut0k5grBXPUx6fXPiLZtWScr32+Ylsvr9q5D8QaVRM02TDmh29PRRYXEtXEY1gK OHtdS466mYaxygDJZHtrBpjRev2elzU+VBZNy+/4SA6l1ghl8KzWv5KU2TqQOeVORJMr C05A== Received: by 10.112.44.163 with SMTP id f3mr3666046lbm.59.1344203001916; Sun, 05 Aug 2012 14:43:21 -0700 (PDT) Received: from localhost (dsl-hkibrasgw4-fe51df00-27.dhcp.inet.fi. [80.223.81.27]) by mx.google.com with ESMTPS id o5sm3345879lbg.5.2012.08.05.14.43.19 (version=SSLv3 cipher=OTHER); Sun, 05 Aug 2012 14:43:20 -0700 (PDT) From: Jani Nikula To: David Bremner , notmuch@notmuchmail.org Subject: Re: [PATCH v2 2/7] lib: add a date/time parser module In-Reply-To: <877gtdmqol.fsf@zancas.localnet> References: <133d16fa9b63e4cd91aa2b8816a6ad7285b3bd4c.1344065790.git.jani@nikula.org> <877gtdmqol.fsf@zancas.localnet> User-Agent: Notmuch/0.13.2+125~ga3f01dd (http://notmuchmail.org) Emacs/23.3.1 (i686-pc-linux-gnu) Date: Mon, 06 Aug 2012 00:43:18 +0300 Message-ID: <87628xnhft.fsf@nikula.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Gm-Message-State: ALoCoQnUnXPNlAh2patp9PRqcSgkCYXWz9gG03/tZbRRxxqaaDYFO2Pvzdy6IGNh++82ObcTeW8Z X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Aug 2012 21:43:25 -0000 Hi David, thanks for the review! On Sun, 05 Aug 2012, David Bremner wrote: > Jani Nikula writes: > >> + >> +static enum field >> +abs_to_rel_field (enum field field) >> +{ >> + assert (field <= TM_ABS_YEAR); >> + >> + /* note: depends on the enum ordering */ >> + return field + (TM_REL_SEC - TM_ABS_SEC); >> +} >> + > > I wonder if this would be slightly nicer of you defined a TM_FIRST_REL > or so as a synonym like TM_NONE and TM_SIZE Good idea. >> +/* get zero value for field */ >> +static int >> +field_zero (enum field field) >> +{ >> + if (field == TM_ABS_MDAY || field == TM_ABS_MON) >> + return 1; >> + else if (field == TM_ABS_YEAR) >> + return 1970; >> + else >> + return 0; >> +} > > what do you think about using the word "epoch" instead of zero here? As a non-native speaker, I'll just take your word for it if you think it would be better. :) >> +static bool >> +get_postponed_number (struct state *state, int *v, int *n, char *d) >> +{ > > I found the 1 letter names not quite obvious here. True. I think v and n are used fairly consistently throughout the source file, so I'll have to consider longer names vs. documentation comment for them. > At this point reading the code, I have not trouble understanding each > line/function, but I feel like I'm missing the big picture a bit. Yeah, perhaps the code reads better if you follow from the top level parse_time_string() down. Which is at the very end. A "big picture" documentation comment would do no harm. > What is a postponed number? I see that you grasped that later, but I'll describe anyway. Parsing is done from left to right, in a greedy fashion (i.e. match the longest possible known expression). If after that we encounter a number that we don't know what to do with yet, postpone it until we move on to the next expression. The parser for that might eat the preceding "postponed" number (for example "5 August", 5 would be postponed because we don't know what to do with yet, but then "August" would be the context to make it day of month). If that is not the case, the number will be parsed by parse_postponed_number() as a lonely, single number between there. (Yes, this should be documented better with the "big picture".) > >> + /* >> + * REVISIT: There could be a "next_field" that would be set from >> + * "field" for the duration of the handle_postponed_number() call, >> + * so it has more information to work with. >> + */ > > The notmuch convention seems to be to use XXX: for this. I'm not sure > I'd bother changing, especially if we can't decide how to package this. Okay, can be changed if needed. > >> +/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */ >> +static int >> +set_abs_time (struct state *state, int hour, int min, int sec) >> +{ >> + int r; >> + >> + if (hour != UNSET) { >> + if ((r = set_field (state, TM_ABS_HOUR, hour))) >> + return r; >> + } > > So for this function and the next, the first match wins? I don't really > see the motivation for this, maybe you can explain a bit. The whole parser tries to be as unambiguous as possible. If the input leads to a situation in which any absolute time field (see enum field) is attempted to set twice, it means it appears twice in the input. For example, "2012-08-06 August", where the month is set twice. By design, that is not allowed, and set_field() fails, even if it's the same value. The only semi-exception is having redundant weekday there, for example "Monday, August 6". > > >> + /* timezone codes: offset in minutes. FIXME: add more codes. */ > > Did you think about trying to delegate the list of timezones to the > system? No. :) I'll look into it. > >> + * Compare strings s and keyword. Return number of matching chars on >> + * match, 0 for no match. Match must be at least n chars (n == 0 all >> + * of keyword), otherwise it's not a match. Use match_case for case >> + * sensitive matching. >> + */ > > I guess that's fine, and it is internal, but maybe -1 for whole string > would be slightly nicer (although I can't imagine what good matching 0 > length strings is at the moment). Can be changed. > >> + /* Minimum match length. */ >> + p = strchr (keyword, '|'); >> + if (p) { >> + minlen = p - keyword; >> + memmove (p, p + 1, strlen (p + 1) + 1); >> + } > > Something about that memmove creeps me out, but I trust you that it's > correct. Alternatively I guess you could represent keywords as pairs of > strings, which is probably more of a pain. I didn't bother to double check it now, but I remember thinking it over very carefully. :) I agree it could use more clarity to be more obviously correct. (Initially the minlen was coded as an int in the table, but the above allows the localization to decide how long the match must be.) > > >> + >> +/* Parse a single number. Typically postpone parsing until later. */ > > OK, so I finally start to understand what a postponed number is :) > I understand the compiler likes bottom up declarations, but some > top down declarations/comments are needed I think. I agree more comments would be in order. > >> +static int >> +parse_date (struct state *state, char sep, >> + unsigned long v1, unsigned long v2, unsigned long v3, >> + size_t n1, size_t n2, size_t n3) >> +{ >> + int year = UNSET, mon = UNSET, mday = UNSET; >> + >> + assert (is_date_sep (sep)); >> + >> + switch (sep) { >> + case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */ > > If I understand correctly, this chooses between American (?) month, day, > year ordering and "sensible" day, month, year ordering by delimiter. I > never thought about this as a way to tell (I often write D/M/Y), but > that could be just me. I agree it's fine as a convention. You understand correctly. It's obviously not a reliable way to tell unless you make it the convention, which I chose to do. Some parsers, notably the one in git, also look at the values to see if it could be D/M/Y if M/D/Y is not possible, but I don't like the ambiguity that introduces. > >> +/* >> + * Parse delimiter(s). Return < 0 on error, number of parsed chars on >> + * success. >> + */ > > So 1:-2 will parse as 1-2 ?, i.e. last delimiter wins? Maybe better to > say so explicitly. Agreed. It just throws out any extra delimiters. Perhaps this should also be more strict about the allowed delimiters (now anything is allowed). > >> +/* Combine absolute and relative fields, and round. */ >> +static int >> +create_output (struct state *state, time_t *t_out, const time_t *tnow, >> + int round) >> +{ > > It seems like most of non-obvious logic like (when is "wednesday") is > encoded here. From a maintenence point of view, it would be nice to be > able to seperate out the heuristic stuff from the mechanical, to the > degree that it is possible. Agreed. Basically this is step 2, turning all information parsed in step 1 (function parse_input()) into a sensible result. Figuring out the current time is also done here. BR, Jani.