Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 7B8FC431FB6 for ; Mon, 22 Oct 2012 01:14:55 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NRHArV8rEDeK for ; Mon, 22 Oct 2012 01:14:51 -0700 (PDT) Received: from dmz-mailsec-scanner-4.mit.edu (DMZ-MAILSEC-SCANNER-4.MIT.EDU [18.9.25.15]) by olra.theworths.org (Postfix) with ESMTP id 2860C431FAE for ; Mon, 22 Oct 2012 01:14:51 -0700 (PDT) X-AuditID: 1209190f-b7f636d00000095b-5a-508500781c92 Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) by dmz-mailsec-scanner-4.mit.edu (Symantec Messaging Gateway) with SMTP id CE.F4.02395.87005805; Mon, 22 Oct 2012 04:14:48 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id q9M8El5W017881; Mon, 22 Oct 2012 04:14:48 -0400 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id q9M8Ej6M021444 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT); Mon, 22 Oct 2012 04:14:47 -0400 (EDT) Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.77) (envelope-from ) id 1TQD9p-0003T9-D0; Mon, 22 Oct 2012 04:14:45 -0400 Date: Mon, 22 Oct 2012 04:14:44 -0400 From: Austin Clements To: Jani Nikula Subject: Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch Message-ID: <20121022081444.GM14861@mit.edu> References: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFlrOKsWRmVeSWpSXmKPExsUixG6nrlvB0Bpg0HLZzKJpurPF9ZszmR2Y PG7df83u8WzVLeYApigum5TUnMyy1CJ9uwSujBVnjrMXfN/PXLHt+wSWBsYTp5m6GDk4JARM JK4usu5i5AQyxSQu3FvP1sXIxSEksI9R4uzkxawQzgZGiT+//rBAOCeZJG7ePc4E4SxhlJj/ 4ToLyCgWAVWJoz1JIKPYBDQktu1fzghiiwgoSmw+uR/MZhaQlvj2u5kJxBYWCJL49f4uC4jN K6AjcfXsPjBbSKBO4vvES6wQcUGJkzOfsED06kjs3HqHDWQVyJzl/zggwvISzVtnM4PYnAJh EpdONbOD2KICKhJTTm5jm8AoPAvJpFlIJs1CmDQLyaQFjCyrGGVTcqt0cxMzc4pTk3WLkxPz 8lKLdE30cjNL9FJTSjcxguKAU5J/B+O3g0qHGAU4GJV4eBuutwQIsSaWFVfmHmKU5GBSEuW9 9wcoxJeUn1KZkVicEV9UmpNafIhRgoNZSYT3VDxQjjclsbIqtSgfJiXNwaIkzns15aa/kEB6 YklqdmpqQWoRTFaGg0NJgrflP1CjYFFqempFWmZOCUKaiYMTZDgP0PDZIDW8xQWJucWZ6RD5 U4y6HFf+zX3IKMSSl5+XKiXOuwakSACkKKM0D24OLH29YhQHekuYdw5IFQ8w9cFNegW0hAlo iTl3I8iSkkSElFQD4/wjOzd8SrQy/5tRXZummSnUUHz7lF+QXsP+D8u37FY0bGNOmHRPPiO0 1ai/W41f7+nzNpHwmWuezfFcdrWqx3T3NtM1Ffw8Szeukwr1apgQnGrQ735v9/d97qffT37b ueUBz7+Dfc1uml63P61we8frMV9G4PrZbcoHc+7aLWLON76i89yvSImlOCPRUIu5qDgRANxm F5o6AwAA Cc: notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Oct 2012 08:14:55 -0000 Overall this looks pretty good to me, and I must say, this parser is amazingly flexible and copes well with a remarkably hostile grammar. A lot of little comments below (sorry if any of this ground has already been covered in the previous four versions). I do have one broad comment. While I'm all for ad hoc parsers for ad hoc grammars like dates, there is one piece of the literature I think this parser suffers for by ignoring: tokenizing. I think it would simplify a lot of this code if it did a tokenizing pass before the parsing pass. It doesn't have to be a serious tokenizer with streaming and keywords and token types and junk; just something that first splits the input into substrings, possibly just non-overlapping matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.]. This would simplify the handling of postponed numbers because, with trivial lookahead in the token stream, you wouldn't have to postpone them. Likewise, it would eliminate last_field. It would simplify keyword matching because you wouldn't have to worry about matching substrings (I spent a long time staring at that code before I figured out what it would and wouldn't accept). Most important, I think it would make the parser more predictable for users; for example, the parser currently accepts things like "saturtoday" because it's aggressively single-pass. Quoth Jani Nikula on Oct 22 at 12:22 am: > Add a date/time parser to notmuch, to be used for adding date range > query support for notmuch lib later on. Add the parser to a directory > of its own to make it independent of the rest of the notmuch code > base. > > Signed-off-by: Jani Nikula > --- > Makefile | 2 +- > parse-time-string/Makefile | 5 + > parse-time-string/Makefile.local | 12 + > parse-time-string/README | 9 + > parse-time-string/parse-time-string.c | 1477 +++++++++++++++++++++++++++++++++ > parse-time-string/parse-time-string.h | 102 +++ > 6 files changed, 1606 insertions(+), 1 deletion(-) > create mode 100644 parse-time-string/Makefile > create mode 100644 parse-time-string/Makefile.local > create mode 100644 parse-time-string/README > create mode 100644 parse-time-string/parse-time-string.c > create mode 100644 parse-time-string/parse-time-string.h > > diff --git a/Makefile b/Makefile > index e5e2e3a..bb9c316 100644 > --- a/Makefile > +++ b/Makefile > @@ -3,7 +3,7 @@ > all: > > # List all subdirectories here. Each contains its own Makefile.local > -subdirs = compat completion emacs lib man util test > +subdirs = compat completion emacs lib man parse-time-string util test > > # We make all targets depend on the Makefiles themselves. > global_deps = Makefile Makefile.config Makefile.local \ > diff --git a/parse-time-string/Makefile b/parse-time-string/Makefile > new file mode 100644 > index 0000000..fa25832 > --- /dev/null > +++ b/parse-time-string/Makefile > @@ -0,0 +1,5 @@ > +all: > + $(MAKE) -C .. all > + > +.DEFAULT: > + $(MAKE) -C .. $@ > diff --git a/parse-time-string/Makefile.local b/parse-time-string/Makefile.local > new file mode 100644 > index 0000000..53534f3 > --- /dev/null > +++ b/parse-time-string/Makefile.local > @@ -0,0 +1,12 @@ > +dir := parse-time-string > +extra_cflags += -I$(srcdir)/$(dir) > + > +libparse-time-string_c_srcs := $(dir)/parse-time-string.c > + > +libparse-time-string_modules := $(libparse-time-string_c_srcs:.c=.o) > + > +$(dir)/libparse-time-string.a: $(libparse-time-string_modules) > + $(call quiet,AR) rcs $@ $^ > + > +SRCS := $(SRCS) $(libparse-time-string_c_srcs) > +CLEAN := $(CLEAN) $(libparse-time-string_modules) $(dir)/libparse-time-string.a > diff --git a/parse-time-string/README b/parse-time-string/README > new file mode 100644 > index 0000000..300ff1f > --- /dev/null > +++ b/parse-time-string/README > @@ -0,0 +1,9 @@ > +PARSE TIME STRING > +================= > + > +parse_time_string() is a date/time parser originally written for > +notmuch by Jani Nikula . However, there is nothing > +notmuch specific in it, and it should be kept reusable for other > +projects, and ready to be packaged on its own as needed. Please do not > +add dependencies on or references to anything notmuch specific. The > +parser should only depend on the C library. > diff --git a/parse-time-string/parse-time-string.c b/parse-time-string/parse-time-string.c > new file mode 100644 > index 0000000..942041a > --- /dev/null > +++ b/parse-time-string/parse-time-string.c > @@ -0,0 +1,1477 @@ > +/* > + * parse time string - user friendly date and time parser > + * Copyright © 2012 Jani Nikula > + * > + * This program is free software: you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation, either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see . > + * > + * Author: Jani Nikula > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "parse-time-string.h" > + > +/* > + * IMPLEMENTATION DETAILS > + * > + * At a high level, the parsing is done in two phases: 1) actual > + * parsing of the input string and storing the parsed data into > + * 'struct state', and 2) processing of the data in 'struct state' > + * according to current time (or provided reference time) and > + * rounding. This is evident in the main entry point function > + * parse_time_string(). > + * > + * 1) The parsing phase - parse_input() > + * > + * Parsing is greedy and happens from left to right. The parsing is as > + * unambiguous as possible; only unambiguous date/time formats are > + * accepted. Redundant or contradictory absolute date/time in the > + * input (e.g. date specified multiple times/ways) is not > + * accepted. Relative date/time on the other hand just accumulates if > + * present multiple times (e.g. "5 days 5 days" just turns into 10 > + * days). > + * > + * Parsing decisions are made on the input format, not value. For > + * example, "20/5/2005" fails because the recognized format here is > + * MM/D/YYYY, even though the values would suggest DD/M/YYYY. > + * > + * Parsing is mostly stateless in the sense that parsing decisions are > + * not made based on the values of previously parsed data, or whether > + * certain data is present in the first place. (There are a few > + * exceptions to the latter part, though, such as parsing of time zone > + * that would otherwise look like plain time.) > + * > + * When the parser encounters a number that is not greedily parsed as > + * part of a format, the interpretation is postponed until the next > + * token is parsed. The parser for the next token may consume the > + * previously postponed number. For example, when parsing "20 May" the > + * meaning of "20" is not known until "May" is parsed. If the parser > + * for the next token does not consume the postponed number, the > + * number is handled as a "lone" number before parser for the next > + * token finishes. > + * > + * 2) The processing phase - create_output() > + * > + * Once the parser in phase 1 has finished, 'struct state' contains > + * all the information from the input string, and it's no longer > + * needed. Since the parser does not even handle the concept of "now", > + * the processing initializes the fields referring to the current > + * date/time. > + * > + * If requested, the result is rounded towards past or future. The > + * idea behind rounding is to support parsing date/time ranges in an > + * obvious way. For example, for a range defined as two dates (without > + * time), one would typically want to have an inclusive range from the > + * beginning of start date to the end of the end date. The caller > + * would use rounding towards past in the start date, and towards > + * future in the end date. > + * > + * The absolute date and time is shifted by the relative date and > + * time, and time zone adjustments are made. Daylight saving time > + * (DST) is specifically *not* handled at all. > + * > + * Finally, the result is stored to time_t. > + */ > + > +#define unused(x) x __attribute__ ((unused)) > + > +/* XXX: Redefine these to add i18n support. The keyword table uses > + * N_() to mark strings to be translated; they are accessed > + * dynamically using _(). */ > +#define _(s) (s) /* i18n: define as gettext (s) */ > +#define N_(s) (s) /* i18n: define as gettext_noop (s) */ > + > +#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0])) > + > +/* > + * Field indices in the tm and set arrays of struct state. > + * > + * NOTE: There's some code that depends on the ordering of this enum. > + */ > +enum field { > + /* Keep SEC...YEAR in this order. */ > + TM_ABS_SEC, /* seconds */ > + TM_ABS_MIN, /* minutes */ > + TM_ABS_HOUR, /* hours */ > + TM_ABS_MDAY, /* day of the month */ > + TM_ABS_MON, /* month */ > + TM_ABS_YEAR, /* year */ > + > + TM_ABS_WDAY, /* day of the week. special: may be relative */ Given that this may be relative, should it really be called TM_ABS_WDAY? > + TM_ABS_ISDST, /* daylight saving time */ > + > + TM_AMPM, /* am vs. pm */ > + TM_TZ, /* timezone in minutes */ > + > + /* Keep SEC...YEAR in this order. */ > + TM_REL_SEC, /* seconds relative to absolute or reference time */ > + TM_REL_MIN, /* minutes ... */ > + TM_REL_HOUR, /* hours ... */ > + TM_REL_DAY, /* days ... */ > + TM_REL_MON, /* months ... */ > + TM_REL_YEAR, /* years ... */ > + TM_REL_WEEK, /* weeks ... */ > + > + TM_NONE, /* not a field */ > + > + TM_SIZE = TM_NONE, > + TM_FIRST_ABS = TM_ABS_SEC, > + TM_FIRST_REL = TM_REL_SEC, > +}; > + > +/* Values for the set array of struct state. */ > +enum field_set { > + FIELD_UNSET, /* The field has not been touched by parser. */ > + FIELD_SET, /* The field has been set by parser. */ > + FIELD_NOW, /* The field will be set to reference time. */ > +}; > + > +static enum field > +next_abs_field (enum field field) > +{ > + /* NOTE: Depends on the enum ordering. */ > + return field < TM_ABS_YEAR ? field + 1 : TM_NONE; > +} > + > +static enum field > +abs_to_rel_field (enum field field) > +{ > + assert (field <= TM_ABS_YEAR); > + > + /* NOTE: Depends on the enum ordering. */ > + return field + (TM_FIRST_REL - TM_FIRST_ABS); > +} > + > +/* Get epoch value for field. */ Explain what an "epoch value" for a field is. > +static int > +field_epoch (enum field field) > +{ > + if (field == TM_ABS_MDAY || field == TM_ABS_MON) > + return 1; > + else if (field == TM_ABS_YEAR) > + return 1970; > + else > + return 0; > +} > + > +/* The parsing state. */ > +struct state { > + int tm[TM_SIZE]; /* parsed date and time */ > + enum field_set set[TM_SIZE]; /* set status of tm */ > + > + enum field last_field; /* Previously set field. */ > + char delim; > + > + int postponed_length; /* Number of digits in postponed value. */ > + int postponed_value; > + char postponed_delim; /* The delimiter preceding postponed number. */ > +}; > + > +/* > + * Helpers for postponed numbers. > + * > + * postponed_length is the number of digits in postponed value. 0 > + * means there is no postponed number. -1 means there is a postponed > + * number, but it comes from a keyword, and it doesn't have digits. > + */ > +static int > +get_postponed_length (struct state *state) > +{ > + return state->postponed_length; > +} > + > +/* > + * Consume a previously postponed number. Return true if a number was > + * in fact postponed, false otherwise. Store the postponed number's > + * value in *v, length in the input string in *n (or -1 if the number > + * was written out and parsed as a keyword), and the preceding > + * delimiter to *d. Mention that v, n, and d are unchanged if no number is postponed? You exploit this for default values elsewhere in the code. > + */ > +static bool > +get_postponed_number (struct state *state, int *v, int *n, char *d) Maybe "consume_postponed_number" to emphasize that this function has side-effects (and isn't simply a "getter")? > +{ > + if (!state->postponed_length) > + return false; > + > + if (n) > + *n = state->postponed_length; > + > + if (v) > + *v = state->postponed_value; > + > + if (d) > + *d = state->postponed_delim; > + > + state->postponed_length = 0; > + state->postponed_value = 0; > + state->postponed_delim = 0; > + > + return true; > +} > + > +static int parse_postponed_number (struct state *state, enum field next_field); > + > +/* > + * Postpone a number to be handled later. If one exists already, > + * handle it first. n may be -1 to indicate a keyword that has no > + * number length. > + */ > +static int > +set_postponed_number (struct state *state, int v, int n) > +{ > + int r; > + char d = state->delim; > + > + /* Parse a previously postponed number, if any. */ > + r = parse_postponed_number (state, TM_NONE); > + if (r) > + return r; > + > + state->postponed_length = n; > + state->postponed_value = v; > + state->postponed_delim = d; > + > + return 0; > +} > + > +static void > +set_delim (struct state *state, char delim) > +{ > + state->delim = delim; > +} > + > +static void > +unset_delim (struct state *state) > +{ > + state->delim = 0; > +} > + > +/* > + * Field set/get/mod helpers. > + */ > + > +/* Return true if field has been set. */ > +static bool > +is_field_set (struct state *state, enum field field) > +{ > + assert (field < ARRAY_SIZE (state->tm)); > + > + return field < ARRAY_SIZE (state->set) && state->tm and state->set are the same size, so this will always by true given that the assert hasn't fired. Is this just defensive programming? > + state->set[field] != FIELD_UNSET; > +} > + > +static void > +unset_field (struct state *state, enum field field) > +{ > + assert (field < ARRAY_SIZE (state->tm)); > + > + state->set[field] = FIELD_UNSET; > + state->tm[field] = 0; > +} > + > +/* > + * Set field to value. A field can only be set once to ensure the > + * input does not contain redundant and potentially conflicting data. > + */ > +static int > +set_field (struct state *state, enum field field, int value) > +{ > + int r; > + > + assert (field < ARRAY_SIZE (state->tm)); > + > + /* Fields can only be set once. */ > + if (field < ARRAY_SIZE (state->set) && state->set[field] != FIELD_UNSET) Same comment about array sizes. Also, this should probably call is_field_set instead of open-coding it (which would make the array size check even more redundant!) > + return -PARSE_TIME_ERR_ALREADYSET; > + > + state->set[field] = FIELD_SET; > + > + /* Parse a previously postponed number, if any. */ > + r = parse_postponed_number (state, field); I don't understand the big picture with postponed number handling yet, but is it worth mentioning in this function's doc comment that it processes postponed numbers? > + if (r) > + return r; > + > + unset_delim (state); > + > + state->tm[field] = value; > + state->last_field = field; > + > + return 0; > +} > + > +/* > + * Mark n fields in fields to be set to the reference date/time in the > + * specified time zone, or local timezone if not specified. The fields > + * will be initialized after parsing is complete and timezone is > + * known. > + */ > +static int > +set_fields_to_now (struct state *state, enum field *fields, size_t n) > +{ > + size_t i; > + int r; > + > + for (i = 0; i < n; i++) { > + r = set_field (state, fields[i], 0); > + if (r) > + return r; > + state->set[fields[i]] = FIELD_NOW; > + } > + > + return 0; > +} > + > +/* Modify field by adding value to it. To be used on relative fields, > + * which can be modified multiple times (to accumulate). */ > +static int > +mod_field (struct state *state, enum field field, int value) add_to_field? > +{ > + int r; > + > + assert (field < ARRAY_SIZE (state->tm)); /* assert relative??? */ > + > + if (field < ARRAY_SIZE (state->set)) Another redundant check? > + state->set[field] = FIELD_SET; > + > + /* Parse a previously postponed number, if any. */ > + r = parse_postponed_number (state, field); This postponed number stuff is getting really confusing... > + if (r) > + return r; > + > + unset_delim (state); > + > + state->tm[field] += value; > + state->last_field = field; > + > + return 0; > +} > + > +/* > + * Get field value. Make sure the field is set before query. It's most > + * likely an error to call this while parsing (for example fields set > + * as FIELD_NOW will only be set to some value after parsing). > + */ > +static int > +get_field (struct state *state, enum field field) > +{ > + assert (field < ARRAY_SIZE (state->tm)); Assert that the field is set? > + > + return state->tm[field]; > +} > + > +/* > + * Validity checkers. > + */ > +static bool is_valid_12hour (int h) > +{ > + return h >= 0 && h <= 12; h >= 1? > +} > + > +static bool is_valid_time (int h, int m, int s) > +{ > + /* Allow 24:00:00 to denote end of day. */ > + if (h == 24 && m == 0 && s == 0) > + return true; > + > + return h >= 0 && h <= 23 && m >= 0 && m <= 59 && s >= 0 && s <= 59; > +} > + > +static bool is_valid_mday (int mday) > +{ > + return mday >= 1 && mday <= 31; > +} > + > +static bool is_valid_mon (int mon) > +{ > + return mon >= 1 && mon <= 12; > +} > + > +static bool is_valid_year (int year) > +{ > + return year >= 1970; > +} > + > +static bool is_valid_date (int year, int mon, int mday) > +{ > + return is_valid_year (year) && is_valid_mon (mon) && is_valid_mday (mday); > +} > + > +/* Unset indicator for time and date set helpers. */ > +#define UNSET -1 > + > +/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */ > +static int > +set_abs_time (struct state *state, int hour, int min, int sec) > +{ > + int r; > + > + if (hour != UNSET) { > + if ((r = set_field (state, TM_ABS_HOUR, hour))) > + return r; > + } > + > + if (min != UNSET) { > + if ((r = set_field (state, TM_ABS_MIN, min))) > + return r; > + } > + > + if (sec != UNSET) { > + if ((r = set_field (state, TM_ABS_SEC, sec))) > + return r; > + } > + > + return 0; > +} > + > +/* Date set helper. No input checking. Use UNSET (-1) to leave unset. */ > +static int > +set_abs_date (struct state *state, int year, int mon, int mday) > +{ > + int r; > + > + if (year != UNSET) { > + if ((r = set_field (state, TM_ABS_YEAR, year))) > + return r; > + } > + > + if (mon != UNSET) { > + if ((r = set_field (state, TM_ABS_MON, mon))) > + return r; > + } > + > + if (mday != UNSET) { > + if ((r = set_field (state, TM_ABS_MDAY, mday))) > + return r; > + } > + > + return 0; > +} > + > +/* > + * Keyword parsing and handling. > + */ > +struct keyword; > +typedef int (*setter_t)(struct state *state, struct keyword *kw); > + > +struct keyword { > + const char *name; /* keyword */ > + enum field field; /* field to set, or FIELD_NONE if N/A */ > + int value; /* value to set, or 0 if N/A */ > + setter_t set; /* function to use for setting, if non-NULL */ > +}; > + > +/* > + * Setter callback functions for keywords. > + */ > +static int > +kw_set_default (struct state *state, struct keyword *kw) It took me a while to figure out what the name of this had to do with the action it performs, then I realized that it's never used in the table and only called when set is NULL. Given that, I think it would make more sense to just put the set_field call in place of the one current call to kw_set_default. Currently, this seems like one indirection too much. > +{ > + return set_field (state, kw->field, kw->value); > +} > + > +static int > +kw_set_rel (struct state *state, struct keyword *kw) > +{ > + int multiplier = 1; > + > + /* Get a previously set multiplier, if any. */ > + get_postponed_number (state, &multiplier, NULL, NULL); > + > + /* Accumulate relative field values. */ > + return mod_field (state, kw->field, multiplier * kw->value); > +} > + > +static int > +kw_set_number (struct state *state, struct keyword *kw) > +{ > + /* -1 = no length, from keyword. */ > + return set_postponed_number (state, kw->value, -1); > +} > + > +static int > +kw_set_month (struct state *state, struct keyword *kw) > +{ > + int n = get_postponed_length (state); > + > + /* Consume postponed number if it could be mday. This handles "20 > + * January". */ > + if (n == 1 || n == 2) { Should this be (n && is_valid_mday (state->postponed_value))? It seems a little odd that postponed numbers three digits or longer are treated as independent, but two digits numbers > 31 are an error. > + int r, v; > + > + get_postponed_number (state, &v, NULL, NULL); > + > + if (!is_valid_mday (v)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + r = set_field (state, TM_ABS_MDAY, v); > + if (r) > + return r; > + } > + > + return set_field (state, kw->field, kw->value); > +} > + > +static int > +kw_set_ampm (struct state *state, struct keyword *kw) > +{ > + int n = get_postponed_length (state); > + > + /* Consume postponed number if it could be hour. This handles > + * "5pm". */ > + if (n == 1 || n == 2) { Same comment as for kw_set_month. > + int r, v; > + > + get_postponed_number (state, &v, NULL, NULL); > + > + if (!is_valid_12hour (v)) > + return -PARSE_TIME_ERR_INVALIDTIME; > + > + r = set_abs_time (state, v, 0, 0); > + if (r) > + return r; > + } > + > + return set_field (state, kw->field, kw->value); > +} > + > +static int > +kw_set_timeofday (struct state *state, struct keyword *kw) > +{ > + return set_abs_time (state, kw->value, 0, 0); > +} > + > +static int > +kw_set_today (struct state *state, unused (struct keyword *kw)) > +{ > + enum field fields[] = { TM_ABS_YEAR, TM_ABS_MON, TM_ABS_MDAY }; > + > + return set_fields_to_now (state, fields, ARRAY_SIZE (fields)); > +} > + > +static int > +kw_set_now (struct state *state, unused (struct keyword *kw)) > +{ > + enum field fields[] = { TM_ABS_HOUR, TM_ABS_MIN, TM_ABS_SEC }; > + > + return set_fields_to_now (state, fields, ARRAY_SIZE (fields)); > +} > + > +static int > +kw_set_ordinal (struct state *state, struct keyword *kw) > +{ > + int n, v; > + > + /* Require a postponed number. */ > + if (!get_postponed_number (state, &v, &n, NULL)) > + return -PARSE_TIME_ERR_DATEFORMAT; > + > + /* Ordinals are mday. */ > + if (n != 1 && n != 2) Is this redundant with your is_valid_mday test below? > + return -PARSE_TIME_ERR_DATEFORMAT; > + > + /* Be strict about st, nd, rd, and lax about th. */ > + if (strcasecmp (kw->name, "st") == 0 && v != 1 && v != 21 && v != 31) > + return -PARSE_TIME_ERR_INVALIDDATE; > + else if (strcasecmp (kw->name, "nd") == 0 && v != 2 && v != 22) > + return -PARSE_TIME_ERR_INVALIDDATE; > + else if (strcasecmp (kw->name, "rd") == 0 && v != 3 && v != 23) > + return -PARSE_TIME_ERR_INVALIDDATE; > + else if (strcasecmp (kw->name, "th") == 0 && !is_valid_mday (v)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + return set_field (state, TM_ABS_MDAY, v); > +} > + > +/* > + * Accepted keywords. > + * > + * A keyword may optionally contain a '|' to indicate the minimum > + * match length. Without one, full match is required. It's advisable > + * to keep the minimum match parts unique across all keywords. > + * > + * If keyword begins with upper case letter, then the matching will be > + * case sensitive. Otherwise the matching is case insensitive. > + * > + * If setter is NULL, set_default will be used. > + * > + * Note: Order matters. Matching is greedy, longest match is used, but > + * of equal length matches the first one is used, unless there's an > + * equal length case sensitive match which trumps case insensitive > + * matches. If you do have a tokenizer (or disallow mashing keywords together), then all of complexity arising from longest match goes away because the keyword token either will or won't match a keyword. If you also eliminate the rule for case sensitivity and put case-sensitive things before conflicting case-insensitive things (so put "M" before "m|inutes"), then you can simply use the first match. > + */ > +static struct keyword keywords[] = { > + /* Weekdays. */ > + { N_("sun|day"), TM_ABS_WDAY, 0, NULL }, > + { N_("mon|day"), TM_ABS_WDAY, 1, NULL }, > + { N_("tue|sday"), TM_ABS_WDAY, 2, NULL }, > + { N_("wed|nesday"), TM_ABS_WDAY, 3, NULL }, > + { N_("thu|rsday"), TM_ABS_WDAY, 4, NULL }, > + { N_("fri|day"), TM_ABS_WDAY, 5, NULL }, > + { N_("sat|urday"), TM_ABS_WDAY, 6, NULL }, > + > + /* Months. */ > + { N_("jan|uary"), TM_ABS_MON, 1, kw_set_month }, > + { N_("feb|ruary"), TM_ABS_MON, 2, kw_set_month }, > + { N_("mar|ch"), TM_ABS_MON, 3, kw_set_month }, > + { N_("apr|il"), TM_ABS_MON, 4, kw_set_month }, > + { N_("may"), TM_ABS_MON, 5, kw_set_month }, > + { N_("jun|e"), TM_ABS_MON, 6, kw_set_month }, > + { N_("jul|y"), TM_ABS_MON, 7, kw_set_month }, > + { N_("aug|ust"), TM_ABS_MON, 8, kw_set_month }, > + { N_("sep|tember"), TM_ABS_MON, 9, kw_set_month }, > + { N_("oct|ober"), TM_ABS_MON, 10, kw_set_month }, > + { N_("nov|ember"), TM_ABS_MON, 11, kw_set_month }, > + { N_("dec|ember"), TM_ABS_MON, 12, kw_set_month }, > + > + /* Durations. */ > + { N_("y|ears"), TM_REL_YEAR, 1, kw_set_rel }, > + { N_("w|eeks"), TM_REL_WEEK, 1, kw_set_rel }, > + { N_("d|ays"), TM_REL_DAY, 1, kw_set_rel }, > + { N_("h|ours"), TM_REL_HOUR, 1, kw_set_rel }, > + { N_("hr|s"), TM_REL_HOUR, 1, kw_set_rel }, > + { N_("m|inutes"), TM_REL_MIN, 1, kw_set_rel }, > + /* M=months, m=minutes */ > + { N_("M"), TM_REL_MON, 1, kw_set_rel }, > + { N_("mins"), TM_REL_MIN, 1, kw_set_rel }, > + { N_("mo|nths"), TM_REL_MON, 1, kw_set_rel }, > + { N_("s|econds"), TM_REL_SEC, 1, kw_set_rel }, > + { N_("secs"), TM_REL_SEC, 1, kw_set_rel }, > + > + /* Numbers. */ > + { N_("one"), TM_NONE, 1, kw_set_number }, > + { N_("two"), TM_NONE, 2, kw_set_number }, > + { N_("three"), TM_NONE, 3, kw_set_number }, > + { N_("four"), TM_NONE, 4, kw_set_number }, > + { N_("five"), TM_NONE, 5, kw_set_number }, > + { N_("six"), TM_NONE, 6, kw_set_number }, > + { N_("seven"), TM_NONE, 7, kw_set_number }, > + { N_("eight"), TM_NONE, 8, kw_set_number }, > + { N_("nine"), TM_NONE, 9, kw_set_number }, > + { N_("ten"), TM_NONE, 10, kw_set_number }, > + { N_("dozen"), TM_NONE, 12, kw_set_number }, > + { N_("hundred"), TM_NONE, 100, kw_set_number }, > + > + /* Special number forms. */ > + { N_("this"), TM_NONE, 0, kw_set_number }, > + { N_("last"), TM_NONE, 1, kw_set_number }, > + > + /* Other special keywords. */ > + { N_("yesterday"), TM_REL_DAY, 1, kw_set_rel }, > + { N_("today"), TM_NONE, 0, kw_set_today }, > + { N_("now"), TM_NONE, 0, kw_set_now }, > + { N_("noon"), TM_NONE, 12, kw_set_timeofday }, > + { N_("midnight"), TM_NONE, 0, kw_set_timeofday }, > + { N_("am"), TM_AMPM, 0, kw_set_ampm }, > + { N_("a.m."), TM_AMPM, 0, kw_set_ampm }, > + { N_("pm"), TM_AMPM, 1, kw_set_ampm }, > + { N_("p.m."), TM_AMPM, 1, kw_set_ampm }, > + { N_("st"), TM_NONE, 0, kw_set_ordinal }, > + { N_("nd"), TM_NONE, 0, kw_set_ordinal }, > + { N_("rd"), TM_NONE, 0, kw_set_ordinal }, > + { N_("th"), TM_NONE, 0, kw_set_ordinal }, > + > + /* Timezone codes: offset in minutes. XXX: Add more codes. */ > + { N_("pst"), TM_TZ, -8*60, NULL }, > + { N_("mst"), TM_TZ, -7*60, NULL }, > + { N_("cst"), TM_TZ, -6*60, NULL }, > + { N_("est"), TM_TZ, -5*60, NULL }, > + { N_("ast"), TM_TZ, -4*60, NULL }, > + { N_("nst"), TM_TZ, -(3*60+30), NULL }, > + > + { N_("gmt"), TM_TZ, 0, NULL }, > + { N_("utc"), TM_TZ, 0, NULL }, > + > + { N_("wet"), TM_TZ, 0, NULL }, > + { N_("cet"), TM_TZ, 1*60, NULL }, > + { N_("eet"), TM_TZ, 2*60, NULL }, > + { N_("fet"), TM_TZ, 3*60, NULL }, > + > + { N_("wat"), TM_TZ, 1*60, NULL }, > + { N_("cat"), TM_TZ, 2*60, NULL }, > + { N_("eat"), TM_TZ, 3*60, NULL }, > +}; > + > +/* > + * Compare strings s and keyword. Return number of matching chars on > + * match, 0 for no match. Match must be at least n chars, or all of > + * keyword if n < 0, otherwise it's not a match. Use match_case for > + * case sensitive matching. > + */ > +static size_t > +match_keyword (const char *s, const char *keyword, ssize_t n, bool match_case) > +{ > + ssize_t i; > + > + if (!n) > + return 0; > + > + for (i = 0; *s && *keyword; i++, s++, keyword++) { > + if (match_case) { > + if (*s != *keyword) The pointer arithmetic doesn't seem to buy anything here. What about just looping over i and using s[i] and keyword[i]? > + break; > + } else { > + if (tolower ((unsigned char) *s) != > + tolower ((unsigned char) *keyword)) I don't think the cast to unsigned char is necessary. > + break; > + } > + } > + > + if (n > 0) > + return i < n ? 0 : i; > + else > + return *keyword ? 0 : i; > +} > + > +/* > + * Parse a keyword. Return < 0 on error, number of parsed chars on > + * success. > + */ > +static ssize_t > +parse_keyword (struct state *state, const char *s) > +{ > + unsigned int i; > + size_t n, max_n = 0; > + struct keyword *kw = NULL; > + int r; > + > + /* Match longest keyword */ > + for (i = 0; i < ARRAY_SIZE (keywords); i++) { > + /* Match case if keyword begins with upper case letter. */ > + bool mcase = isupper ((unsigned char) keywords[i].name[0]); Same with this cast. > + ssize_t minlen = -1; > + char keyword[128]; > + char *p; > + > + strncpy (keyword, _(keywords[i].name), sizeof (keyword)); > + > + /* Truncate too long keywords. XXX: Make this dynamic? */ > + keyword[sizeof (keyword) - 1] = '\0'; > + > + /* Minimum match length. */ > + p = strchr (keyword, '|'); > + if (p) { > + minlen = p - keyword; > + > + /* Remove the minimum match length separator. */ > + memmove (p, p + 1, strlen (p + 1) + 1); > + } Would it make more sense to make match_keyword aware of the | character? Then you wouldn't need this dance with copying the keyword into a scratch buffer. I'm thinking something like (untested) static size_t match_keyword (const char *s, const char *keyword, bool match_case) { size_t i; bool prefix_matched = false; for (i = 0; *s && *keyword; i++, s++, keyword++) { if (*keyword == '|') { prefix_matched = true; ++keyword; } if (match_case && *s != *keyword) return 0; else if (tolower (*s) != tolower (*keyword)) return 0; } if (!*keyword || prefix_matched) return i; return 0; } > + > + n = match_keyword (s, keyword, minlen, mcase); > + if (n > max_n || (n == max_n && mcase)) { > + max_n = n; > + kw = &keywords[i]; > + } > + } > + > + if (!kw) > + return -PARSE_TIME_ERR_KEYWORD; > + > + if (kw->set) > + r = kw->set (state, kw); > + else > + r = kw_set_default (state, kw); > + > + if (r < 0) > + return r; > + > + return max_n; > +} > + > +/* > + * Non-keyword parsers and their helpers. > + */ > + > +static int > +set_user_tz (struct state *state, char sign, int hour, int min) > +{ > + int tz = hour * 60 + min; > + > + assert (sign == '+' || sign == '-'); > + > + if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15) Good to see you're not forgetting our Kiribati notmuch user base. > + return -PARSE_TIME_ERR_INVALIDTIME; > + > + if (sign == '-') > + tz = -tz; > + > + return set_field (state, TM_TZ, tz); > +} > + > +/* > + * Parse a previously postponed number if one exists. Independent > + * parsing of a postponed number when it wasn't consumed during > + * parsing of the following token. > + */ > +static int > +parse_postponed_number (struct state *state, unused (enum field next_field)) > +{ > + int v, n; > + char d; > + > + /* Bail out if there's no postponed number. */ > + if (!get_postponed_number (state, &v, &n, &d)) > + return 0; > + > + if (n == 1 || n == 2) { > + /* Notable exception: Previous field affects parsing. This > + * handles "January 20". */ > + if (state->last_field == TM_ABS_MON) { > + /* D[D] */ > + if (!is_valid_mday (v)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + return set_field (state, TM_ABS_MDAY, v); > + } else if (n == 2) { > + /* XXX: Only allow if last field is hour, min, or sec? */ > + if (d == '+' || d == '-') { > + /* +/-HH */ > + return set_user_tz (state, d, v, 0); > + } > + } > + } else if (n == 4) { > + /* Notable exception: Value affects parsing. Time zones are > + * always at most 1400 and we don't understand years before > + * 1970. */ > + if (!is_valid_year (v)) { > + if (d == '+' || d == '-') { > + /* +/-HHMM */ > + return set_user_tz (state, d, v / 100, v % 100); > + } > + } else { > + /* YYYY */ > + return set_field (state, TM_ABS_YEAR, v); > + } > + } else if (n == 6) { > + /* HHMMSS */ > + int hour = v / 10000; > + int min = (v / 100) % 100; > + int sec = v % 100; > + > + if (!is_valid_time (hour, min, sec)) > + return -PARSE_TIME_ERR_INVALIDTIME; > + > + return set_abs_time (state, hour, min, sec); > + } else if (n == 8) { > + /* YYYYMMDD */ > + int year = v / 10000; > + int mon = (v / 100) % 100; > + int mday = v % 100; > + > + if (!is_valid_date (year, mon, mday)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + return set_abs_date (state, year, mon, mday); > + } else { > + return -PARSE_TIME_ERR_FORMAT; No need for the else block, given the return at the end. > + } > + > + return -PARSE_TIME_ERR_FORMAT; > +} > + > +static int tm_get_field (const struct tm *tm, enum field field); > + > +static int > +set_timestamp (struct state *state, time_t t) > +{ > + struct tm tm; > + enum field f; > + int r; > + > + if (gmtime_r (&t, &tm) == NULL) > + return -PARSE_TIME_ERR_LIB; > + > + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { > + r = set_field (state, f, tm_get_field (&tm, f)); > + if (r) > + return r; > + } > + > + r = set_field (state, TM_TZ, 0); > + if (r) > + return r; > + > + /* XXX: Prevent TM_AMPM with timestamp, e.g. "@123456 pm" */ > + > + return 0; > +} > + > +/* Parse a single number. Typically postpone parsing until later. */ > +static int > +parse_single_number (struct state *state, unsigned long v, > + unsigned long n) > +{ > + assert (n); > + > + if (state->delim == '@') > + return set_timestamp (state, (time_t) v); > + > + if (v > INT_MAX) > + return -PARSE_TIME_ERR_FORMAT; > + > + return set_postponed_number (state, v, n); > +} > + > +static bool > +is_time_sep (char c) > +{ > + return c == ':'; > +} > + > +static bool > +is_date_sep (char c) > +{ > + return c == '/' || c == '-' || c == '.'; > +} > + > +static bool > +is_sep (char c) > +{ > + return is_time_sep (c) || is_date_sep (c); > +} > + > +/* Two-digit year: 00...69 is 2000s, 70...99 1900s, if n == 0 keep > + * unset. */ > +static int > +expand_year (unsigned long year, size_t n) > +{ > + if (n == 2) { > + return (year < 70 ? 2000 : 1900) + year; > + } else if (n == 4) { > + return year; > + } else { > + return UNSET; > + } > +} > + > +/* Parse a date number triplet. */ > +static int > +parse_date (struct state *state, char sep, > + unsigned long v1, unsigned long v2, unsigned long v3, > + size_t n1, size_t n2, size_t n3) > +{ > + int year = UNSET, mon = UNSET, mday = UNSET; > + > + assert (is_date_sep (sep)); > + > + switch (sep) { > + case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */ > + if (n1 != 1 && n1 != 2) > + return -PARSE_TIME_ERR_DATEFORMAT; > + > + if ((n2 == 1 || n2 == 2) && (n3 == 0 || n3 == 2 || n3 == 4)) { > + /* M[M]/D[D][/YY[YY]] */ > + year = expand_year (v3, n3); > + mon = v1; > + mday = v2; > + } else if (n2 == 4 && n3 == 0) { > + /* M[M]/YYYY */ > + year = v2; > + mon = v1; > + } else { > + return -PARSE_TIME_ERR_DATEFORMAT; > + } > + break; > + > + case '-': /* Date: YYYY-MM[-DD] or DD-MM[-YY[YY]] or MM-YYYY */ > + if (n1 == 4 && n2 == 2 && (n3 == 0 || n3 == 2)) { > + /* YYYY-MM[-DD] */ > + year = v1; > + mon = v2; > + if (n3) > + mday = v3; > + } else if (n1 == 2 && n2 == 2 && (n3 == 0 || n3 == 2 || n3 == 4)) { > + /* DD-MM[-YY[YY]] */ > + year = expand_year (v3, n3); > + mon = v2; > + mday = v1; > + } else if (n1 == 2 && n2 == 4 && n3 == 0) { > + /* MM-YYYY */ > + year = v2; > + mon = v1; > + } else { > + return -PARSE_TIME_ERR_DATEFORMAT; > + } > + break; > + > + case '.': /* Date: D[D].M[M][.[YY[YY]]] */ > + if ((n1 != 1 && n1 != 2) || (n2 != 1 && n2 != 2) || > + (n3 != 0 && n3 != 2 && n3 != 4)) > + return -PARSE_TIME_ERR_DATEFORMAT; > + > + year = expand_year (v3, n3); > + mon = v2; > + mday = v1; > + break; > + } > + > + if (year != UNSET && !is_valid_year (year)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + if (mon != UNSET && !is_valid_mon (mon)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + if (mday != UNSET && !is_valid_mday (mday)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + return set_abs_date (state, year, mon, mday); > +} > + > +/* Parse a time number triplet. */ > +static int > +parse_time (struct state *state, char sep, > + unsigned long v1, unsigned long v2, unsigned long v3, > + size_t n1, size_t n2, size_t n3) > +{ > + assert (is_time_sep (sep)); > + > + if ((n1 != 1 && n1 != 2) || n2 != 2 || (n3 != 0 && n3 != 2)) > + return -PARSE_TIME_ERR_TIMEFORMAT; > + > + /* > + * Notable exception: Previously set fields affect > + * parsing. Interpret (+|-)HH:MM as time zone only if hour and > + * minute have been set. > + * > + * XXX: This could be fixed by restricting the delimiters > + * preceding time. For '+' it would be justified, but for '-' it > + * might be inconvenient. However prefer to allow '-' as an > + * insignificant delimiter preceding time for convenience, and > + * handle '+' the same way for consistency between positive and > + * negative time zones. > + */ > + if (is_field_set (state, TM_ABS_HOUR) && > + is_field_set (state, TM_ABS_MIN) && > + n1 == 2 && n2 == 2 && n3 == 0 && > + (state->delim == '+' || state->delim == '-')) { > + return set_user_tz (state, state->delim, v1, v2); > + } > + > + if (!is_valid_time (v1, v2, v3)) > + return -PARSE_TIME_ERR_INVALIDTIME; > + > + return set_abs_time (state, v1, v2, n3 ? v3 : 0); > +} > + > +/* strtoul helper that assigns length. */ > +static unsigned long > +strtoul_len (const char *s, const char **endp, size_t *len) > +{ > + unsigned long val = strtoul (s, (char **) endp, 10); This could technically get confused by really large numbers, but I don't know if that's worth worrying about. > + > + *len = *endp - s; > + return val; > +} > + > +/* > + * Parse a (group of) number(s). Return < 0 on error, number of parsed > + * chars on success. > + */ > +static ssize_t > +parse_number (struct state *state, const char *s) > +{ > + int r; > + unsigned long v1, v2, v3 = 0; > + size_t n1, n2, n3 = 0; > + const char *p = s; > + char sep; > + > + v1 = strtoul_len (p, &p, &n1); > + > + if (is_sep (*p) && isdigit ((unsigned char) *(p + 1))) { Unnecessary cast? > + sep = *p; > + v2 = strtoul_len (p + 1, &p, &n2); > + } else { > + /* A single number. */ > + r = parse_single_number (state, v1, n1); > + if (r) > + return r; > + > + return p - s; I found the control flow here confusing. You might want to flip the two conditions so the single number return happens first and the rest of the code flows straight through: if (!is_sep (*p) || !isdigit (*(p + 1))) { ... return p - s; } sep = *p; ... > + } > + > + /* A group of two or three numbers? */ > + if (*p == sep && isdigit ((unsigned char) *(p + 1))) > + v3 = strtoul_len (p + 1, &p, &n3); > + > + if (is_time_sep (sep)) > + r = parse_time (state, sep, v1, v2, v3, n1, n2, n3); > + else > + r = parse_date (state, sep, v1, v2, v3, n1, n2, n3); > + > + if (r) > + return r; > + > + return p - s; > +} > + > +/* > + * Parse delimiter(s). Throw away all except the last one, which is > + * stored for parsing the next non-delimiter. Return < 0 on error, > + * number of parsed chars on success. > + * > + * XXX: We might want to be more strict here. > + */ > +static ssize_t > +parse_delim (struct state *state, const char *s) > +{ > + const char *p = s; > + > + /* > + * Skip non-alpha and non-digit, and store the last for further > + * processing. > + */ > + while (*p && !isalnum ((unsigned char) *p)) { > + set_delim (state, *p); > + p++; > + } > + > + return p - s; > +} > + > +/* > + * Parse a date/time string. Return < 0 on error, number of parsed > + * chars on success. > + */ > +static ssize_t > +parse_input (struct state *state, const char *s) > +{ > + const char *p = s; > + ssize_t n; > + int r; > + > + while (*p) { > + if (isalpha ((unsigned char) *p)) { > + n = parse_keyword (state, p); > + } else if (isdigit ((unsigned char) *p)) { > + n = parse_number (state, p); > + } else { > + n = parse_delim (state, p); > + } > + > + if (n <= 0) { > + if (n == 0) > + n = -PARSE_TIME_ERR; > + > + return n; > + } > + > + p += n; > + } > + > + /* Parse a previously postponed number, if any. */ > + r = parse_postponed_number (state, TM_NONE); > + if (r < 0) > + return r; > + > + return p - s; > +} > + > +/* > + * Processing the parsed input. > + */ > + > +/* > + * Initialize reference time to tm. Use time zone in state if > + * specified, otherwise local time. Use now for reference time if > + * non-NULL, otherwise current time. > + */ > +static int > +initialize_now (struct state *state, struct tm *tm, const time_t *now) Should tm be the last argument, since it's an out-argument? Why is now a pointer? Just so it can be NULL? > +{ > + time_t t; > + > + if (now) { > + t = *now; > + } else { > + if (time (&t) == (time_t) -1) > + return -PARSE_TIME_ERR_LIB; > + } > + > + if (is_field_set (state, TM_TZ)) { > + /* Some other time zone. */ > + > + /* Adjust now according to the TZ. */ > + t += get_field (state, TM_TZ) * 60; > + > + /* It's not gm, but this doesn't mess with the TZ. */ > + if (gmtime_r (&t, tm) == NULL) > + return -PARSE_TIME_ERR_LIB; > + } else { > + /* Local time. */ > + if (localtime_r (&t, tm) == NULL) > + return -PARSE_TIME_ERR_LIB; > + } > + > + return 0; > +} > + > +/* > + * Normalize tm according to mktime(3). Both mktime(3) and This comment could elaborate a bit on what it means to normalize a tm. > + * localtime_r(3) use local time, but they cancel each other out here, > + * making this function agnostic to time zone. > + */ > +static int > +normalize_tm (struct tm *tm) > +{ > + time_t t = mktime (tm); > + > + if (t == (time_t) -1) > + return -PARSE_TIME_ERR_LIB; > + > + if (!localtime_r (&t, tm)) > + return -PARSE_TIME_ERR_LIB; Do you actually need this call to localtime_r or can you just return after the mktime modifies tm? Does this have to do with timezones? > + > + return 0; > +} > + > +/* Get field out of a struct tm. */ > +static int > +tm_get_field (const struct tm *tm, enum field field) > +{ > + switch (field) { > + case TM_ABS_SEC: return tm->tm_sec; > + case TM_ABS_MIN: return tm->tm_min; > + case TM_ABS_HOUR: return tm->tm_hour; > + case TM_ABS_MDAY: return tm->tm_mday; > + case TM_ABS_MON: return tm->tm_mon + 1; /* 0- to 1-based */ > + case TM_ABS_YEAR: return 1900 + tm->tm_year; > + case TM_ABS_WDAY: return tm->tm_wday; > + case TM_ABS_ISDST: return tm->tm_isdst; > + default: > + assert (false); > + break; > + } > + > + return 0; > +} > + > +/* Modify hour according to am/pm setting. */ > +static int > +fixup_ampm (struct state *state) > +{ > + int hour, hdiff = 0; > + > + if (!is_field_set (state, TM_AMPM)) > + return 0; > + > + if (!is_field_set (state, TM_ABS_HOUR)) > + return -PARSE_TIME_ERR_TIMEFORMAT; > + > + hour = get_field (state, TM_ABS_HOUR); > + if (!is_valid_12hour (hour)) > + return -PARSE_TIME_ERR_INVALIDTIME; > + > + if (get_field (state, TM_AMPM)) { > + /* 12pm is noon. */ > + if (hour != 12) > + hdiff = 12; > + } else { > + /* 12am is midnight, beginning of day. */ > + if (hour == 12) > + hdiff = -12; > + } > + > + mod_field (state, TM_REL_HOUR, -hdiff); > + > + return 0; > +} > + > +/* Combine absolute and relative fields, and round. */ > +static int > +create_output (struct state *state, time_t *t_out, const time_t *ref, > + int round) > +{ > + struct tm tm = { .tm_isdst = -1 }; > + struct tm now; > + time_t t; > + enum field f; > + int r; > + int week_round = PARSE_TIME_NO_ROUND; > + > + r = initialize_now (state, &now, ref); > + if (r) > + return r; > + > + /* Initialize fields flagged as "now" to reference time. */ > + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { > + if (state->set[f] == FIELD_NOW) { > + state->tm[f] = tm_get_field (&now, f); > + state->set[f] = FIELD_SET; > + } > + } > + > + /* > + * If WDAY is set but MDAY is not, we consider WDAY relative > + * > + * XXX: This fails on stuff like "two months monday" because two > + * months ago wasn't the same day as today. Postpone until we know > + * date? > + */ > + if (is_field_set (state, TM_ABS_WDAY) && > + !is_field_set (state, TM_ABS_MDAY)) { > + int wday = get_field (state, TM_ABS_WDAY); > + int today = tm_get_field (&now, TM_ABS_WDAY); > + int rel_days; > + > + if (today > wday) > + rel_days = today - wday; > + else > + rel_days = today + 7 - wday; > + > + /* This also prevents special week rounding from happening. */ > + mod_field (state, TM_REL_DAY, rel_days); > + > + unset_field (state, TM_ABS_WDAY); > + } > + > + r = fixup_ampm (state); > + if (r) > + return r; > + > + /* > + * Iterate fields from most accurate to least accurate, and set > + * unset fields according to requested rounding. > + */ > + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { > + if (round != PARSE_TIME_NO_ROUND) { > + enum field r = abs_to_rel_field (f); > + > + if (is_field_set (state, f) || is_field_set (state, r)) { > + if (round >= PARSE_TIME_ROUND_UP && f != TM_ABS_SEC) { > + mod_field (state, r, -1); Crazy. This could use a comment. It took me a while to figure out why this was -1, though maybe that's just because it's late. > + if (round == PARSE_TIME_ROUND_UP_INCLUSIVE) > + mod_field (state, TM_REL_SEC, 1); > + } > + round = PARSE_TIME_NO_ROUND; /* No more rounding. */ > + } else { > + if (f == TM_ABS_MDAY && > + is_field_set (state, TM_REL_WEEK)) { > + /* Week is most accurate. */ > + week_round = round; > + round = PARSE_TIME_NO_ROUND; > + } else { > + set_field (state, f, field_epoch (f)); > + } > + } > + } > + > + if (!is_field_set (state, f)) > + set_field (state, f, tm_get_field (&now, f)); > + } > + > + /* Special case: rounding with week accuracy. */ > + if (week_round != PARSE_TIME_NO_ROUND) { > + /* Temporarily set more accurate fields to now. */ > + set_field (state, TM_ABS_SEC, tm_get_field (&now, TM_ABS_SEC)); > + set_field (state, TM_ABS_MIN, tm_get_field (&now, TM_ABS_MIN)); > + set_field (state, TM_ABS_HOUR, tm_get_field (&now, TM_ABS_HOUR)); > + set_field (state, TM_ABS_MDAY, tm_get_field (&now, TM_ABS_MDAY)); > + } > + > + /* > + * Set all fields. They may contain out of range values before > + * normalization by mktime(3). > + */ > + tm.tm_sec = get_field (state, TM_ABS_SEC) - get_field (state, TM_REL_SEC); > + tm.tm_min = get_field (state, TM_ABS_MIN) - get_field (state, TM_REL_MIN); > + tm.tm_hour = get_field (state, TM_ABS_HOUR) - get_field (state, TM_REL_HOUR); > + tm.tm_mday = get_field (state, TM_ABS_MDAY) - > + get_field (state, TM_REL_DAY) - 7 * get_field (state, TM_REL_WEEK); > + tm.tm_mon = get_field (state, TM_ABS_MON) - get_field (state, TM_REL_MON); > + tm.tm_mon--; /* 1- to 0-based */ > + tm.tm_year = get_field (state, TM_ABS_YEAR) - get_field (state, TM_REL_YEAR) - 1900; > + > + /* > + * It's always normal time. > + * > + * XXX: This is probably not a solution that universally > + * works. Just make sure DST is not taken into account. We don't > + * want rounding to be affected by DST. > + */ > + tm.tm_isdst = -1; > + > + /* Special case: rounding with week accuracy. */ > + if (week_round != PARSE_TIME_NO_ROUND) { > + /* Normalize to get proper tm.wday. */ > + r = normalize_tm (&tm); > + if (r < 0) > + return r; > + > + /* Set more accurate fields back to zero. */ > + tm.tm_sec = 0; > + tm.tm_min = 0; > + tm.tm_hour = 0; > + tm.tm_isdst = -1; > + > + /* Monday is the true 1st day of week, but this is easier. */ > + if (week_round >= PARSE_TIME_ROUND_UP) { > + tm.tm_mday += 7 - tm.tm_wday; > + if (week_round == PARSE_TIME_ROUND_UP_INCLUSIVE) > + tm.tm_sec--; > + } else { > + tm.tm_mday -= tm.tm_wday; > + } > + } > + > + if (is_field_set (state, TM_TZ)) { > + /* tm is in specified TZ, convert to UTC for timegm(3). */ > + tm.tm_min -= get_field (state, TM_TZ); > + t = timegm (&tm); > + } else { > + /* tm is in local time. */ > + t = mktime (&tm); > + } > + > + if (t == (time_t) -1) > + return -PARSE_TIME_ERR_LIB; > + > + *t_out = t; > + > + return 0; > +} > + > +/* Internally, all errors are < 0. parse_time_string() returns errors > 0. */ > +#define EXTERNAL_ERR(r) (-r) > + > +int > +parse_time_string (const char *s, time_t *t, const time_t *ref, int round) > +{ > + struct state state = { .last_field = TM_NONE }; > + int r; > + > + if (!s || !t) > + return EXTERNAL_ERR (-PARSE_TIME_ERR); > + > + r = parse_input (&state, s); > + if (r < 0) > + return EXTERNAL_ERR (r); > + > + r = create_output (&state, t, ref, round); > + if (r < 0) > + return EXTERNAL_ERR (r); > + > + return 0; > +} > diff --git a/parse-time-string/parse-time-string.h b/parse-time-string/parse-time-string.h > new file mode 100644 > index 0000000..bfa4ee3 > --- /dev/null > +++ b/parse-time-string/parse-time-string.h > @@ -0,0 +1,102 @@ > +/* > + * parse time string - user friendly date and time parser > + * Copyright © 2012 Jani Nikula > + * > + * This program is free software: you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation, either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see . > + * > + * Author: Jani Nikula > + */ > + > +#ifndef PARSE_TIME_STRING_H > +#define PARSE_TIME_STRING_H > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +#include > + > +/* return values for parse_time_string() */ > +enum { > + PARSE_TIME_OK = 0, > + PARSE_TIME_ERR, /* unspecified error */ > + PARSE_TIME_ERR_LIB, /* library call failed */ > + PARSE_TIME_ERR_ALREADYSET, /* attempt to set unit twice */ > + PARSE_TIME_ERR_FORMAT, /* generic date/time format error */ > + PARSE_TIME_ERR_DATEFORMAT, /* date format error */ > + PARSE_TIME_ERR_TIMEFORMAT, /* time format error */ > + PARSE_TIME_ERR_INVALIDDATE, /* date value error */ > + PARSE_TIME_ERR_INVALIDTIME, /* time value error */ > + PARSE_TIME_ERR_KEYWORD, /* unknown keyword */ > +}; > + > +/* round values for parse_time_string() */ > +enum { > + PARSE_TIME_ROUND_DOWN = -1, > + PARSE_TIME_NO_ROUND = 0, > + PARSE_TIME_ROUND_UP = 1, > + PARSE_TIME_ROUND_UP_INCLUSIVE = 2, > +}; > + > +/** > + * parse_time_string() - user friendly date and time parser > + * @s: string to parse > + * @t: pointer to time_t to store parsed time in > + * @ref: pointer to time_t containing reference date/time, or NULL > + * @round: PARSE_TIME_NO_ROUND, PARSE_TIME_ROUND_DOWN, or > + * PARSE_TIME_ROUND_UP > + * > + * Parse a date/time string 's' and store the parsed date/time result > + * in 't'. > + * > + * A reference date/time is used for determining the "date/time units" > + * (roughly equivalent to struct tm members) not specified by 's'. If > + * 'ref' is non-NULL, it must contain a pointer to a time_t to be used > + * as reference date/time. Otherwise, the current time is used. > + * > + * If 's' does not specify a full date/time, the 'round' parameter > + * specifies if and how the result should be rounded as follows: > + * > + * PARSE_TIME_NO_ROUND: All date/time units that are not specified > + * by 's' are set to the corresponding unit derived from the > + * reference date/time. > + * > + * PARSE_TIME_ROUND_DOWN: All date/time units that are more accurate > + * than the most accurate unit specified by 's' are set to the > + * smallest valid value for that unit. Rest of the unspecified units > + * are set as in PARSE_TIME_NO_ROUND. > + * > + * PARSE_TIME_ROUND_UP: All date/time units that are more accurate > + * than the most accurate unit specified by 's' are set to the > + * smallest valid value for that unit. The most accurate unit > + * specified by 's' is incremented by one (and this is rolled over > + * to the less accurate units as necessary), unless the most > + * accurate unit is seconds. Rest of the unspecified units are set > + * as in PARSE_TIME_NO_ROUND. > + * > + * PARSE_TIME_ROUND_UP_INCLUSIVE: Same as PARSE_TIME_ROUND_UP, minus > + * one second, unless the most accurate unit specified by 's' is > + * seconds. This is useful for callers that require a value for > + * inclusive comparison of the result. > + * > + * Return 0 (PARSE_TIME_OK) for succesfully parsed date/time, or one > + * of PARSE_TIME_ERR_* on error. 't' is not modified on error. > + */ > +int parse_time_string (const char *s, time_t *t, const time_t *ref, int round); > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* PARSE_TIME_STRING_H */ Made it!