Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch

author Austin Clements <amdragon@MIT.EDU>

Sun, 28 Oct 2012 22:52:04 +0000 (18:52 -0400)

committer W. Trevor King <wking@tremily.us>

Fri, 7 Nov 2014 17:50:09 +0000 (09:50 -0800)
diff --git a/63/b255f78c075e1d24211eb58e93906e99709a1a b/63/b255f78c075e1d24211eb58e93906e99709a1a

new file mode 100644

index 0000000..364159c
--- /dev/null
+++ b/63/b255f78c075e1d24211eb58e93906e99709a1a
@@ -0,0 +1,333 @@
+Return-Path: <amdragon@mit.edu>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+       by olra.theworths.org (Postfix) with ESMTP id 8413F431FBC\r
+       for <notmuch@notmuchmail.org>; Sun, 28 Oct 2012 15:52:08 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+       tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+       by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+       with ESMTP id oFcnPjW5vS0I for <notmuch@notmuchmail.org>;\r
+       Sun, 28 Oct 2012 15:52:07 -0700 (PDT)\r
+Received: from dmz-mailsec-scanner-6.mit.edu (DMZ-MAILSEC-SCANNER-6.MIT.EDU\r
+       [18.7.68.35])\r
+       by olra.theworths.org (Postfix) with ESMTP id 5D186431FAF\r
+       for <notmuch@notmuchmail.org>; Sun, 28 Oct 2012 15:52:07 -0700 (PDT)\r
+X-AuditID: 12074423-b7fab6d0000008f9-58-508db716ff39\r
+Received: from mailhub-auth-3.mit.edu ( [18.9.21.43])\r
+       by dmz-mailsec-scanner-6.mit.edu (Symantec Messaging Gateway) with SMTP\r
+       id BD.39.02297.617BD805; Sun, 28 Oct 2012 18:52:06 -0400 (EDT)\r
+Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103])\r
+       by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id q9SMq6j5020192; \r
+       Sun, 28 Oct 2012 18:52:06 -0400\r
+Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])\r
+       (authenticated bits=0)\r
+       (User authenticated as amdragon@ATHENA.MIT.EDU)\r
+       by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id q9SMq4nR026482\r
+       (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT);\r
+       Sun, 28 Oct 2012 18:52:05 -0400 (EDT)\r
+Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.77)\r
+       (envelope-from <amdragon@mit.edu>)\r
+       id 1TSbi8-0002Wc-7x; Sun, 28 Oct 2012 18:52:04 -0400\r
+Date: Sun, 28 Oct 2012 18:52:04 -0400\r
+From: Austin Clements <amdragon@MIT.EDU>\r
+To: Jani Nikula <jani@nikula.org>\r
+Subject: Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to\r
+       notmuch\r
+Message-ID: <20121028225204.GD15377@mit.edu>\r
+References: <cover.1350854171.git.jani@nikula.org>\r
+       <a90d3b687895a26f765539d6c0420038a74ee42f.1350854171.git.jani@nikula.org>\r
+       <20121022081444.GM14861@mit.edu> <87lieqkz4t.fsf@nikula.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=us-ascii\r
+Content-Disposition: inline\r
+In-Reply-To: <87lieqkz4t.fsf@nikula.org>\r
+User-Agent: Mutt/1.5.21 (2010-09-15)\r
+X-Brightmail-Tracker:\r
+ H4sIAAAAAAAAA+NgFmpileLIzCtJLcpLzFFi42IR4hTV1hXb3htgsP2vuUXTdGeL6zdnMjsw\r
+       edy6/5rd49mqW8wBTFFcNimpOZllqUX6dglcGfOuvWAv+BdYsecOewPjNNsuRk4OCQETicXf\r
+       bjNC2GISF+6tZ+ti5OIQEtjHKNH7/hwLhLOBUaJ/yTN2COckk8S7/49ZQVqEBJYwSsx8VgFi\r
+       swioSixd9RIsziagIbFt/3KwsSICihKbT+4Hs5kFpCW+/W5mArGFBYIkfr2/ywJi8wroSPTM\r
+       W8oEsWA/o8T0qVfZIRKCEidnPmGBaNaSuPHvJVARB9ig5f84QMKcQLu2P3/JDGKLCqhITDm5\r
+       jW0Co9AsJN2zkHTPQuhewMi8ilE2JbdKNzcxM6c4NVm3ODkxLy+1SNdMLzezRC81pXQTIyio\r
+       2V2UdzD+Oah0iFGAg1GJh/dCQU+AEGtiWXFl7iFGSQ4mJVHeV2t7A4T4kvJTKjMSizPii0pz\r
+       UosPMUpwMCuJ8C7lBsrxpiRWVqUW5cOkpDlYlMR5r6Xc9BcSSE8sSc1OTS1ILYLJynBwKEnw\r
+       WmwDahQsSk1PrUjLzClBSDNxcIIM5wEa/mwLyPDigsTc4sx0iPwpRkUpcd53W4ESAiCJjNI8\r
+       uF5Y0nnFKA70ijCvJsgKHmDCgut+BTSYCWiwDh/Y4JJEhJRUA2NJ7u2trqsEOc/PrFcxUTz1\r
+       f1P3leOrrxVNEw13d4x8tLtA1XDLChOdsD1X5N6fnPBVtO7FL52rBRkLzi59smZC5T7WY9sf\r
+       arDstVsf/61/6rk/hruXx5zykjBwt19uxy6f5nPxuv5srUWl7K1Tpngo3p2br12ffsXh2b2O\r
+       77aPFHcrrUkMeVKlxFKckWioxVxUnAgAEEZ4sxUDAAA=\r
+Cc: notmuch@notmuchmail.org\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+       <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Sun, 28 Oct 2012 22:52:08 -0000\r
+\r
+Quoth Jani Nikula on Oct 29 at 12:30 am:\r
+> On Mon, 22 Oct 2012, Austin Clements <amdragon@MIT.EDU> wrote:\r
+> >> +/*\r
+> >> + * Accepted keywords.\r
+> >> + *\r
+> >> + * A keyword may optionally contain a '|' to indicate the minimum\r
+> >> + * match length. Without one, full match is required. It's advisable\r
+> >> + * to keep the minimum match parts unique across all keywords.\r
+> >> + *\r
+> >> + * If keyword begins with upper case letter, then the matching will be\r
+> >> + * case sensitive. Otherwise the matching is case insensitive.\r
+> >> + *\r
+> >> + * If setter is NULL, set_default will be used.\r
+> >> + *\r
+> >> + * Note: Order matters. Matching is greedy, longest match is used, but\r
+> >> + * of equal length matches the first one is used, unless there's an\r
+> >> + * equal length case sensitive match which trumps case insensitive\r
+> >> + * matches.\r
+> >\r
+> > If you do have a tokenizer (or disallow mashing keywords together),\r
+> > then all of the complexity arising from longest match goes away because\r
+> > the keyword token either will or won't match a keyword.  If you also\r
+> > eliminate the rule for case sensitivity and put case-sensitive things\r
+> > before conflicting case-insensitive things (so put "M" before\r
+> > "m|inutes"), then you can simply use the first match.\r
+> \r
+> At least one reason for going through the whole table is that if this\r
+> ever gets i18n support, the conflicting things might be different. While\r
+> order matters in principle, you should create the table so that it\r
+> really doesn't matter.\r
+\r
+While that's true, if the input keyword has to be syntactically\r
+delimited, there's still no such thing as a "longest match", since the\r
+length of any match will be the length of the input.  You may still\r
+want to scan the whole table, but if you find multiple matches, it's a\r
+bug in the table indicating that |ed prefixes aren't unique.  Hence,\r
+if you're not interested in finding bugs in the table, you can just\r
+find the first match.\r
+\r
+Or you could remove the |'s from the table, scan the whole table, and\r
+consider the input string ambiguous if it matches multiple table\r
+entries (being careful with case sensitivity), just like you do now if\r
+the input string is shorter than the |ed prefixes.  That would\r
+simplify your table, your matching logic, and possibly your scanning\r
+logic.\r
+\r
+> >\r
+> >> + */\r
+> >> +static struct keyword keywords[] = {\r
+> >> +    /* Weekdays. */\r
+> >> +    { N_("sun|day"),     TM_ABS_WDAY,    0,      NULL },\r
+> >> +    { N_("mon|day"),     TM_ABS_WDAY,    1,      NULL },\r
+> >> +    { N_("tue|sday"),    TM_ABS_WDAY,    2,      NULL },\r
+> >> +    { N_("wed|nesday"),  TM_ABS_WDAY,    3,      NULL },\r
+> >> +    { N_("thu|rsday"),   TM_ABS_WDAY,    4,      NULL },\r
+> >> +    { N_("fri|day"),     TM_ABS_WDAY,    5,      NULL },\r
+> >> +    { N_("sat|urday"),   TM_ABS_WDAY,    6,      NULL },\r
+> >> +\r
+> >> +    /* Months. */\r
+> >> +    { N_("jan|uary"),    TM_ABS_MON,     1,      kw_set_month },\r
+> >> +    { N_("feb|ruary"),   TM_ABS_MON,     2,      kw_set_month },\r
+> >> +    { N_("mar|ch"),      TM_ABS_MON,     3,      kw_set_month },\r
+> >> +    { N_("apr|il"),      TM_ABS_MON,     4,      kw_set_month },\r
+> >> +    { N_("may"), TM_ABS_MON,     5,      kw_set_month },\r
+> >> +    { N_("jun|e"),       TM_ABS_MON,     6,      kw_set_month },\r
+> >> +    { N_("jul|y"),       TM_ABS_MON,     7,      kw_set_month },\r
+> >> +    { N_("aug|ust"),     TM_ABS_MON,     8,      kw_set_month },\r
+> >> +    { N_("sep|tember"),  TM_ABS_MON,     9,      kw_set_month },\r
+> >> +    { N_("oct|ober"),    TM_ABS_MON,     10,     kw_set_month },\r
+> >> +    { N_("nov|ember"),   TM_ABS_MON,     11,     kw_set_month },\r
+> >> +    { N_("dec|ember"),   TM_ABS_MON,     12,     kw_set_month },\r
+> >> +\r
+> >> +    /* Durations. */\r
+> >> +    { N_("y|ears"),      TM_REL_YEAR,    1,      kw_set_rel },\r
+> >> +    { N_("w|eeks"),      TM_REL_WEEK,    1,      kw_set_rel },\r
+> >> +    { N_("d|ays"),       TM_REL_DAY,     1,      kw_set_rel },\r
+> >> +    { N_("h|ours"),      TM_REL_HOUR,    1,      kw_set_rel },\r
+> >> +    { N_("hr|s"),        TM_REL_HOUR,    1,      kw_set_rel },\r
+> >> +    { N_("m|inutes"),    TM_REL_MIN,     1,      kw_set_rel },\r
+> >> +    /* M=months, m=minutes */\r
+> >> +    { N_("M"),           TM_REL_MON,     1,      kw_set_rel },\r
+> >> +    { N_("mins"),        TM_REL_MIN,     1,      kw_set_rel },\r
+> >> +    { N_("mo|nths"),     TM_REL_MON,     1,      kw_set_rel },\r
+> >> +    { N_("s|econds"),    TM_REL_SEC,     1,      kw_set_rel },\r
+> >> +    { N_("secs"),        TM_REL_SEC,     1,      kw_set_rel },\r
+> >> +\r
+> >> +    /* Numbers. */\r
+> >> +    { N_("one"), TM_NONE,        1,      kw_set_number },\r
+> >> +    { N_("two"), TM_NONE,        2,      kw_set_number },\r
+> >> +    { N_("three"),       TM_NONE,        3,      kw_set_number },\r
+> >> +    { N_("four"),        TM_NONE,        4,      kw_set_number },\r
+> >> +    { N_("five"),        TM_NONE,        5,      kw_set_number },\r
+> >> +    { N_("six"), TM_NONE,        6,      kw_set_number },\r
+> >> +    { N_("seven"),       TM_NONE,        7,      kw_set_number },\r
+> >> +    { N_("eight"),       TM_NONE,        8,      kw_set_number },\r
+> >> +    { N_("nine"),        TM_NONE,        9,      kw_set_number },\r
+> >> +    { N_("ten"), TM_NONE,        10,     kw_set_number },\r
+> >> +    { N_("dozen"),       TM_NONE,        12,     kw_set_number },\r
+> >> +    { N_("hundred"),     TM_NONE,        100,    kw_set_number },\r
+> >> +\r
+> >> +    /* Special number forms. */\r
+> >> +    { N_("this"),        TM_NONE,        0,      kw_set_number },\r
+> >> +    { N_("last"),        TM_NONE,        1,      kw_set_number },\r
+> >> +\r
+> >> +    /* Other special keywords. */\r
+> >> +    { N_("yesterday"),   TM_REL_DAY,     1,      kw_set_rel },\r
+> >> +    { N_("today"),       TM_NONE,        0,      kw_set_today },\r
+> >> +    { N_("now"), TM_NONE,        0,      kw_set_now },\r
+> >> +    { N_("noon"),        TM_NONE,        12,     kw_set_timeofday },\r
+> >> +    { N_("midnight"),    TM_NONE,        0,      kw_set_timeofday },\r
+> >> +    { N_("am"),          TM_AMPM,        0,      kw_set_ampm },\r
+> >> +    { N_("a.m."),        TM_AMPM,        0,      kw_set_ampm },\r
+> >> +    { N_("pm"),          TM_AMPM,        1,      kw_set_ampm },\r
+> >> +    { N_("p.m."),        TM_AMPM,        1,      kw_set_ampm },\r
+> >> +    { N_("st"),          TM_NONE,        0,      kw_set_ordinal },\r
+> >> +    { N_("nd"),          TM_NONE,        0,      kw_set_ordinal },\r
+> >> +    { N_("rd"),          TM_NONE,        0,      kw_set_ordinal },\r
+> >> +    { N_("th"),          TM_NONE,        0,      kw_set_ordinal },\r
+> >> +\r
+> >> +    /* Timezone codes: offset in minutes. XXX: Add more codes. */\r
+> >> +    { N_("pst"), TM_TZ,          -8*60,  NULL },\r
+> >> +    { N_("mst"), TM_TZ,          -7*60,  NULL },\r
+> >> +    { N_("cst"), TM_TZ,          -6*60,  NULL },\r
+> >> +    { N_("est"), TM_TZ,          -5*60,  NULL },\r
+> >> +    { N_("ast"), TM_TZ,          -4*60,  NULL },\r
+> >> +    { N_("nst"), TM_TZ,          -(3*60+30),     NULL },\r
+> >> +\r
+> >> +    { N_("gmt"), TM_TZ,          0,      NULL },\r
+> >> +    { N_("utc"), TM_TZ,          0,      NULL },\r
+> >> +\r
+> >> +    { N_("wet"), TM_TZ,          0,      NULL },\r
+> >> +    { N_("cet"), TM_TZ,          1*60,   NULL },\r
+> >> +    { N_("eet"), TM_TZ,          2*60,   NULL },\r
+> >> +    { N_("fet"), TM_TZ,          3*60,   NULL },\r
+> >> +\r
+> >> +    { N_("wat"), TM_TZ,          1*60,   NULL },\r
+> >> +    { N_("cat"), TM_TZ,          2*60,   NULL },\r
+> >> +    { N_("eat"), TM_TZ,          3*60,   NULL },\r
+> >> +};\r
+> >> +\r
+> >> +/*\r
+> >> + * Compare strings s and keyword. Return number of matching chars on\r
+> >> + * match, 0 for no match. Match must be at least n chars, or all of\r
+> >> + * keyword if n < 0, otherwise it's not a match. Use match_case for\r
+> >> + * case sensitive matching.\r
+> >> + */\r
+> >> +static size_t\r
+> >> +match_keyword (const char *s, const char *keyword, ssize_t n, bool match_case)\r
+> >> +{\r
+> >> +    ssize_t i;\r
+> >> +\r
+> >> +    if (!n)\r
+> >> + return 0;\r
+> >> +\r
+> >> +    for (i = 0; *s && *keyword; i++, s++, keyword++) {\r
+> >> + if (match_case) {\r
+> >> +     if (*s != *keyword)\r
+> >\r
+> > The pointer arithmetic doesn't seem to buy anything here.  What about\r
+> > just looping over i and using s[i] and keyword[i]?\r
+> \r
+> The pointer arithmetic will be useful when I implement your other\r
+> suggestion of handling '|' here. ;) Otherwise, I'd need two index\r
+> variables.\r
+\r
+Fair enough.\r
+\r
+> >\r
+> >> +         break;\r
+> >> + } else {\r
+> >> +     if (tolower ((unsigned char) *s) !=\r
+> >> +         tolower ((unsigned char) *keyword))\r
+> >\r
+> > I don't think the cast to unsigned char is necessary.\r
+> \r
+> As discussed on IRC, pedantically it is necessary, as ctype.h functions\r
+> accept an int that must have the value of an unsigned char or EOF, and\r
+> char might be signed.\r
+\r
+It wouldn't be C without the pedantic.\r
+\r
+> >> +/* Combine absolute and relative fields, and round. */\r
+> >> +static int\r
+> >> +create_output (struct state *state, time_t *t_out, const time_t *ref,\r
+> >> +        int round)\r
+> >> +{\r
+> >> +    struct tm tm = { .tm_isdst = -1 };\r
+> >> +    struct tm now;\r
+> >> +    time_t t;\r
+> >> +    enum field f;\r
+> >> +    int r;\r
+> >> +    int week_round = PARSE_TIME_NO_ROUND;\r
+> >> +\r
+> >> +    r = initialize_now (state, &now, ref);\r
+> >> +    if (r)\r
+> >> + return r;\r
+> >> +\r
+> >> +    /* Initialize fields flagged as "now" to reference time. */\r
+> >> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {\r
+> >> + if (state->set[f] == FIELD_NOW) {\r
+> >> +     state->tm[f] = tm_get_field (&now, f);\r
+> >> +     state->set[f] = FIELD_SET;\r
+> >> + }\r
+> >> +    }\r
+> >> +\r
+> >> +    /*\r
+> >> +     * If WDAY is set but MDAY is not, we consider WDAY relative\r
+> >> +     *\r
+> >> +     * XXX: This fails on stuff like "two months monday" because two\r
+> >> +     * months ago wasn't the same day as today. Postpone until we know\r
+> >> +     * date?\r
+> >> +     */\r
+> >> +    if (is_field_set (state, TM_ABS_WDAY) &&\r
+> >> + !is_field_set (state, TM_ABS_MDAY)) {\r
+> >> + int wday = get_field (state, TM_ABS_WDAY);\r
+> >> + int today = tm_get_field (&now, TM_ABS_WDAY);\r
+> >> + int rel_days;\r
+> >> +\r
+> >> + if (today > wday)\r
+> >> +     rel_days = today - wday;\r
+> >> + else\r
+> >> +     rel_days = today + 7 - wday;\r
+> >> +\r
+> >> + /* This also prevents special week rounding from happening. */\r
+> >> + mod_field (state, TM_REL_DAY, rel_days);\r
+> >> +\r
+> >> + unset_field (state, TM_ABS_WDAY);\r
+> >> +    }\r
+> >> +\r
+> >> +    r = fixup_ampm (state);\r
+> >> +    if (r)\r
+> >> + return r;\r
+> >> +\r
+> >> +    /*\r
+> >> +     * Iterate fields from most accurate to least accurate, and set\r
+> >> +     * unset fields according to requested rounding.\r
+> >> +     */\r
+> >> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {\r
+> >> + if (round != PARSE_TIME_NO_ROUND) {\r
+> >> +     enum field r = abs_to_rel_field (f);\r
+> >> +\r
+> >> +     if (is_field_set (state, f) || is_field_set (state, r)) {\r
+> >> +         if (round >= PARSE_TIME_ROUND_UP && f != TM_ABS_SEC) {\r
+> >> +             mod_field (state, r, -1);\r
+> >\r
+> > Crazy.  This could use a comment.  It took me a while to figure out\r
+> > why this was -1, though maybe that's just because it's late.\r
+> \r
+> Will do.\r
+> \r
+> /* You're not expected to understand this */ ;)\r
+\r
+Hah.  You're not allowed to use that on me!  I *do* understand the\r
+code that comment is originally from.  ]:--8)\r
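
The second alternative suggested in the message (drop the |'s, scan the whole table, and treat an input that matches several entries as ambiguous) could look roughly like the sketch below. It is only an illustration under assumptions, not code from the patch: the table is a tiny hypothetical one, the names sketch_keywords, sketch_is_prefix, and sketch_lookup are invented here, and case-sensitive entries such as "M" are left out for brevity. It assumes the input token has already been delimited by a tokenizer, and it keeps the (unsigned char) casts discussed in the thread.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical miniature table without '|' markers; not the patch's table. */
static const char *const sketch_keywords[] = {
    "minutes", "months", "monday", "march", "may",
};

/* Hypothetical result codes for the sketch. */
enum { SKETCH_NO_MATCH = -1, SKETCH_AMBIGUOUS = -2 };

/*
 * Return nonzero if token is a case-insensitive prefix of keyword.
 * The casts keep negative char values out of tolower(). If keyword
 * ends first, its terminating NUL fails the comparison, so the loop
 * never reads past the end of keyword.
 */
static int
sketch_is_prefix (const char *token, const char *keyword)
{
    for (; *token; token++, keyword++) {
        if (tolower ((unsigned char) *token) !=
            tolower ((unsigned char) *keyword))
            return 0;
    }
    return 1;
}

/*
 * Scan the whole table. Return the index of the single matching
 * entry, SKETCH_NO_MATCH if nothing matches, or SKETCH_AMBIGUOUS if
 * the token is a prefix of more than one entry.
 */
static int
sketch_lookup (const char *token)
{
    size_t i, n = sizeof (sketch_keywords) / sizeof (sketch_keywords[0]);
    int found = SKETCH_NO_MATCH;

    if (! *token)
        return SKETCH_NO_MATCH;

    for (i = 0; i < n; i++) {
        if (! sketch_is_prefix (token, sketch_keywords[i]))
            continue;
        if (found != SKETCH_NO_MATCH)
            return SKETCH_AMBIGUOUS;
        found = (int) i;
    }
    return found;
}
```

With this table, sketch_lookup ("mi") selects "minutes", while sketch_lookup ("mon") is reported as ambiguous because both "months" and "monday" start with it. A real table would probably also want to treat two entries with the same meaning as a single match, which this sketch does not attempt.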