Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 8413F431FBC for ; Sun, 28 Oct 2012 15:52:08 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oFcnPjW5vS0I for ; Sun, 28 Oct 2012 15:52:07 -0700 (PDT) Received: from dmz-mailsec-scanner-6.mit.edu (DMZ-MAILSEC-SCANNER-6.MIT.EDU [18.7.68.35]) by olra.theworths.org (Postfix) with ESMTP id 5D186431FAF for ; Sun, 28 Oct 2012 15:52:07 -0700 (PDT) X-AuditID: 12074423-b7fab6d0000008f9-58-508db716ff39 Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) by dmz-mailsec-scanner-6.mit.edu (Symantec Messaging Gateway) with SMTP id BD.39.02297.617BD805; Sun, 28 Oct 2012 18:52:06 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id q9SMq6j5020192; Sun, 28 Oct 2012 18:52:06 -0400 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id q9SMq4nR026482 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT); Sun, 28 Oct 2012 18:52:05 -0400 (EDT) Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.77) (envelope-from ) id 1TSbi8-0002Wc-7x; Sun, 28 Oct 2012 18:52:04 -0400 Date: Sun, 28 Oct 2012 18:52:04 -0400 From: Austin Clements To: Jani Nikula Subject: Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch Message-ID: <20121028225204.GD15377@mit.edu> References: <20121022081444.GM14861@mit.edu> <87lieqkz4t.fsf@nikula.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87lieqkz4t.fsf@nikula.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFmpileLIzCtJLcpLzFFi42IR4hTV1hXb3htgsP2vuUXTdGeL6zdnMjsw edy6/5rd49mqW8wBTFFcNimpOZllqUX6dglcGfOuvWAv+BdYsecOewPjNNsuRk4OCQETicXf bjNC2GISF+6tZ+ti5OIQEtjHKNH7/hwLhLOBUaJ/yTN2COckk8S7/49ZQVqEBJYwSsx8VgFi swioSixd9RIsziagIbFt/3KwsSICihKbT+4Hs5kFpCW+/W5mArGFBYIkfr2/ywJi8wroSPTM W8oEsWA/o8T0qVfZIRKCEidnPmGBaNaSuPHvJVARB9ig5f84QMKcQLu2P3/JDGKLCqhITDm5 jW0Co9AsJN2zkHTPQuhewMi8ilE2JbdKNzcxM6c4NVm3ODkxLy+1SNdMLzezRC81pXQTIyio 2V2UdzD+Oah0iFGAg1GJh/dCQU+AEGtiWXFl7iFGSQ4mJVHeV2t7A4T4kvJTKjMSizPii0pz UosPMUpwMCuJ8C7lBsrxpiRWVqUW5cOkpDlYlMR5r6Xc9BcSSE8sSc1OTS1ILYLJynBwKEnw WmwDahQsSk1PrUjLzClBSDNxcIIM5wEa/mwLyPDigsTc4sx0iPwpRkUpcd53W4ESAiCJjNI8 uF5Y0nnFKA70ijCvJsgKHmDCgut+BTSYCWiwDh/Y4JJEhJRUA2NJ7u2trqsEOc/PrFcxUTz1 f1P3leOrrxVNEw13d4x8tLtA1XDLChOdsD1X5N6fnPBVtO7FL52rBRkLzi59smZC5T7WY9sf arDstVsf/61/6rk/hruXx5zykjBwt19uxy6f5nPxuv5srUWl7K1Tpngo3p2br12ffsXh2b2O 77aPFHcrrUkMeVKlxFKckWioxVxUnAgAEEZ4sxUDAAA= Cc: notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 28 Oct 2012 22:52:08 -0000 Quoth Jani Nikula on Oct 29 at 12:30 am: > On Mon, 22 Oct 2012, Austin Clements wrote: > >> +/* > >> + * Accepted keywords. > >> + * > >> + * A keyword may optionally contain a '|' to indicate the minimum > >> + * match length. Without one, full match is required. It's advisable > >> + * to keep the minimum match parts unique across all keywords. > >> + * > >> + * If keyword begins with upper case letter, then the matching will be > >> + * case sensitive. Otherwise the matching is case insensitive. > >> + * > >> + * If setter is NULL, set_default will be used. > >> + * > >> + * Note: Order matters. Matching is greedy, longest match is used, but > >> + * of equal length matches the first one is used, unless there's an > >> + * equal length case sensitive match which trumps case insensitive > >> + * matches. > > > > If you do have a tokenizer (or disallow mashing keywords together), > > then all of complexity arising from longest match goes away because > > the keyword token either will or won't match a keyword. If you also > > eliminate the rule for case sensitivity and put case-sensitive things > > before conflicting case-insensitive things (so put "M" before > > "m|inutes"), then you can simply use the first match. > > At least one reason for going through the whole table is that if this > ever gets i18n support, the conflicting things might be different. While > order matters in principle, you should create the table so that it > really doesn't matter. While that's true, if the input keyword has to be syntactically delimited, there's still no such thing as a "longest match", since the length of any match will be the length of the input. You may still want to scan the whole table, but if you find multiple matches, it's a bug in the table indicating that |ed prefixes aren't unique. Hence, if you're not interested in finding bugs in the table, you can just find the first match. Or you could remove the |'s from the table, scan the whole table, and consider the input string ambiguous if it matches multiple table entries (being careful with case sensitivity), just like you do now if the input string is shorter than the |ed prefixes. That would simplify your table, your matching logic, and possibly your scanning logic. > > > >> + */ > >> +static struct keyword keywords[] = { > >> + /* Weekdays. */ > >> + { N_("sun|day"), TM_ABS_WDAY, 0, NULL }, > >> + { N_("mon|day"), TM_ABS_WDAY, 1, NULL }, > >> + { N_("tue|sday"), TM_ABS_WDAY, 2, NULL }, > >> + { N_("wed|nesday"), TM_ABS_WDAY, 3, NULL }, > >> + { N_("thu|rsday"), TM_ABS_WDAY, 4, NULL }, > >> + { N_("fri|day"), TM_ABS_WDAY, 5, NULL }, > >> + { N_("sat|urday"), TM_ABS_WDAY, 6, NULL }, > >> + > >> + /* Months. */ > >> + { N_("jan|uary"), TM_ABS_MON, 1, kw_set_month }, > >> + { N_("feb|ruary"), TM_ABS_MON, 2, kw_set_month }, > >> + { N_("mar|ch"), TM_ABS_MON, 3, kw_set_month }, > >> + { N_("apr|il"), TM_ABS_MON, 4, kw_set_month }, > >> + { N_("may"), TM_ABS_MON, 5, kw_set_month }, > >> + { N_("jun|e"), TM_ABS_MON, 6, kw_set_month }, > >> + { N_("jul|y"), TM_ABS_MON, 7, kw_set_month }, > >> + { N_("aug|ust"), TM_ABS_MON, 8, kw_set_month }, > >> + { N_("sep|tember"), TM_ABS_MON, 9, kw_set_month }, > >> + { N_("oct|ober"), TM_ABS_MON, 10, kw_set_month }, > >> + { N_("nov|ember"), TM_ABS_MON, 11, kw_set_month }, > >> + { N_("dec|ember"), TM_ABS_MON, 12, kw_set_month }, > >> + > >> + /* Durations. */ > >> + { N_("y|ears"), TM_REL_YEAR, 1, kw_set_rel }, > >> + { N_("w|eeks"), TM_REL_WEEK, 1, kw_set_rel }, > >> + { N_("d|ays"), TM_REL_DAY, 1, kw_set_rel }, > >> + { N_("h|ours"), TM_REL_HOUR, 1, kw_set_rel }, > >> + { N_("hr|s"), TM_REL_HOUR, 1, kw_set_rel }, > >> + { N_("m|inutes"), TM_REL_MIN, 1, kw_set_rel }, > >> + /* M=months, m=minutes */ > >> + { N_("M"), TM_REL_MON, 1, kw_set_rel }, > >> + { N_("mins"), TM_REL_MIN, 1, kw_set_rel }, > >> + { N_("mo|nths"), TM_REL_MON, 1, kw_set_rel }, > >> + { N_("s|econds"), TM_REL_SEC, 1, kw_set_rel }, > >> + { N_("secs"), TM_REL_SEC, 1, kw_set_rel }, > >> + > >> + /* Numbers. */ > >> + { N_("one"), TM_NONE, 1, kw_set_number }, > >> + { N_("two"), TM_NONE, 2, kw_set_number }, > >> + { N_("three"), TM_NONE, 3, kw_set_number }, > >> + { N_("four"), TM_NONE, 4, kw_set_number }, > >> + { N_("five"), TM_NONE, 5, kw_set_number }, > >> + { N_("six"), TM_NONE, 6, kw_set_number }, > >> + { N_("seven"), TM_NONE, 7, kw_set_number }, > >> + { N_("eight"), TM_NONE, 8, kw_set_number }, > >> + { N_("nine"), TM_NONE, 9, kw_set_number }, > >> + { N_("ten"), TM_NONE, 10, kw_set_number }, > >> + { N_("dozen"), TM_NONE, 12, kw_set_number }, > >> + { N_("hundred"), TM_NONE, 100, kw_set_number }, > >> + > >> + /* Special number forms. */ > >> + { N_("this"), TM_NONE, 0, kw_set_number }, > >> + { N_("last"), TM_NONE, 1, kw_set_number }, > >> + > >> + /* Other special keywords. */ > >> + { N_("yesterday"), TM_REL_DAY, 1, kw_set_rel }, > >> + { N_("today"), TM_NONE, 0, kw_set_today }, > >> + { N_("now"), TM_NONE, 0, kw_set_now }, > >> + { N_("noon"), TM_NONE, 12, kw_set_timeofday }, > >> + { N_("midnight"), TM_NONE, 0, kw_set_timeofday }, > >> + { N_("am"), TM_AMPM, 0, kw_set_ampm }, > >> + { N_("a.m."), TM_AMPM, 0, kw_set_ampm }, > >> + { N_("pm"), TM_AMPM, 1, kw_set_ampm }, > >> + { N_("p.m."), TM_AMPM, 1, kw_set_ampm }, > >> + { N_("st"), TM_NONE, 0, kw_set_ordinal }, > >> + { N_("nd"), TM_NONE, 0, kw_set_ordinal }, > >> + { N_("rd"), TM_NONE, 0, kw_set_ordinal }, > >> + { N_("th"), TM_NONE, 0, kw_set_ordinal }, > >> + > >> + /* Timezone codes: offset in minutes. XXX: Add more codes. */ > >> + { N_("pst"), TM_TZ, -8*60, NULL }, > >> + { N_("mst"), TM_TZ, -7*60, NULL }, > >> + { N_("cst"), TM_TZ, -6*60, NULL }, > >> + { N_("est"), TM_TZ, -5*60, NULL }, > >> + { N_("ast"), TM_TZ, -4*60, NULL }, > >> + { N_("nst"), TM_TZ, -(3*60+30), NULL }, > >> + > >> + { N_("gmt"), TM_TZ, 0, NULL }, > >> + { N_("utc"), TM_TZ, 0, NULL }, > >> + > >> + { N_("wet"), TM_TZ, 0, NULL }, > >> + { N_("cet"), TM_TZ, 1*60, NULL }, > >> + { N_("eet"), TM_TZ, 2*60, NULL }, > >> + { N_("fet"), TM_TZ, 3*60, NULL }, > >> + > >> + { N_("wat"), TM_TZ, 1*60, NULL }, > >> + { N_("cat"), TM_TZ, 2*60, NULL }, > >> + { N_("eat"), TM_TZ, 3*60, NULL }, > >> +}; > >> + > >> +/* > >> + * Compare strings s and keyword. Return number of matching chars on > >> + * match, 0 for no match. Match must be at least n chars, or all of > >> + * keyword if n < 0, otherwise it's not a match. Use match_case for > >> + * case sensitive matching. > >> + */ > >> +static size_t > >> +match_keyword (const char *s, const char *keyword, ssize_t n, bool match_case) > >> +{ > >> + ssize_t i; > >> + > >> + if (!n) > >> + return 0; > >> + > >> + for (i = 0; *s && *keyword; i++, s++, keyword++) { > >> + if (match_case) { > >> + if (*s != *keyword) > > > > The pointer arithmetic doesn't seem to buy anything here. What about > > just looping over i and using s[i] and keyword[i]? > > The pointer arithmetic will be useful when I implement your other > suggestion of handling '|' here. ;) Otherwise, I'd need two index > variables. Fair enough. > > > >> + break; > >> + } else { > >> + if (tolower ((unsigned char) *s) != > >> + tolower ((unsigned char) *keyword)) > > > > I don't think the cast to unsigned char is necessary. > > As discussed on IRC, pedantically it is necessary, as ctype.h functions > accept an int that must have the value of an unsigned char or EOF, and > char might be signed. It wouldn't be C without the pedantic. > >> +/* Combine absolute and relative fields, and round. */ > >> +static int > >> +create_output (struct state *state, time_t *t_out, const time_t *ref, > >> + int round) > >> +{ > >> + struct tm tm = { .tm_isdst = -1 }; > >> + struct tm now; > >> + time_t t; > >> + enum field f; > >> + int r; > >> + int week_round = PARSE_TIME_NO_ROUND; > >> + > >> + r = initialize_now (state, &now, ref); > >> + if (r) > >> + return r; > >> + > >> + /* Initialize fields flagged as "now" to reference time. */ > >> + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { > >> + if (state->set[f] == FIELD_NOW) { > >> + state->tm[f] = tm_get_field (&now, f); > >> + state->set[f] = FIELD_SET; > >> + } > >> + } > >> + > >> + /* > >> + * If WDAY is set but MDAY is not, we consider WDAY relative > >> + * > >> + * XXX: This fails on stuff like "two months monday" because two > >> + * months ago wasn't the same day as today. Postpone until we know > >> + * date? > >> + */ > >> + if (is_field_set (state, TM_ABS_WDAY) && > >> + !is_field_set (state, TM_ABS_MDAY)) { > >> + int wday = get_field (state, TM_ABS_WDAY); > >> + int today = tm_get_field (&now, TM_ABS_WDAY); > >> + int rel_days; > >> + > >> + if (today > wday) > >> + rel_days = today - wday; > >> + else > >> + rel_days = today + 7 - wday; > >> + > >> + /* This also prevents special week rounding from happening. */ > >> + mod_field (state, TM_REL_DAY, rel_days); > >> + > >> + unset_field (state, TM_ABS_WDAY); > >> + } > >> + > >> + r = fixup_ampm (state); > >> + if (r) > >> + return r; > >> + > >> + /* > >> + * Iterate fields from most accurate to least accurate, and set > >> + * unset fields according to requested rounding. > >> + */ > >> + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { > >> + if (round != PARSE_TIME_NO_ROUND) { > >> + enum field r = abs_to_rel_field (f); > >> + > >> + if (is_field_set (state, f) || is_field_set (state, r)) { > >> + if (round >= PARSE_TIME_ROUND_UP && f != TM_ABS_SEC) { > >> + mod_field (state, r, -1); > > > > Crazy. This could use a comment. It took me a while to figure out > > why this was -1, though maybe that's just because it's late. > > Will do. > > /* You're not expected to understand this */ ;) Hah. You're not allowed to use that on me! I *do* understand the code that comment is originally from. ]:--8)