63/b255f78c075e1d24211eb58e93906e99709a1a

   1 Return-Path: <amdragon@mit.edu>\r
   2 X-Original-To: notmuch@notmuchmail.org\r
   3 Delivered-To: notmuch@notmuchmail.org\r
   4 Received: from localhost (localhost [127.0.0.1])\r
   5         by olra.theworths.org (Postfix) with ESMTP id 8413F431FBC\r
   6         for <notmuch@notmuchmail.org>; Sun, 28 Oct 2012 15:52:08 -0700 (PDT)\r
   7 X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
   8 X-Spam-Flag: NO\r
   9 X-Spam-Score: -0.7\r
  10 X-Spam-Level: \r
  11 X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
  12         tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
  13 Received: from olra.theworths.org ([127.0.0.1])\r
  14         by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
  15         with ESMTP id oFcnPjW5vS0I for <notmuch@notmuchmail.org>;\r
  16         Sun, 28 Oct 2012 15:52:07 -0700 (PDT)\r
  17 Received: from dmz-mailsec-scanner-6.mit.edu (DMZ-MAILSEC-SCANNER-6.MIT.EDU\r
  18         [18.7.68.35])\r
  19         by olra.theworths.org (Postfix) with ESMTP id 5D186431FAF\r
  20         for <notmuch@notmuchmail.org>; Sun, 28 Oct 2012 15:52:07 -0700 (PDT)\r
  21 X-AuditID: 12074423-b7fab6d0000008f9-58-508db716ff39\r
  22 Received: from mailhub-auth-3.mit.edu ( [18.9.21.43])\r
  23         by dmz-mailsec-scanner-6.mit.edu (Symantec Messaging Gateway) with SMTP\r
  24         id BD.39.02297.617BD805; Sun, 28 Oct 2012 18:52:06 -0400 (EDT)\r
  25 Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103])\r
  26         by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id q9SMq6j5020192; \r
  27         Sun, 28 Oct 2012 18:52:06 -0400\r
  28 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])\r
  29         (authenticated bits=0)\r
  30         (User authenticated as amdragon@ATHENA.MIT.EDU)\r
  31         by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id q9SMq4nR026482\r
  32         (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT);\r
  33         Sun, 28 Oct 2012 18:52:05 -0400 (EDT)\r
  34 Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.77)\r
  35         (envelope-from <amdragon@mit.edu>)\r
  36         id 1TSbi8-0002Wc-7x; Sun, 28 Oct 2012 18:52:04 -0400\r
  37 Date: Sun, 28 Oct 2012 18:52:04 -0400\r
  38 From: Austin Clements <amdragon@MIT.EDU>\r
  39 To: Jani Nikula <jani@nikula.org>\r
  40 Subject: Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to\r
  41         notmuch\r
  42 Message-ID: <20121028225204.GD15377@mit.edu>\r
  43 References: <cover.1350854171.git.jani@nikula.org>\r
  44         <a90d3b687895a26f765539d6c0420038a74ee42f.1350854171.git.jani@nikula.org>\r
  45         <20121022081444.GM14861@mit.edu> <87lieqkz4t.fsf@nikula.org>\r
  46 MIME-Version: 1.0\r
  47 Content-Type: text/plain; charset=us-ascii\r
  48 Content-Disposition: inline\r
  49 In-Reply-To: <87lieqkz4t.fsf@nikula.org>\r
  50 User-Agent: Mutt/1.5.21 (2010-09-15)\r
  51 X-Brightmail-Tracker:\r
  52  H4sIAAAAAAAAA+NgFmpileLIzCtJLcpLzFFi42IR4hTV1hXb3htgsP2vuUXTdGeL6zdnMjsw\r
  53         edy6/5rd49mqW8wBTFFcNimpOZllqUX6dglcGfOuvWAv+BdYsecOewPjNNsuRk4OCQETicXf\r
  54         bjNC2GISF+6tZ+ti5OIQEtjHKNH7/hwLhLOBUaJ/yTN2COckk8S7/49ZQVqEBJYwSsx8VgFi\r
  55         swioSixd9RIsziagIbFt/3KwsSICihKbT+4Hs5kFpCW+/W5mArGFBYIkfr2/ywJi8wroSPTM\r
  56         W8oEsWA/o8T0qVfZIRKCEidnPmGBaNaSuPHvJVARB9ig5f84QMKcQLu2P3/JDGKLCqhITDm5\r
  57         jW0Co9AsJN2zkHTPQuhewMi8ilE2JbdKNzcxM6c4NVm3ODkxLy+1SNdMLzezRC81pXQTIyio\r
  58         2V2UdzD+Oah0iFGAg1GJh/dCQU+AEGtiWXFl7iFGSQ4mJVHeV2t7A4T4kvJTKjMSizPii0pz\r
  59         UosPMUpwMCuJ8C7lBsrxpiRWVqUW5cOkpDlYlMR5r6Xc9BcSSE8sSc1OTS1ILYLJynBwKEnw\r
  60         WmwDahQsSk1PrUjLzClBSDNxcIIM5wEa/mwLyPDigsTc4sx0iPwpRkUpcd53W4ESAiCJjNI8\r
  61         uF5Y0nnFKA70ijCvJsgKHmDCgut+BTSYCWiwDh/Y4JJEhJRUA2NJ7u2trqsEOc/PrFcxUTz1\r
  62         f1P3leOrrxVNEw13d4x8tLtA1XDLChOdsD1X5N6fnPBVtO7FL52rBRkLzi59smZC5T7WY9sf\r
  63         arDstVsf/61/6rk/hruXx5zykjBwt19uxy6f5nPxuv5srUWl7K1Tpngo3p2br12ffsXh2b2O\r
  64         77aPFHcrrUkMeVKlxFKckWioxVxUnAgAEEZ4sxUDAAA=\r
  65 Cc: notmuch@notmuchmail.org\r
  66 X-BeenThere: notmuch@notmuchmail.org\r
  67 X-Mailman-Version: 2.1.13\r
  68 Precedence: list\r
  69 List-Id: "Use and development of the notmuch mail system."\r
  70         <notmuch.notmuchmail.org>\r
  71 List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
  72         <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
  73 List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
  74 List-Post: <mailto:notmuch@notmuchmail.org>\r
  75 List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
  76 List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
  77         <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
  78 X-List-Received-Date: Sun, 28 Oct 2012 22:52:08 -0000\r
  79 \r
  80 Quoth Jani Nikula on Oct 29 at 12:30 am:\r
  81 > On Mon, 22 Oct 2012, Austin Clements <amdragon@MIT.EDU> wrote:\r
  82 > >> +/*\r
  83 > >> + * Accepted keywords.\r
  84 > >> + *\r
  85 > >> + * A keyword may optionally contain a '|' to indicate the minimum\r
  86 > >> + * match length. Without one, full match is required. It's advisable\r
  87 > >> + * to keep the minimum match parts unique across all keywords.\r
  88 > >> + *\r
  89 > >> + * If keyword begins with upper case letter, then the matching will be\r
  90 > >> + * case sensitive. Otherwise the matching is case insensitive.\r
  91 > >> + *\r
  92 > >> + * If setter is NULL, set_default will be used.\r
  93 > >> + *\r
  94 > >> + * Note: Order matters. Matching is greedy, longest match is used, but\r
  95 > >> + * of equal length matches the first one is used, unless there's an\r
  96 > >> + * equal length case sensitive match which trumps case insensitive\r
  97 > >> + * matches.\r
  98 > >\r
  99 > > If you do have a tokenizer (or disallow mashing keywords together),\r
 100 > > then all of complexity arising from longest match goes away because\r
 101 > > the keyword token either will or won't match a keyword.  If you also\r
 102 > > eliminate the rule for case sensitivity and put case-sensitive things\r
 103 > > before conflicting case-insensitive things (so put "M" before\r
 104 > > "m|inutes"), then you can simply use the first match.\r
 105 > \r
 106 > At least one reason for going through the whole table is that if this\r
 107 > ever gets i18n support, the conflicting things might be different. While\r
 108 > order matters in principle, you should create the table so that it\r
 109 > really doesn't matter.\r
 110 \r
 111 While that's true, if the input keyword has to be syntactically\r
 112 delimited, there's still no such thing as a "longest match", since the\r
 113 length of any match will be the length of the input.  You may still\r
 114 want to scan the whole table, but if you find multiple matches, it's a\r
 115 bug in the table indicating that |ed prefixes aren't unique.  Hence,\r
 116 if you're not interested in finding bugs in the table, you can just\r
 117 find the first match.\r
 118 \r
 119 Or you could remove the |'s from the table, scan the whole table, and\r
 120 consider the input string ambiguous if it matches multiple table\r
 121 entries (being careful with case sensitivity), just like you do now if\r
 122 the input string is shorter than the |ed prefixes.  That would\r
 123 simplify your table, your matching logic, and possibly your scanning\r
 124 logic.\r
 125 \r
 126 > >\r
 127 > >> + */\r
 128 > >> +static struct keyword keywords[] = {\r
 129 > >> +    /* Weekdays. */\r
 130 > >> +    { N_("sun|day"),      TM_ABS_WDAY,    0,      NULL },\r
 131 > >> +    { N_("mon|day"),      TM_ABS_WDAY,    1,      NULL },\r
 132 > >> +    { N_("tue|sday"),     TM_ABS_WDAY,    2,      NULL },\r
 133 > >> +    { N_("wed|nesday"),   TM_ABS_WDAY,    3,      NULL },\r
 134 > >> +    { N_("thu|rsday"),    TM_ABS_WDAY,    4,      NULL },\r
 135 > >> +    { N_("fri|day"),      TM_ABS_WDAY,    5,      NULL },\r
 136 > >> +    { N_("sat|urday"),    TM_ABS_WDAY,    6,      NULL },\r
 137 > >> +\r
 138 > >> +    /* Months. */\r
 139 > >> +    { N_("jan|uary"),     TM_ABS_MON,     1,      kw_set_month },\r
 140 > >> +    { N_("feb|ruary"),    TM_ABS_MON,     2,      kw_set_month },\r
 141 > >> +    { N_("mar|ch"),       TM_ABS_MON,     3,      kw_set_month },\r
 142 > >> +    { N_("apr|il"),       TM_ABS_MON,     4,      kw_set_month },\r
 143 > >> +    { N_("may"),  TM_ABS_MON,     5,      kw_set_month },\r
 144 > >> +    { N_("jun|e"),        TM_ABS_MON,     6,      kw_set_month },\r
 145 > >> +    { N_("jul|y"),        TM_ABS_MON,     7,      kw_set_month },\r
 146 > >> +    { N_("aug|ust"),      TM_ABS_MON,     8,      kw_set_month },\r
 147 > >> +    { N_("sep|tember"),   TM_ABS_MON,     9,      kw_set_month },\r
 148 > >> +    { N_("oct|ober"),     TM_ABS_MON,     10,     kw_set_month },\r
 149 > >> +    { N_("nov|ember"),    TM_ABS_MON,     11,     kw_set_month },\r
 150 > >> +    { N_("dec|ember"),    TM_ABS_MON,     12,     kw_set_month },\r
 151 > >> +\r
 152 > >> +    /* Durations. */\r
 153 > >> +    { N_("y|ears"),       TM_REL_YEAR,    1,      kw_set_rel },\r
 154 > >> +    { N_("w|eeks"),       TM_REL_WEEK,    1,      kw_set_rel },\r
 155 > >> +    { N_("d|ays"),        TM_REL_DAY,     1,      kw_set_rel },\r
 156 > >> +    { N_("h|ours"),       TM_REL_HOUR,    1,      kw_set_rel },\r
 157 > >> +    { N_("hr|s"), TM_REL_HOUR,    1,      kw_set_rel },\r
 158 > >> +    { N_("m|inutes"),     TM_REL_MIN,     1,      kw_set_rel },\r
 159 > >> +    /* M=months, m=minutes */\r
 160 > >> +    { N_("M"),            TM_REL_MON,     1,      kw_set_rel },\r
 161 > >> +    { N_("mins"), TM_REL_MIN,     1,      kw_set_rel },\r
 162 > >> +    { N_("mo|nths"),      TM_REL_MON,     1,      kw_set_rel },\r
 163 > >> +    { N_("s|econds"),     TM_REL_SEC,     1,      kw_set_rel },\r
 164 > >> +    { N_("secs"), TM_REL_SEC,     1,      kw_set_rel },\r
 165 > >> +\r
 166 > >> +    /* Numbers. */\r
 167 > >> +    { N_("one"),  TM_NONE,        1,      kw_set_number },\r
 168 > >> +    { N_("two"),  TM_NONE,        2,      kw_set_number },\r
 169 > >> +    { N_("three"),        TM_NONE,        3,      kw_set_number },\r
 170 > >> +    { N_("four"), TM_NONE,        4,      kw_set_number },\r
 171 > >> +    { N_("five"), TM_NONE,        5,      kw_set_number },\r
 172 > >> +    { N_("six"),  TM_NONE,        6,      kw_set_number },\r
 173 > >> +    { N_("seven"),        TM_NONE,        7,      kw_set_number },\r
 174 > >> +    { N_("eight"),        TM_NONE,        8,      kw_set_number },\r
 175 > >> +    { N_("nine"), TM_NONE,        9,      kw_set_number },\r
 176 > >> +    { N_("ten"),  TM_NONE,        10,     kw_set_number },\r
 177 > >> +    { N_("dozen"),        TM_NONE,        12,     kw_set_number },\r
 178 > >> +    { N_("hundred"),      TM_NONE,        100,    kw_set_number },\r
 179 > >> +\r
 180 > >> +    /* Special number forms. */\r
 181 > >> +    { N_("this"), TM_NONE,        0,      kw_set_number },\r
 182 > >> +    { N_("last"), TM_NONE,        1,      kw_set_number },\r
 183 > >> +\r
 184 > >> +    /* Other special keywords. */\r
 185 > >> +    { N_("yesterday"),    TM_REL_DAY,     1,      kw_set_rel },\r
 186 > >> +    { N_("today"),        TM_NONE,        0,      kw_set_today },\r
 187 > >> +    { N_("now"),  TM_NONE,        0,      kw_set_now },\r
 188 > >> +    { N_("noon"), TM_NONE,        12,     kw_set_timeofday },\r
 189 > >> +    { N_("midnight"),     TM_NONE,        0,      kw_set_timeofday },\r
 190 > >> +    { N_("am"),           TM_AMPM,        0,      kw_set_ampm },\r
 191 > >> +    { N_("a.m."), TM_AMPM,        0,      kw_set_ampm },\r
 192 > >> +    { N_("pm"),           TM_AMPM,        1,      kw_set_ampm },\r
 193 > >> +    { N_("p.m."), TM_AMPM,        1,      kw_set_ampm },\r
 194 > >> +    { N_("st"),           TM_NONE,        0,      kw_set_ordinal },\r
 195 > >> +    { N_("nd"),           TM_NONE,        0,      kw_set_ordinal },\r
 196 > >> +    { N_("rd"),           TM_NONE,        0,      kw_set_ordinal },\r
 197 > >> +    { N_("th"),           TM_NONE,        0,      kw_set_ordinal },\r
 198 > >> +\r
 199 > >> +    /* Timezone codes: offset in minutes. XXX: Add more codes. */\r
 200 > >> +    { N_("pst"),  TM_TZ,          -8*60,  NULL },\r
 201 > >> +    { N_("mst"),  TM_TZ,          -7*60,  NULL },\r
 202 > >> +    { N_("cst"),  TM_TZ,          -6*60,  NULL },\r
 203 > >> +    { N_("est"),  TM_TZ,          -5*60,  NULL },\r
 204 > >> +    { N_("ast"),  TM_TZ,          -4*60,  NULL },\r
 205 > >> +    { N_("nst"),  TM_TZ,          -(3*60+30),     NULL },\r
 206 > >> +\r
 207 > >> +    { N_("gmt"),  TM_TZ,          0,      NULL },\r
 208 > >> +    { N_("utc"),  TM_TZ,          0,      NULL },\r
 209 > >> +\r
 210 > >> +    { N_("wet"),  TM_TZ,          0,      NULL },\r
 211 > >> +    { N_("cet"),  TM_TZ,          1*60,   NULL },\r
 212 > >> +    { N_("eet"),  TM_TZ,          2*60,   NULL },\r
 213 > >> +    { N_("fet"),  TM_TZ,          3*60,   NULL },\r
 214 > >> +\r
 215 > >> +    { N_("wat"),  TM_TZ,          1*60,   NULL },\r
 216 > >> +    { N_("cat"),  TM_TZ,          2*60,   NULL },\r
 217 > >> +    { N_("eat"),  TM_TZ,          3*60,   NULL },\r
 218 > >> +};\r
 219 > >> +\r
 220 > >> +/*\r
 221 > >> + * Compare strings s and keyword. Return number of matching chars on\r
 222 > >> + * match, 0 for no match. Match must be at least n chars, or all of\r
 223 > >> + * keyword if n < 0, otherwise it's not a match. Use match_case for\r
 224 > >> + * case sensitive matching.\r
 225 > >> + */\r
 226 > >> +static size_t\r
 227 > >> +match_keyword (const char *s, const char *keyword, ssize_t n, bool match_case)\r
 228 > >> +{\r
 229 > >> +    ssize_t i;\r
 230 > >> +\r
 231 > >> +    if (!n)\r
 232 > >> +  return 0;\r
 233 > >> +\r
 234 > >> +    for (i = 0; *s && *keyword; i++, s++, keyword++) {\r
 235 > >> +  if (match_case) {\r
 236 > >> +      if (*s != *keyword)\r
 237 > >\r
 238 > > The pointer arithmetic doesn't seem to buy anything here.  What about\r
 239 > > just looping over i and using s[i] and keyword[i]?\r
 240 > \r
 241 > The pointer arithmetic will be useful when I implement your other\r
 242 > suggestion of handling '|' here. ;) Otherwise, I'd need two index\r
 243 > variables.\r
 244 \r
 245 Fair enough.\r
 246 \r
 247 > >\r
 248 > >> +          break;\r
 249 > >> +  } else {\r
 250 > >> +      if (tolower ((unsigned char) *s) !=\r
 251 > >> +          tolower ((unsigned char) *keyword))\r
 252 > >\r
 253 > > I don't think the cast to unsigned char is necessary.\r
 254 > \r
 255 > As discussed on IRC, pedantically it is necessary, as ctype.h functions\r
 256 > accept an int that must have the value of an unsigned char or EOF, and\r
 257 > char might be signed.\r
 258 \r
 259 It wouldn't be C without the pedantic.\r
 260 \r
 261 > >> +/* Combine absolute and relative fields, and round. */\r
 262 > >> +static int\r
 263 > >> +create_output (struct state *state, time_t *t_out, const time_t *ref,\r
 264 > >> +         int round)\r
 265 > >> +{\r
 266 > >> +    struct tm tm = { .tm_isdst = -1 };\r
 267 > >> +    struct tm now;\r
 268 > >> +    time_t t;\r
 269 > >> +    enum field f;\r
 270 > >> +    int r;\r
 271 > >> +    int week_round = PARSE_TIME_NO_ROUND;\r
 272 > >> +\r
 273 > >> +    r = initialize_now (state, &now, ref);\r
 274 > >> +    if (r)\r
 275 > >> +  return r;\r
 276 > >> +\r
 277 > >> +    /* Initialize fields flagged as "now" to reference time. */\r
 278 > >> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {\r
 279 > >> +  if (state->set[f] == FIELD_NOW) {\r
 280 > >> +      state->tm[f] = tm_get_field (&now, f);\r
 281 > >> +      state->set[f] = FIELD_SET;\r
 282 > >> +  }\r
 283 > >> +    }\r
 284 > >> +\r
 285 > >> +    /*\r
 286 > >> +     * If WDAY is set but MDAY is not, we consider WDAY relative\r
 287 > >> +     *\r
 288 > >> +     * XXX: This fails on stuff like "two months monday" because two\r
 289 > >> +     * months ago wasn't the same day as today. Postpone until we know\r
 290 > >> +     * date?\r
 291 > >> +     */\r
 292 > >> +    if (is_field_set (state, TM_ABS_WDAY) &&\r
 293 > >> +  !is_field_set (state, TM_ABS_MDAY)) {\r
 294 > >> +  int wday = get_field (state, TM_ABS_WDAY);\r
 295 > >> +  int today = tm_get_field (&now, TM_ABS_WDAY);\r
 296 > >> +  int rel_days;\r
 297 > >> +\r
 298 > >> +  if (today > wday)\r
 299 > >> +      rel_days = today - wday;\r
 300 > >> +  else\r
 301 > >> +      rel_days = today + 7 - wday;\r
 302 > >> +\r
 303 > >> +  /* This also prevents special week rounding from happening. */\r
 304 > >> +  mod_field (state, TM_REL_DAY, rel_days);\r
 305 > >> +\r
 306 > >> +  unset_field (state, TM_ABS_WDAY);\r
 307 > >> +    }\r
 308 > >> +\r
 309 > >> +    r = fixup_ampm (state);\r
 310 > >> +    if (r)\r
 311 > >> +  return r;\r
 312 > >> +\r
 313 > >> +    /*\r
 314 > >> +     * Iterate fields from most accurate to least accurate, and set\r
 315 > >> +     * unset fields according to requested rounding.\r
 316 > >> +     */\r
 317 > >> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {\r
 318 > >> +  if (round != PARSE_TIME_NO_ROUND) {\r
 319 > >> +      enum field r = abs_to_rel_field (f);\r
 320 > >> +\r
 321 > >> +      if (is_field_set (state, f) || is_field_set (state, r)) {\r
 322 > >> +          if (round >= PARSE_TIME_ROUND_UP && f != TM_ABS_SEC) {\r
 323 > >> +              mod_field (state, r, -1);\r
 324 > >\r
 325 > > Crazy.  This could use a comment.  It took me a while to figure out\r
 326 > > why this was -1, though maybe that's just because it's late.\r
 327 > \r
 328 > Will do.\r
 329 > \r
 330 > /* You're not expected to understand this */ ;)\r
 331 \r
 332 Hah.  You're not allowed to use that on me!  I *do* understand the\r
 333 code that comment is originally from.  ]:--8)\r