7c/29d06d648b58ff484af3d6f4e9980ee4a60397

   1 Return-Path: <jani@nikula.org>\r
   2 X-Original-To: notmuch@notmuchmail.org\r
   3 Delivered-To: notmuch@notmuchmail.org\r
   4 Received: from localhost (localhost [127.0.0.1])\r
   5         by olra.theworths.org (Postfix) with ESMTP id E29A9431FB6\r
   6         for <notmuch@notmuchmail.org>; Sun,  5 Aug 2012 14:43:24 -0700 (PDT)\r
   7 X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
   8 X-Spam-Flag: NO\r
   9 X-Spam-Score: -0.7\r
  10 X-Spam-Level: \r
  11 X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
  12         tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
  13 Received: from olra.theworths.org ([127.0.0.1])\r
  14         by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
  15         with ESMTP id sLCmi06MadNQ for <notmuch@notmuchmail.org>;\r
  16         Sun,  5 Aug 2012 14:43:23 -0700 (PDT)\r
  17 Received: from mail-lb0-f181.google.com (mail-lb0-f181.google.com\r
  18         [209.85.217.181]) (using TLSv1 with cipher RC4-SHA (128/128 bits))\r
  19         (No client certificate requested)\r
  20         by olra.theworths.org (Postfix) with ESMTPS id 77BF0431FAE\r
  21         for <notmuch@notmuchmail.org>; Sun,  5 Aug 2012 14:43:23 -0700 (PDT)\r
  22 Received: by lbbgk1 with SMTP id gk1so1357467lbb.26\r
  23         for <notmuch@notmuchmail.org>; Sun, 05 Aug 2012 14:43:22 -0700 (PDT)\r
  24 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
  25         d=google.com; s=20120113;\r
  26         h=from:to:subject:in-reply-to:references:user-agent:date:message-id\r
  27         :mime-version:content-type:x-gm-message-state;\r
  28         bh=r6fHY8U3fg5bLh1fkjXmuQpkIxJT9zgLr/x9epW9zkY=;\r
  29         b=IvJclUNJkS4j/OUM8lcBQim4U8lHWboA/LFz3XYwuS2KvKIz5/3HlvbgYnhYg+c6aY\r
  30         a26JC7AY1RFWXnL9HgIpHqh99tK+yAM0UzUkEo8g79CHhWAkUqr/jp7XdLQBtwf5TpEP\r
  31         /4VhQ+TsnwHxbmguyo4EwhwyXfOxdUlbNm6YW/5ZMou1Emaa771yxvu++7x2AKzt8tgy\r
  32         zY2b/Ut0k5grBXPUx6fXPiLZtWScr32+Ylsvr9q5D8QaVRM02TDmh29PRRYXEtXEY1gK\r
  33         OHtdS466mYaxygDJZHtrBpjRev2elzU+VBZNy+/4SA6l1ghl8KzWv5KU2TqQOeVORJMr\r
  34         C05A==\r
  35 Received: by 10.112.44.163 with SMTP id f3mr3666046lbm.59.1344203001916;\r
  36         Sun, 05 Aug 2012 14:43:21 -0700 (PDT)\r
  37 Received: from localhost (dsl-hkibrasgw4-fe51df00-27.dhcp.inet.fi.\r
  38         [80.223.81.27])\r
  39         by mx.google.com with ESMTPS id o5sm3345879lbg.5.2012.08.05.14.43.19\r
  40         (version=SSLv3 cipher=OTHER); Sun, 05 Aug 2012 14:43:20 -0700 (PDT)\r
  41 From: Jani Nikula <jani@nikula.org>\r
  42 To: David Bremner <david@tethera.net>, notmuch@notmuchmail.org\r
  43 Subject: Re: [PATCH v2 2/7] lib: add a date/time parser module\r
  44 In-Reply-To: <877gtdmqol.fsf@zancas.localnet>\r
  45 References: <cover.1344065790.git.jani@nikula.org>\r
  46         <133d16fa9b63e4cd91aa2b8816a6ad7285b3bd4c.1344065790.git.jani@nikula.org>\r
  47         <877gtdmqol.fsf@zancas.localnet>\r
  48 User-Agent: Notmuch/0.13.2+125~ga3f01dd (http://notmuchmail.org) Emacs/23.3.1\r
  49         (i686-pc-linux-gnu)\r
  50 Date: Mon, 06 Aug 2012 00:43:18 +0300\r
  51 Message-ID: <87628xnhft.fsf@nikula.org>\r
  52 MIME-Version: 1.0\r
  53 Content-Type: text/plain; charset=us-ascii\r
  54 X-Gm-Message-State:\r
  55  ALoCoQnUnXPNlAh2patp9PRqcSgkCYXWz9gG03/tZbRRxxqaaDYFO2Pvzdy6IGNh++82ObcTeW8Z\r
  56 X-BeenThere: notmuch@notmuchmail.org\r
  57 X-Mailman-Version: 2.1.13\r
  58 Precedence: list\r
  59 List-Id: "Use and development of the notmuch mail system."\r
  60         <notmuch.notmuchmail.org>\r
  61 List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
  62         <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
  63 List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
  64 List-Post: <mailto:notmuch@notmuchmail.org>\r
  65 List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
  66 List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
  67         <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
  68 X-List-Received-Date: Sun, 05 Aug 2012 21:43:25 -0000\r
  69 \r
  70 \r
  71 Hi David, thanks for the review!\r
  72 \r
  73 On Sun, 05 Aug 2012, David Bremner <david@tethera.net> wrote:\r
  74 > Jani Nikula <jani@nikula.org> writes:\r
  75 >\r
  76 >> +\r
  77 >> +static enum field\r
  78 >> +abs_to_rel_field (enum field field)\r
  79 >> +{\r
  80 >> +    assert (field <= TM_ABS_YEAR);\r
  81 >> +\r
  82 >> +    /* note: depends on the enum ordering */\r
  83 >> +    return field + (TM_REL_SEC - TM_ABS_SEC);\r
  84 >> +}\r
  85 >> +\r
  86 >\r
  87 > I wonder if this would be slightly nicer of you defined a TM_FIRST_REL\r
  88 > or so as a synonym like TM_NONE and TM_SIZE\r
  89 \r
  90 Good idea.\r
  91 \r
  92 >> +/* get zero value for field */\r
  93 >> +static int\r
  94 >> +field_zero (enum field field)\r
  95 >> +{\r
  96 >> +    if (field == TM_ABS_MDAY || field == TM_ABS_MON)\r
  97 >> +    return 1;\r
  98 >> +    else if (field == TM_ABS_YEAR)\r
  99 >> +    return 1970;\r
 100 >> +    else\r
 101 >> +    return 0;\r
 102 >> +}\r
 103 >\r
 104 > what do you think about using the word "epoch" instead of zero here?\r
 105 \r
 106 As a non-native speaker, I'll just take your word for it if you think it\r
 107 would be better. :)\r
 108 \r
 109 >> +static bool\r
 110 >> +get_postponed_number (struct state *state, int *v, int *n, char *d)\r
 111 >> +{\r
 112 >\r
 113 > I found the 1 letter names not quite obvious here.\r
 114 \r
 115 True. I think v and n are used fairly consistently throughout the source\r
 116 file, so I'll have to consider longer names vs. documentation comment\r
 117 for them.\r
 118 \r
 119 > At this point reading the code, I have not trouble understanding each\r
 120 > line/function, but I feel like I'm missing the big picture a bit. \r
 121 \r
 122 Yeah, perhaps the code reads better if you follow from the top level\r
 123 parse_time_string() down. Which is at the very end. A "big picture"\r
 124 documentation comment would do no harm.\r
 125 \r
 126 > What is a postponed number?\r
 127 \r
 128 I see that you grasped that later, but I'll describe anyway. Parsing is\r
 129 done from left to right, in a greedy fashion (i.e. match the longest\r
 130 possible known expression). If after that we encounter a number that we\r
 131 don't know what to do with yet, postpone it until we move on to the next\r
 132 expression. The parser for that might eat the preceding "postponed"\r
 133 number (for example "5 August", 5 would be postponed because we don't\r
 134 know what to do with yet, but then "August" would be the context to make\r
 135 it day of month). If that is not the case, the number will be parsed by\r
 136 parse_postponed_number() as a lonely, single number between there. (Yes,\r
 137 this should be documented better with the "big picture".)\r
 138 \r
 139 >\r
 140 >> +    /*\r
 141 >> +     * REVISIT: There could be a "next_field" that would be set from\r
 142 >> +     * "field" for the duration of the handle_postponed_number() call,\r
 143 >> +     * so it has more information to work with.\r
 144 >> +     */\r
 145 >\r
 146 > The notmuch convention seems to be to use XXX: for this. I'm not sure\r
 147 > I'd bother changing, especially if we can't decide how to package this.\r
 148 \r
 149 Okay, can be changed if needed.\r
 150 \r
 151 >\r
 152 >> +/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */\r
 153 >> +static int\r
 154 >> +set_abs_time (struct state *state, int hour, int min, int sec)\r
 155 >> +{\r
 156 >> +    int r;\r
 157 >> +\r
 158 >> +    if (hour != UNSET) {\r
 159 >> +    if ((r = set_field (state, TM_ABS_HOUR, hour)))\r
 160 >> +        return r;\r
 161 >> +    }\r
 162 >\r
 163 > So for this function and the next, the first match wins? I don't really\r
 164 > see the motivation for this, maybe you can explain a bit.\r
 165 \r
 166 The whole parser tries to be as unambiguous as possible. If the input\r
 167 leads to a situation in which any absolute time field (see enum field)\r
 168 is attempted to set twice, it means it appears twice in the input. For\r
 169 example, "2012-08-06 August", where the month is set twice. By design,\r
 170 that is not allowed, and set_field() fails, even if it's the same\r
 171 value. The only semi-exception is having redundant weekday there, for\r
 172 example "Monday, August 6".\r
 173 \r
 174 >\r
 175 >\r
 176 >> +    /* timezone codes: offset in minutes. FIXME: add more codes. */\r
 177 >\r
 178 > Did you think about trying to delegate the list of timezones to the\r
 179 > system?\r
 180 \r
 181 No. :) I'll look into it.\r
 182 \r
 183 >\r
 184 >> + * Compare strings s and keyword. Return number of matching chars on\r
 185 >> + * match, 0 for no match. Match must be at least n chars (n == 0 all\r
 186 >> + * of keyword), otherwise it's not a match. Use match_case for case\r
 187 >> + * sensitive matching.\r
 188 >> + */\r
 189 >\r
 190 > I guess that's fine, and it is internal, but maybe -1 for whole string\r
 191 > would be slightly nicer (although I can't imagine what good matching 0\r
 192 > length strings is at the moment).\r
 193 \r
 194 Can be changed.\r
 195 \r
 196 >\r
 197 >> +    /* Minimum match length. */\r
 198 >> +    p = strchr (keyword, '|');\r
 199 >> +    if (p) {\r
 200 >> +        minlen = p - keyword;\r
 201 >> +        memmove (p, p + 1, strlen (p + 1) + 1);\r
 202 >> +    }\r
 203 >\r
 204 > Something about that memmove creeps me out, but I trust you that it's\r
 205 > correct. Alternatively I guess you could represent keywords as pairs of\r
 206 > strings, which is probably more of a pain.\r
 207 \r
 208 I didn't bother to double check it now, but I remember thinking it over\r
 209 very carefully. :) I agree it could use more clarity to be more\r
 210 obviously correct.\r
 211 \r
 212 (Initially the minlen was coded as an int in the table, but the above\r
 213 allows the localization to decide how long the match must be.)\r
 214 \r
 215 >\r
 216 >\r
 217 >> +\r
 218 >> +/* Parse a single number. Typically postpone parsing until later. */\r
 219 >\r
 220 > OK, so I finally start to understand what a postponed number is :)\r
 221 > I understand the compiler likes bottom up declarations, but some\r
 222 > top down declarations/comments are needed I think.\r
 223 \r
 224 I agree more comments would be in order.\r
 225 \r
 226 >\r
 227 >> +static int\r
 228 >> +parse_date (struct state *state, char sep,\r
 229 >> +        unsigned long v1, unsigned long v2, unsigned long v3,\r
 230 >> +        size_t n1, size_t n2, size_t n3)\r
 231 >> +{\r
 232 >> +    int year = UNSET, mon = UNSET, mday = UNSET;\r
 233 >> +\r
 234 >> +    assert (is_date_sep (sep));\r
 235 >> +\r
 236 >> +    switch (sep) {\r
 237 >> +    case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */\r
 238 >\r
 239 > If I understand correctly, this chooses between American (?) month, day,\r
 240 > year ordering and "sensible" day, month, year ordering by delimiter. I\r
 241 > never thought about this as a way to tell (I often write D/M/Y), but\r
 242 > that could be just me. I agree it's fine as a convention.\r
 243 \r
 244 You understand correctly. It's obviously not a reliable way to tell\r
 245 unless you make it the convention, which I chose to do. Some parsers,\r
 246 notably the one in git, also look at the values to see if it could be\r
 247 D/M/Y if M/D/Y is not possible, but I don't like the ambiguity that\r
 248 introduces.\r
 249 \r
 250 >\r
 251 >> +/*\r
 252 >> + * Parse delimiter(s). Return < 0 on error, number of parsed chars on\r
 253 >> + * success.\r
 254 >> + */\r
 255 >\r
 256 > So 1:-2 will parse as 1-2 ?, i.e. last delimiter wins? Maybe better to\r
 257 > say so explicitly.\r
 258 \r
 259 Agreed. It just throws out any extra delimiters. Perhaps this should\r
 260 also be more strict about the allowed delimiters (now anything is\r
 261 allowed).\r
 262 \r
 263 >\r
 264 >> +/* Combine absolute and relative fields, and round. */\r
 265 >> +static int\r
 266 >> +create_output (struct state *state, time_t *t_out, const time_t *tnow,\r
 267 >> +           int round)\r
 268 >> +{\r
 269 >\r
 270 > It seems like most of non-obvious logic like (when is "wednesday") is\r
 271 > encoded here. From a maintenence point of view, it would be nice to be\r
 272 > able to seperate out the heuristic stuff from the mechanical, to the\r
 273 > degree that it is possible.\r
 274 \r
 275 Agreed. Basically this is step 2, turning all information parsed in step\r
 276 1 (function parse_input()) into a sensible result. Figuring out the\r
 277 current time is also done here.\r
 278 \r
 279 \r
 280 BR,\r
 281 Jani.\r