--- /dev/null
+Return-Path: <amdragon@mit.edu>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id 1707E431FAF\r
+ for <notmuch@notmuchmail.org>; Mon, 3 Feb 2014 13:31:35 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+ tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id 4kbVNpBeILUO for <notmuch@notmuchmail.org>;\r
+ Mon, 3 Feb 2014 13:31:27 -0800 (PST)\r
+Received: from dmz-mailsec-scanner-6.mit.edu (dmz-mailsec-scanner-6.mit.edu\r
+ [18.7.68.35])\r
+ (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))\r
+ (No client certificate requested)\r
+ by olra.theworths.org (Postfix) with ESMTPS id A4041431E84\r
+ for <notmuch@notmuchmail.org>; Mon, 3 Feb 2014 13:31:20 -0800 (PST)\r
+X-AuditID: 12074423-f79726d000000cc9-41-52f00aa66d59\r
+Received: from mailhub-auth-3.mit.edu ( [18.9.21.43])\r
+ (using TLS with cipher AES256-SHA (256/256 bits))\r
+ (Client did not present a certificate)\r
+ by dmz-mailsec-scanner-6.mit.edu (Symantec Messaging Gateway) with SMTP\r
+ id F4.3D.03273.6AA00F25; Mon, 3 Feb 2014 16:31:18 -0500 (EST)\r
+Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11])\r
+ by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id s13LVHQ1030024; \r
+ Mon, 3 Feb 2014 16:31:17 -0500\r
+Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])\r
+ (authenticated bits=0)\r
+ (User authenticated as amdragon@ATHENA.MIT.EDU)\r
+ by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id s13LVFtI011538\r
+ (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT);\r
+ Mon, 3 Feb 2014 16:31:16 -0500\r
+Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80)\r
+ (envelope-from <amdragon@mit.edu>)\r
+ id 1WAR6o-0002eM-GA; Mon, 03 Feb 2014 16:31:14 -0500\r
+Date: Mon, 3 Feb 2014 16:31:13 -0500\r
+From: Austin Clements <amdragon@MIT.EDU>\r
+To: Jani Nikula <jani@nikula.org>\r
+Subject: Re: [PATCH v3 5/6] lib: replace the header parser with gmime\r
+Message-ID: <20140203213113.GM4375@mit.edu>\r
+References: <cover.1391456555.git.jani@nikula.org>\r
+ <bdef45dc21c777130a7ae1fa650f172d8c6eaaf4.1391456555.git.jani@nikula.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=us-ascii\r
+Content-Disposition: inline\r
+In-Reply-To:\r
+ <bdef45dc21c777130a7ae1fa650f172d8c6eaaf4.1391456555.git.jani@nikula.org>\r
+User-Agent: Mutt/1.5.21 (2010-09-15)\r
+X-Brightmail-Tracker:\r
+ H4sIAAAAAAAAA+NgFmpmleLIzCtJLcpLzFFi42IR4hTV1l3G9SHIYNISfoum6c4W12/OZHZg\r
+ 8rh1/zW7x7NVt5gDmKK4bFJSczLLUov07RK4MmaueM5YMHkOY8XVp6eZGhh/l3UxcnJICJhI\r
+ bPjUwgZhi0lcuLceyObiEBKYzSRx9MgyZpCEkMAGRomVXzghEqeYJA6u6GSFcJYwSvy63MEO\r
+ UsUioCKx5sRXsFFsAhoS2/YvZwSxRQQUJTaf3A9mMwtIS3z73cwEYgsLuEpceXYerJ5XQFvi\r
+ 46o2JohtdRJrT7QzQ8QFJU7OfMIC0aslcePfS6AaDrA5y/9xgIQ5BcIknjx4zApiiwKdMOXk\r
+ NrYJjEKzkHTPQtI9C6F7ASPzKkbZlNwq3dzEzJzi1GTd4uTEvLzUIl0zvdzMEr3UlNJNjKCw\r
+ ZndR3sH456DSIUYBDkYlHt6Ove+ChFgTy4orcw8xSnIwKYny6jN8CBLiS8pPqcxILM6ILyrN\r
+ SS0+xCjBwawkwuv36X2QEG9KYmVValE+TEqag0VJnDdxxpsgIYH0xJLU7NTUgtQimKwMB4eS\r
+ BG8mJ9BQwaLU9NSKtMycEoQ0EwcnyHAeoOHnOYBqeIsLEnOLM9Mh8qcYFaXEeQ1BmgVAEhml\r
+ eXC9sLTzilEc6BVhXhOQKh5gyoLrfgU0mAlo8DpXkKuLSxIRUlINjIaV8+ZWLf34Xb7vxfrX\r
+ X5fwO2yeO9dLlU3qeNPFvat4ZFImzFTcZHh6gs0TDu8dCtsY3Fy+rUvS7ewR2SLia37wDtOB\r
+ 409vJ7FuWx11vfMQg9WjbVHX9r1aHmv5wkHoG3/0iw15y39Mf5+gtO6mmcYXJX2eU5UVL0u/\r
+ 9k2q+H9x1vJbG1M+H+ZRYinOSDTUYi4qTgQA9MNbwBYDAAA=\r
+Cc: notmuch@notmuchmail.org\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Mon, 03 Feb 2014 21:31:35 -0000\r
+\r
+Quoth Jani Nikula on Feb 03 at 9:51 pm:\r
+> The notmuch library includes a full blown message header parser. Yet\r
+> the same message headers are parsed by gmime during indexing. Switch\r
+> to gmime parsing completely.\r
+> \r
+> These are the main differences between the parsers:\r
+> \r
+> * Gmime stops header parsing at the first invalid header, and presumes\r
+> the message body starts from there. The current parser is quite\r
+> liberal in accepting broken headers. The change means we will be\r
+> much pickier about accepting invalid messages. (Note that the\r
+> headers after the invalid header would not have been indexed before\r
+> this change anyway, as we use gmime for that.)\r
+> \r
+> * The current parser converts tabs used in header folding to\r
+> spaces. Gmime preserve the tabs. Due to a broken python library used\r
+> in mailman, there are plenty of mailing lists that produce headers\r
+> with tabs in header folding, and we'll see plenty of tabs. (This\r
+> change has been mitigated in preparatory patches.)\r
+> \r
+> * For pure header parsing, the current parser is likely faster than\r
+> gmime, which parses the whole message rather than just the\r
+> headers. Since we use gmime for indexing anyway, we can drop an\r
+> extra header parsing round (this is done in a follow-up patch).\r
+> \r
+> At this step, we only switch the header parsing to gmime.\r
+> ---\r
+> lib/database.cc | 4 +\r
+> lib/index.cc | 11 --\r
+> lib/message-file.c | 349 ++++++++++++++++----------------------------------\r
+> lib/message.cc | 6 +\r
+> lib/notmuch-private.h | 4 +\r
+> 5 files changed, 125 insertions(+), 249 deletions(-)\r
+> \r
+> diff --git a/lib/database.cc b/lib/database.cc\r
+> index f395061..d1bea88 100644\r
+> --- a/lib/database.cc\r
+> +++ b/lib/database.cc\r
+> @@ -1940,6 +1940,10 @@ notmuch_database_add_message (notmuch_database_t *notmuch,\r
+> "to",\r
+> (char *) NULL);\r
+> \r
+> + ret = notmuch_message_file_parse (message_file);\r
+> + if (ret)\r
+> + goto DONE;\r
+> +\r
+> try {\r
+> /* Before we do any real work, (especially before doing a\r
+> * potential SHA-1 computation on the entire file's contents),\r
+> diff --git a/lib/index.cc b/lib/index.cc\r
+> index 78c18cf..976e49f 100644\r
+> --- a/lib/index.cc\r
+> +++ b/lib/index.cc\r
+> @@ -437,7 +437,6 @@ _notmuch_message_index_file (notmuch_message_t *message,\r
+> static int initialized = 0;\r
+> char from_buf[5];\r
+> bool is_mbox = false;\r
+> - static bool mbox_warning = false;\r
+> \r
+> if (! initialized) {\r
+> g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);\r
+> @@ -471,16 +470,6 @@ _notmuch_message_index_file (notmuch_message_t *message,\r
+> ret = NOTMUCH_STATUS_FILE_NOT_EMAIL;\r
+> goto DONE;\r
+> }\r
+> - /* For historical reasons, we support single-message mboxes,\r
+> - * but this behavior is likely to change in the future, so\r
+> - * warn. */\r
+> - if (!mbox_warning) {\r
+> - mbox_warning = true;\r
+> - fprintf (stderr, "\\r
+> -Warning: %s is an mbox containing a single message,\n\\r
+> -likely caused by misconfigured mail delivery. Support for single-message\n\\r
+> -mboxes is deprecated and may be removed in the future.\n", filename);\r
+> - }\r
+> }\r
+> \r
+> from = g_mime_message_get_sender (mime_message);\r
+> diff --git a/lib/message-file.c b/lib/message-file.c\r
+> index a2850c2..33f6468 100644\r
+> --- a/lib/message-file.c\r
+> +++ b/lib/message-file.c\r
+> @@ -26,30 +26,19 @@\r
+> \r
+> #include <glib.h> /* GHashTable */\r
+> \r
+> -typedef struct {\r
+> - char *str;\r
+> - size_t size;\r
+> - size_t len;\r
+> -} header_value_closure_t;\r
+> -\r
+> struct _notmuch_message_file {\r
+> /* File object */\r
+> FILE *file;\r
+> + char *filename;\r
+> \r
+> /* Header storage */\r
+> int restrict_headers;\r
+> GHashTable *headers;\r
+> - int broken_headers;\r
+> - int good_headers;\r
+> - size_t header_size; /* Length of full message header in bytes. */\r
+> -\r
+> - /* Parsing state */\r
+> - char *line;\r
+> - size_t line_size;\r
+> - header_value_closure_t value;\r
+> \r
+> - int parsing_started;\r
+> - int parsing_finished;\r
+> + GMimeStream *stream;\r
+> + GMimeParser *parser;\r
+> + GMimeMessage *message;\r
+> + notmuch_bool_t parsed;\r
+> };\r
+> \r
+> static int\r
+> @@ -76,16 +65,18 @@ strcase_hash (const void *ptr)\r
+> static int\r
+> _notmuch_message_file_destructor (notmuch_message_file_t *message)\r
+> {\r
+> - if (message->line)\r
+> - free (message->line);\r
+> -\r
+> - if (message->value.size)\r
+> - free (message->value.str);\r
+> -\r
+> if (message->headers)\r
+> g_hash_table_destroy (message->headers);\r
+> \r
+> - if (message->file)\r
+> + if (message->message)\r
+> + g_object_unref (message->message);\r
+> +\r
+> + if (message->parser)\r
+> + g_object_unref (message->parser);\r
+> +\r
+> + if (message->stream)\r
+> + g_object_unref (message->stream);\r
+> + else if (message->file) /* GMimeStream takes over the FILE* */\r
+> fclose (message->file);\r
+> \r
+> return 0;\r
+> @@ -102,19 +93,21 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename)\r
+> if (unlikely (message == NULL))\r
+> return NULL;\r
+> \r
+> + /* only needed for error messages during parsing */\r
+> + message->filename = talloc_strdup (message, filename);\r
+> + if (message->filename == NULL)\r
+> + goto FAIL;\r
+> +\r
+> talloc_set_destructor (message, _notmuch_message_file_destructor);\r
+> \r
+> message->file = fopen (filename, "r");\r
+> if (message->file == NULL)\r
+> goto FAIL;\r
+> \r
+> - message->headers = g_hash_table_new_full (strcase_hash,\r
+> - strcase_equal,\r
+> - free,\r
+> - g_free);\r
+> -\r
+> - message->parsing_started = 0;\r
+> - message->parsing_finished = 0;\r
+> + message->headers = g_hash_table_new_full (strcase_hash, strcase_equal,\r
+> + free, g_free);\r
+> + if (message->headers == NULL)\r
+> + goto FAIL;\r
+> \r
+> return message;\r
+> \r
+> @@ -143,7 +136,7 @@ notmuch_message_file_restrict_headersv (notmuch_message_file_t *message,\r
+> {\r
+> char *header;\r
+> \r
+> - if (message->parsing_started)\r
+> + if (message->parsed)\r
+> INTERNAL_ERROR ("notmuch_message_file_restrict_headers called after parsing has started");\r
+> \r
+> while (1) {\r
+> @@ -167,234 +160,114 @@ notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...)\r
+> notmuch_message_file_restrict_headersv (message, va_headers);\r
+> }\r
+> \r
+> -static void\r
+> -copy_header_unfolding (header_value_closure_t *value,\r
+> - const char *chunk)\r
+> +/*\r
+> + * gmime does not provide access to all Received: headers the way we\r
+> + * want, so we'll to use the parser header callback to gather them\r
+> + * into a hash table.\r
+> + */\r
+> +static void header_cb (unused(GMimeParser *parser), const char *header,\r
+> + const char *value, unused(gint64 offset),\r
+> + gpointer user_data)\r
+> {\r
+> - char *last;\r
+> + notmuch_message_file_t *message = (notmuch_message_file_t *) user_data;\r
+> + char *existing = NULL;\r
+> + notmuch_bool_t found;\r
+> \r
+> - if (chunk == NULL)\r
+> + found = g_hash_table_lookup_extended (message->headers, header,\r
+> + NULL, (gpointer *) &existing);\r
+> + if (! found && message->restrict_headers)\r
+> return;\r
+> \r
+> - while (*chunk == ' ' || *chunk == '\t')\r
+> - chunk++;\r
+> -\r
+> - if (value->len + 1 + strlen (chunk) + 1 > value->size) {\r
+> - unsigned int new_size = value->size;\r
+> - if (value->size == 0)\r
+> - new_size = strlen (chunk) + 1;\r
+> - else\r
+> - while (value->len + 1 + strlen (chunk) + 1 > new_size)\r
+> - new_size *= 2;\r
+> - value->str = xrealloc (value->str, new_size);\r
+> - value->size = new_size;\r
+> - }\r
+> -\r
+> - last = value->str + value->len;\r
+> - if (value->len) {\r
+> - *last = ' ';\r
+> - last++;\r
+> - value->len++;\r
+> - }\r
+> -\r
+> - strcpy (last, chunk);\r
+> - value->len += strlen (chunk);\r
+> -\r
+> - last = value->str + value->len - 1;\r
+> - if (*last == '\n') {\r
+> - *last = '\0';\r
+> - value->len--;\r
+> + if (existing == NULL) {\r
+> + /* No value, add one */\r
+> + char *decoded = g_mime_utils_header_decode_text (value);\r
+> + g_hash_table_insert (message->headers, xstrdup (header), decoded);\r
+> + } else if (strcasecmp (header, "received") == 0) {\r
+> + /* Concat all of the Received: headers we encounter. */\r
+> + char *combined, *decoded;\r
+> + size_t combined_size;\r
+> +\r
+> + decoded = g_mime_utils_header_decode_text (value);\r
+> +\r
+> + combined_size = strlen(existing) + strlen(decoded) + 2;\r
+> + combined = g_malloc (combined_size);\r
+> + snprintf (combined, combined_size, "%s %s", existing, decoded);\r
+> + g_free (decoded);\r
+> + g_hash_table_insert (message->headers, xstrdup (header), combined);\r
+\r
+I spent a while staring at this line trying to figure out whether\r
+header or its duplicate was leaked. It's clear enough that this line\r
+is correct from the documentation of g_hash_table_insert, but it might\r
+be worth adding a comment here mentioning that g_hash_table_insert\r
+will free this temporary copy of the key, to save future readers the\r
+eye strain.\r
+\r
+> }\r
+> }\r
+> \r
+> -/* As a special-case, a value of NULL for header_desired will force\r
+> - * the entire header to be parsed if it is not parsed already. This is\r
+> - * used by the _notmuch_message_file_get_headers_end function.\r
+> - * Another special case is the Received: header. For this header we\r
+> - * want to concatenate all instances of the header instead of just\r
+> - * hashing the first instance as we use this when analyzing the path\r
+> - * the mail has taken from sender to recipient.\r
+> - */\r
+> -const char *\r
+> -notmuch_message_file_get_header (notmuch_message_file_t *message,\r
+> - const char *header_desired)\r
+> +notmuch_status_t\r
+> +notmuch_message_file_parse (notmuch_message_file_t *message)\r
+> {\r
+> - int contains;\r
+> - char *header, *decoded_value, *header_sofar, *combined_header;\r
+> - const char *s, *colon;\r
+> - int match, newhdr, hdrsofar, is_received;\r
+> static int initialized = 0;\r
+> -\r
+> - is_received = (strcmp(header_desired,"received") == 0);\r
+> + char from_buf[5];\r
+> + notmuch_bool_t is_mbox = FALSE;\r
+> + static notmuch_bool_t mbox_warning = FALSE;\r
+> \r
+> if (! initialized) {\r
+> g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);\r
+> initialized = 1;\r
+> }\r
+> \r
+> - message->parsing_started = 1;\r
+> -\r
+> - if (header_desired == NULL)\r
+> - contains = 0;\r
+> - else\r
+> - contains = g_hash_table_lookup_extended (message->headers,\r
+> - header_desired, NULL,\r
+> - (gpointer *) &decoded_value);\r
+> -\r
+> - if (contains && decoded_value)\r
+> - return decoded_value;\r
+> -\r
+> - if (message->parsing_finished)\r
+> - return "";\r
+> -\r
+> -#define NEXT_HEADER_LINE(closure) \\r
+> - while (1) { \\r
+> - ssize_t bytes_read = getline (&message->line, \\r
+> - &message->line_size, \\r
+> - message->file); \\r
+> - if (bytes_read == -1) { \\r
+> - message->parsing_finished = 1; \\r
+> - break; \\r
+> - } \\r
+> - if (*message->line == '\n') { \\r
+> - message->parsing_finished = 1; \\r
+> - break; \\r
+> - } \\r
+> - if (closure && \\r
+> - (*message->line == ' ' || *message->line == '\t')) \\r
+> - { \\r
+> - copy_header_unfolding ((closure), message->line); \\r
+> - } \\r
+> - if (*message->line == ' ' || *message->line == '\t') \\r
+> - message->header_size += strlen (message->line); \\r
+> - else \\r
+> - break; \\r
+> - }\r
+> -\r
+> - if (message->line == NULL)\r
+> - NEXT_HEADER_LINE (NULL);\r
+> -\r
+> - while (1) {\r
+> -\r
+> - if (message->parsing_finished)\r
+> - break;\r
+> -\r
+> - colon = strchr (message->line, ':');\r
+> -\r
+> - if (colon == NULL) {\r
+> - message->broken_headers++;\r
+> - /* A simple heuristic for giving up on things that just\r
+> - * don't look like mail messages. */\r
+> - if (message->broken_headers >= 10 &&\r
+> - message->good_headers < 5)\r
+> - {\r
+> - message->parsing_finished = 1;\r
+> - break;\r
+> - }\r
+> - NEXT_HEADER_LINE (NULL);\r
+> - continue;\r
+> + /* Is this mbox? */\r
+> + if (fread (from_buf, sizeof (from_buf), 1, message->file) == 1 &&\r
+> + strncmp (from_buf, "From ", 5) == 0)\r
+> + is_mbox = TRUE;\r
+> + rewind (message->file);\r
+> +\r
+> + message->stream = g_mime_stream_file_new (message->file);\r
+\r
+Given that GMimeStream steals message->file, would it make sense to\r
+set it to NULL here? That would also simplify\r
+_notmuch_message_file_destructor a bit and move the knowledge of this\r
+stealing behavior to where it happens.\r
+\r
+> + message->parser = g_mime_parser_new_with_stream (message->stream);\r
+> + g_mime_parser_set_scan_from (message->parser, is_mbox);\r
+> + g_mime_parser_set_header_regex (message->parser, ".*", header_cb,\r
+> + (gpointer) message);\r
+\r
+Interesting.\r
+\r
+> +\r
+> + message->message = g_mime_parser_construct_message (message->parser);\r
+> + if (! message->message)\r
+> + return NOTMUCH_STATUS_FILE_NOT_EMAIL;\r
+> +\r
+> + if (is_mbox) {\r
+> + if (! g_mime_parser_eos (message->parser)) {\r
+> + /* This is a multi-message mbox. */\r
+> + return NOTMUCH_STATUS_FILE_NOT_EMAIL;\r
+> }\r
+> -\r
+> - message->header_size += strlen (message->line);\r
+> -\r
+> - message->good_headers++;\r
+> -\r
+> - header = xstrndup (message->line, colon - message->line);\r
+> -\r
+> - if (message->restrict_headers &&\r
+> - ! g_hash_table_lookup_extended (message->headers,\r
+> - header, NULL, NULL))\r
+> - {\r
+> - free (header);\r
+> - NEXT_HEADER_LINE (NULL);\r
+> - continue;\r
+> - }\r
+> -\r
+> - s = colon + 1;\r
+> - while (*s == ' ' || *s == '\t')\r
+> - s++;\r
+> -\r
+> - message->value.len = 0;\r
+> - copy_header_unfolding (&message->value, s);\r
+> -\r
+> - NEXT_HEADER_LINE (&message->value);\r
+> -\r
+> - if (header_desired == NULL)\r
+> - match = 0;\r
+> - else\r
+> - match = (strcasecmp (header, header_desired) == 0);\r
+> -\r
+> - decoded_value = g_mime_utils_header_decode_text (message->value.str);\r
+> - header_sofar = (char *)g_hash_table_lookup (message->headers, header);\r
+> - /* we treat the Received: header special - we want to concat ALL of \r
+> - * the Received: headers we encounter.\r
+> - * for everything else we return the first instance of a header */\r
+> - if (strcasecmp(header, "received") == 0) {\r
+> - if (header_sofar == NULL) {\r
+> - /* first Received: header we encountered; just add it */\r
+> - g_hash_table_insert (message->headers, header, decoded_value);\r
+> - } else {\r
+> - /* we need to add the header to those we already collected */\r
+> - newhdr = strlen(decoded_value);\r
+> - hdrsofar = strlen(header_sofar);\r
+> - combined_header = g_malloc(hdrsofar + newhdr + 2);\r
+> - strncpy(combined_header,header_sofar,hdrsofar);\r
+> - *(combined_header+hdrsofar) = ' ';\r
+> - strncpy(combined_header+hdrsofar+1,decoded_value,newhdr+1);\r
+> - g_free (decoded_value);\r
+> - g_hash_table_insert (message->headers, header, combined_header);\r
+> - }\r
+> - } else {\r
+> - if (header_sofar == NULL) {\r
+> - /* Only insert if we don't have a value for this header, yet. */\r
+> - g_hash_table_insert (message->headers, header, decoded_value);\r
+> - } else {\r
+> - free (header);\r
+> - g_free (decoded_value);\r
+> - decoded_value = header_sofar;\r
+> - }\r
+> + /*\r
+> + * For historical reasons, we support single-message mboxes,\r
+> + * but this behavior is likely to change in the future, so\r
+> + * warn.\r
+> + */\r
+> + if (! mbox_warning) {\r
+> + mbox_warning = TRUE;\r
+> + fprintf (stderr, "\\r
+> +Warning: %s is an mbox containing a single message,\n\\r
+> +likely caused by misconfigured mail delivery. Support for single-message\n\\r
+> +mboxes is deprecated and may be removed in the future.\n", message->filename);\r
+> }\r
+\r
+Why move this warning here? Currently we obviously only print this\r
+once, at index time. Will this cause us to print the warning in more\r
+situations?\r
+\r
+> - /* if we found a match we can bail - unless of course we are\r
+> - * collecting all the Received: headers */\r
+> - if (match && !is_received)\r
+> - return decoded_value;\r
+> }\r
+> \r
+> - if (message->parsing_finished) {\r
+> - fclose (message->file);\r
+> - message->file = NULL;\r
+> - }\r
+> + message->parsed = TRUE;\r
+> \r
+> - if (message->line)\r
+> - free (message->line);\r
+> - message->line = NULL;\r
+> + return NOTMUCH_STATUS_SUCCESS;\r
+> +}\r
+> \r
+> - if (message->value.size) {\r
+> - free (message->value.str);\r
+> - message->value.str = NULL;\r
+> - message->value.size = 0;\r
+> - message->value.len = 0;\r
+> - }\r
+> +/* return NULL on errors, empty string for non-existing headers */\r
+\r
+For consistency, the comment should be capitalized.\r
+\r
+Also, this comment used to explain how Received headers were handled.\r
+It seems like we lost that.\r
+\r
+> +const char *\r
+> +notmuch_message_file_get_header (notmuch_message_file_t *message,\r
+> + const char *header)\r
+> +{\r
+> + const char *value = NULL;\r
+> + notmuch_bool_t found;\r
+> \r
+> - /* For the Received: header we actually might end up here even\r
+> - * though we found the header (as we force continued parsing\r
+> - * in that case). So let's check if that's the header we were\r
+> - * looking for and return the value that we found (if any)\r
+> - */\r
+> - if (is_received)\r
+> - return (char *)g_hash_table_lookup (message->headers, "received");\r
+> -\r
+> - /* We've parsed all headers and never found the one we're looking\r
+> - * for. It's probably just not there, but let's check that we\r
+> - * didn't make a mistake preventing us from seeing it. */\r
+> - if (message->restrict_headers && header_desired &&\r
+> - ! g_hash_table_lookup_extended (message->headers,\r
+> - header_desired, NULL, NULL))\r
+> - {\r
+> - INTERNAL_ERROR ("Attempt to get header \"%s\" which was not\n"\r
+> - "included in call to notmuch_message_file_restrict_headers\n",\r
+> - header_desired);\r
+> - }\r
+> + /* the caller shouldn't ask for headers before parsing */\r
+> + if (! message->parsed)\r
+> + return NULL;\r
+\r
+Previously, calling notmuch_message_file_get_header triggered parsing\r
+of the message, which ensured it didn't happen unless necessary. Is\r
+there a reason to change that behavior?\r
+\r
+If it were still done lazily, you wouldn't need the addition below to\r
+_notmuch_message_ensure_message_file, or the new prototype in\r
+notmuch-private.h (and notmuch_message_file_parse could be static).\r
+\r
+(Ignore this comment if it's justified by the next patch; I haven't\r
+gotten there yet.)\r
+\r
+> +\r
+> + found = g_hash_table_lookup_extended (message->headers, header,\r
+> + NULL, (gpointer *) &value);\r
+> +\r
+> + /* the caller shouldn't ask for non-restricted headers */\r
+> + if (! found && message->restrict_headers)\r
+> + return NULL;\r
+> \r
+> - return "";\r
+> + return value ? value : "";\r
+> }\r
+> diff --git a/lib/message.cc b/lib/message.cc\r
+> index c91f3a5..9a22d36 100644\r
+> --- a/lib/message.cc\r
+> +++ b/lib/message.cc\r
+> @@ -407,6 +407,12 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)\r
+> return;\r
+> \r
+> message->message_file = _notmuch_message_file_open_ctx (message, filename);\r
+> +\r
+> + /* XXX: better return value handling */\r
+> + if (notmuch_message_file_parse (message->message_file)) {\r
+> + notmuch_message_file_close (message->message_file);\r
+> + message->message_file = NULL;\r
+> + }\r
+> }\r
+> \r
+> const char *\r
+> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h\r
+> index af185c7..7277df1 100644\r
+> --- a/lib/notmuch-private.h\r
+> +++ b/lib/notmuch-private.h\r
+> @@ -345,6 +345,10 @@ notmuch_message_file_open (const char *filename);\r
+> notmuch_message_file_t *\r
+> _notmuch_message_file_open_ctx (void *ctx, const char *filename);\r
+> \r
+> +/* Parse the message */\r
+> +notmuch_status_t\r
+> +notmuch_message_file_parse (notmuch_message_file_t *message);\r
+> +\r
+> /* Close a notmuch message previously opened with notmuch_message_open. */\r
+> void\r
+> notmuch_message_file_close (notmuch_message_file_t *message);\r