--- /dev/null
+Return-Path: <jani@nikula.org>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id 1DD20431FB6\r
+ for <notmuch@notmuchmail.org>; Sun, 30 Mar 2014 14:22:12 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Amavis-Alert: BAD HEADER SECTION, Duplicate header field: "References"\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+ tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id qVpb9yxdmDwK for <notmuch@notmuchmail.org>;\r
+ Sun, 30 Mar 2014 14:22:04 -0700 (PDT)\r
+Received: from mail-we0-f171.google.com (mail-we0-f171.google.com\r
+ [74.125.82.171]) (using TLSv1 with cipher RC4-SHA (128/128 bits))\r
+ (No client certificate requested)\r
+ by olra.theworths.org (Postfix) with ESMTPS id D49AC431FBD\r
+ for <notmuch@notmuchmail.org>; Sun, 30 Mar 2014 14:22:01 -0700 (PDT)\r
+Received: by mail-we0-f171.google.com with SMTP id t61so3933059wes.2\r
+ for <notmuch@notmuchmail.org>; Sun, 30 Mar 2014 14:21:59 -0700 (PDT)\r
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
+ d=1e100.net; s=20130820;\r
+ h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to\r
+ :references:in-reply-to:references;\r
+ bh=gpYa1xGSwVynsJAmM0iTnq0k963EAPMRD98BNuyw2ps=;\r
+ b=K4nGW6VNbSL9Un63zw4+H0V8f8tkupfDZRvoazEBGHtgJBHSHkDKN45OkyDk/1ptj5\r
+ JuN0SypbWUKN+JGoqgZzk0C8+XYvnrKT88J1j5fi++MtGnn1Ef5jW5xuxIBammrW8cIr\r
+ /NHSi/kX31m1WGzNjB6Kcy5L2RJ08HZ4ve5pXiWn634wYtiUMnlVe//2tFcKdHeYx0ls\r
+ CLHM7+zvdXcm9O92L87st6nE9zUL64bi+TWOZBb8P5F2Agj2hoDs4y5eNKhanIp/oB0Y\r
+ g9dwwkEI316CUVjBcSBPlHRanbPyAH3fA4yoVCsmSNlRC4D/Cth3hoGw+jg1ghX5j1by\r
+ MH3w==\r
+X-Gm-Message-State:\r
+ ALoCoQkHQ3kOM7Zo44Z3D77LrbRZq+iNHVaifhSpBelS8ErukSKExab9iJBp1Zv8UGoWzXww7+ub\r
+X-Received: by 10.194.203.200 with SMTP id ks8mr721422wjc.61.1396214519282;\r
+ Sun, 30 Mar 2014 14:21:59 -0700 (PDT)\r
+Received: from localhost (dsl-hkibrasgw2-58c36f-91.dhcp.inet.fi.\r
+ [88.195.111.91]) by mx.google.com with ESMTPSA id\r
+ m44sm28343726eep.14.2014.03.30.14.21.57 for <multiple recipients>\r
+ (version=TLSv1.2 cipher=RC4-SHA bits=128/128);\r
+ Sun, 30 Mar 2014 14:21:58 -0700 (PDT)\r
+From: Jani Nikula <jani@nikula.org>\r
+To: notmuch@notmuchmail.org\r
+Subject: [PATCH v5 2/2] lib: replace the header parser with gmime\r
+Date: Mon, 31 Mar 2014 00:21:49 +0300\r
+Message-Id:\r
+ <572519ca9c95b686f30af99e49db86f51ede30e2.1396214154.git.jani@nikula.org>\r
+X-Mailer: git-send-email 1.9.0\r
+In-Reply-To: <cover.1396214154.git.jani@nikula.org>\r
+References: <cover.1396214154.git.jani@nikula.org>\r
+In-Reply-To: <cover.1396214154.git.jani@nikula.org>\r
+References: <cover.1396214154.git.jani@nikula.org>\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Sun, 30 Mar 2014 21:22:12 -0000\r
+\r
+The notmuch library includes a full blown message header parser. Yet\r
+the same message headers are parsed by gmime during indexing. Switch\r
+to gmime parsing completely.\r
+\r
+These are the main changes:\r
+\r
+* Gmime stops header parsing at the first invalid header, and presumes\r
+ the message body starts from there. The current parser is quite\r
+ liberal in accepting broken headers. The change means we will be\r
+ much pickier about accepting invalid messages.\r
+\r
+* The current parser converts tabs used in header folding to\r
+ spaces. Gmime preserve the tabs. Due to a broken python library used\r
+ in mailman, there are plenty of mailing lists that produce headers\r
+ with tabs in header folding, and we'll see plenty of tabs. (This\r
+ change has been mitigated in preparatory patches.)\r
+\r
+* For pure header parsing, the current parser is likely faster than\r
+ gmime, which parses the whole message rather than just the\r
+ headers. Since we parse the message and its headers using gmime for\r
+ indexing anyway, this avoids and extra header parsing round when\r
+ adding new messages. In case of duplicate messages, we'll end up\r
+ parsing the full message although just headers would be\r
+ sufficient. All in all this should still speed up 'notmuch new'.\r
+\r
+* Calls to notmuch_message_get_header() may be slightly slower than\r
+ previously for headers that are not indexed in the database, due to\r
+ parsing of the whole message. Within the notmuch code base, notmuch\r
+ reply is the only such user.\r
+\r
+---\r
+\r
+This is v4 of patches [1] and [2], combining them into one. Patch [3]\r
+is required to make all the tests pass.\r
+\r
+The implementation has been changed considerably. Notably we only\r
+cache the decoded headers to a hash table if and when they are\r
+needed. The main reasons for doing caching at all is that gmime\r
+provides raw headers that need decoding, and to minimize changes\r
+outside of message-file.c we want to own and track the allocated\r
+strings instead of pushing the responsibility to callers.\r
+\r
+This addresses most comments from Austin.\r
+\r
+The diff in message-file.c is rather confusing due to a mix of\r
+added/removed lines. Looking at the patched file is much more\r
+pleasant.\r
+\r
+BR,\r
+Jani.\r
+\r
+[1] id:bdef45dc21c777130a7ae1fa650f172d8c6eaaf4.1391456555.git.jani@nikula.org\r
+[2] id:31d785c4a3e4b90862a0fdc545d4e900a4c898e2.1391456555.git.jani@nikula.org\r
+[3] id:1395247490-19892-1-git-send-email-jani@nikula.org\r
+---\r
+ lib/database.cc | 15 +-\r
+ lib/index.cc | 54 +------\r
+ lib/message-file.c | 413 ++++++++++++++++++++------------------------------\r
+ lib/notmuch-private.h | 55 +++----\r
+ 4 files changed, 206 insertions(+), 331 deletions(-)\r
+\r
+diff --git a/lib/database.cc b/lib/database.cc\r
+index aef748f75b5d..1efb14d4a0bd 100644\r
+--- a/lib/database.cc\r
++++ b/lib/database.cc\r
+@@ -1971,15 +1971,10 @@ notmuch_database_add_message (notmuch_database_t *notmuch,\r
+ if (ret)\r
+ goto DONE;\r
+ \r
+- notmuch_message_file_restrict_headers (message_file,\r
+- "date",\r
+- "from",\r
+- "in-reply-to",\r
+- "message-id",\r
+- "references",\r
+- "subject",\r
+- "to",\r
+- (char *) NULL);\r
++ /* Parse message up front to get better error status. */\r
++ ret = _notmuch_message_file_parse (message_file);\r
++ if (ret)\r
++ goto DONE;\r
+ \r
+ try {\r
+ /* Before we do any real work, (especially before doing a\r
+@@ -2066,7 +2061,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,\r
+ date = notmuch_message_file_get_header (message_file, "date");\r
+ _notmuch_message_set_header_values (message, date, from, subject);\r
+ \r
+- ret = _notmuch_message_index_file (message, filename);\r
++ ret = _notmuch_message_index_file (message, message_file);\r
+ if (ret)\r
+ goto DONE;\r
+ } else {\r
+diff --git a/lib/index.cc b/lib/index.cc\r
+index 2100cce04780..e1e2a3828f02 100644\r
+--- a/lib/index.cc\r
++++ b/lib/index.cc\r
+@@ -425,45 +425,17 @@ _index_mime_part (notmuch_message_t *message,\r
+ \r
+ notmuch_status_t\r
+ _notmuch_message_index_file (notmuch_message_t *message,\r
+- const char *filename)\r
++ notmuch_message_file_t *message_file)\r
+ {\r
+- GMimeStream *stream = NULL;\r
+- GMimeParser *parser = NULL;\r
+- GMimeMessage *mime_message = NULL;\r
++ GMimeMessage *mime_message;\r
+ InternetAddressList *addresses;\r
+- FILE *file = NULL;\r
+ const char *from, *subject;\r
+- notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;\r
+- static int initialized = 0;\r
+- char from_buf[5];\r
++ notmuch_status_t status;\r
+ \r
+- if (! initialized) {\r
+- g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);\r
+- initialized = 1;\r
+- }\r
+-\r
+- file = fopen (filename, "r");\r
+- if (! file) {\r
+- fprintf (stderr, "Error opening %s: %s\n", filename, strerror (errno));\r
+- ret = NOTMUCH_STATUS_FILE_ERROR;\r
+- goto DONE;\r
+- }\r
+-\r
+- /* Is this mbox? */\r
+- if (fread (from_buf, sizeof (from_buf), 1, file) == 1 &&\r
+- strncmp (from_buf, "From ", 5) == 0) {\r
+- ret = NOTMUCH_STATUS_FILE_NOT_EMAIL;\r
+- goto DONE;\r
+- }\r
+- rewind (file);\r
+-\r
+- /* Evil GMime steals my FILE* here so I won't fclose it. */\r
+- stream = g_mime_stream_file_new (file);\r
+-\r
+- parser = g_mime_parser_new_with_stream (stream);\r
+- g_mime_parser_set_scan_from (parser, FALSE);\r
+-\r
+- mime_message = g_mime_parser_construct_message (parser);\r
++ status = _notmuch_message_file_get_mime_message (message_file,\r
++ &mime_message);\r
++ if (status)\r
++ return status;\r
+ \r
+ from = g_mime_message_get_sender (mime_message);\r
+ \r
+@@ -484,15 +456,5 @@ _notmuch_message_index_file (notmuch_message_t *message,\r
+ \r
+ _index_mime_part (message, g_mime_message_get_mime_part (mime_message));\r
+ \r
+- DONE:\r
+- if (mime_message)\r
+- g_object_unref (mime_message);\r
+-\r
+- if (parser)\r
+- g_object_unref (parser);\r
+-\r
+- if (stream)\r
+- g_object_unref (stream);\r
+-\r
+- return ret;\r
++ return NOTMUCH_STATUS_SUCCESS;\r
+ }\r
+diff --git a/lib/message-file.c b/lib/message-file.c\r
+index a2850c278b5a..67828827e61d 100644\r
+--- a/lib/message-file.c\r
++++ b/lib/message-file.c\r
+@@ -26,30 +26,15 @@\r
+ \r
+ #include <glib.h> /* GHashTable */\r
+ \r
+-typedef struct {\r
+- char *str;\r
+- size_t size;\r
+- size_t len;\r
+-} header_value_closure_t;\r
+-\r
+ struct _notmuch_message_file {\r
+ /* File object */\r
+ FILE *file;\r
++ char *filename;\r
+ \r
+- /* Header storage */\r
+- int restrict_headers;\r
++ /* Cache for decoded headers */\r
+ GHashTable *headers;\r
+- int broken_headers;\r
+- int good_headers;\r
+- size_t header_size; /* Length of full message header in bytes. */\r
+-\r
+- /* Parsing state */\r
+- char *line;\r
+- size_t line_size;\r
+- header_value_closure_t value;\r
+ \r
+- int parsing_started;\r
+- int parsing_finished;\r
++ GMimeMessage *message;\r
+ };\r
+ \r
+ static int\r
+@@ -76,15 +61,12 @@ strcase_hash (const void *ptr)\r
+ static int\r
+ _notmuch_message_file_destructor (notmuch_message_file_t *message)\r
+ {\r
+- if (message->line)\r
+- free (message->line);\r
+-\r
+- if (message->value.size)\r
+- free (message->value.str);\r
+-\r
+ if (message->headers)\r
+ g_hash_table_destroy (message->headers);\r
+ \r
++ if (message->message)\r
++ g_object_unref (message->message);\r
++\r
+ if (message->file)\r
+ fclose (message->file);\r
+ \r
+@@ -102,20 +84,17 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename)\r
+ if (unlikely (message == NULL))\r
+ return NULL;\r
+ \r
++ /* Only needed for error messages during parsing. */\r
++ message->filename = talloc_strdup (message, filename);\r
++ if (message->filename == NULL)\r
++ goto FAIL;\r
++\r
+ talloc_set_destructor (message, _notmuch_message_file_destructor);\r
+ \r
+ message->file = fopen (filename, "r");\r
+ if (message->file == NULL)\r
+ goto FAIL;\r
+ \r
+- message->headers = g_hash_table_new_full (strcase_hash,\r
+- strcase_equal,\r
+- free,\r
+- g_free);\r
+-\r
+- message->parsing_started = 0;\r
+- message->parsing_finished = 0;\r
+-\r
+ return message;\r
+ \r
+ FAIL:\r
+@@ -137,264 +116,202 @@ notmuch_message_file_close (notmuch_message_file_t *message)\r
+ talloc_free (message);\r
+ }\r
+ \r
+-void\r
+-notmuch_message_file_restrict_headersv (notmuch_message_file_t *message,\r
+- va_list va_headers)\r
++static notmuch_bool_t\r
++is_mbox (FILE *file)\r
+ {\r
+- char *header;\r
++ char from_buf[5];\r
++ notmuch_bool_t ret = FALSE;\r
+ \r
+- if (message->parsing_started)\r
+- INTERNAL_ERROR ("notmuch_message_file_restrict_headers called after parsing has started");\r
++ /* Is this mbox? */\r
++ if (fread (from_buf, sizeof (from_buf), 1, file) == 1 &&\r
++ strncmp (from_buf, "From ", 5) == 0)\r
++ ret = TRUE;\r
+ \r
+- while (1) {\r
+- header = va_arg (va_headers, char*);\r
+- if (header == NULL)\r
+- break;\r
+- g_hash_table_insert (message->headers,\r
+- xstrdup (header), NULL);\r
+- }\r
++ rewind (file);\r
+ \r
+- message->restrict_headers = 1;\r
++ return ret;\r
+ }\r
+ \r
+-void\r
+-notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...)\r
++notmuch_status_t\r
++_notmuch_message_file_parse (notmuch_message_file_t *message)\r
+ {\r
+- va_list va_headers;\r
++ GMimeStream *stream;\r
++ GMimeParser *parser;\r
++ notmuch_status_t status = NOTMUCH_STATUS_SUCCESS;\r
++ static int initialized = 0;\r
+ \r
+- va_start (va_headers, message);\r
++ if (message->message)\r
++ return NOTMUCH_STATUS_SUCCESS;\r
+ \r
+- notmuch_message_file_restrict_headersv (message, va_headers);\r
+-}\r
++ /* We no longer support mboxes at all. */\r
++ if (is_mbox (message->file))\r
++ return NOTMUCH_STATUS_FILE_NOT_EMAIL;\r
+ \r
+-static void\r
+-copy_header_unfolding (header_value_closure_t *value,\r
+- const char *chunk)\r
+-{\r
+- char *last;\r
++ if (! initialized) {\r
++ g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);\r
++ initialized = 1;\r
++ }\r
+ \r
+- if (chunk == NULL)\r
+- return;\r
++ message->headers = g_hash_table_new_full (strcase_hash, strcase_equal,\r
++ free, g_free);\r
++ if (! message->headers)\r
++ return NOTMUCH_STATUS_OUT_OF_MEMORY;\r
+ \r
+- while (*chunk == ' ' || *chunk == '\t')\r
+- chunk++;\r
++ stream = g_mime_stream_file_new (message->file);\r
+ \r
+- if (value->len + 1 + strlen (chunk) + 1 > value->size) {\r
+- unsigned int new_size = value->size;\r
+- if (value->size == 0)\r
+- new_size = strlen (chunk) + 1;\r
+- else\r
+- while (value->len + 1 + strlen (chunk) + 1 > new_size)\r
+- new_size *= 2;\r
+- value->str = xrealloc (value->str, new_size);\r
+- value->size = new_size;\r
+- }\r
++ /* We'll own and fclose the FILE* ourselves. */\r
++ g_mime_stream_file_set_owner (GMIME_STREAM_FILE (stream), FALSE);\r
++\r
++ parser = g_mime_parser_new_with_stream (stream);\r
++ g_mime_parser_set_scan_from (parser, FALSE);\r
+ \r
+- last = value->str + value->len;\r
+- if (value->len) {\r
+- *last = ' ';\r
+- last++;\r
+- value->len++;\r
++ message->message = g_mime_parser_construct_message (parser);\r
++ if (! message->message) {\r
++ status = NOTMUCH_STATUS_FILE_NOT_EMAIL;\r
++ goto DONE;\r
+ }\r
+ \r
+- strcpy (last, chunk);\r
+- value->len += strlen (chunk);\r
++ DONE:\r
++ g_object_unref (stream);\r
++ g_object_unref (parser);\r
++\r
++ if (status) {\r
++ g_hash_table_destroy (message->headers);\r
++ message->headers = NULL;\r
++\r
++ if (message->message) {\r
++ g_object_unref (message->message);\r
++ message->message = NULL;\r
++ }\r
+ \r
+- last = value->str + value->len - 1;\r
+- if (*last == '\n') {\r
+- *last = '\0';\r
+- value->len--;\r
++ rewind (message->file);\r
+ }\r
++\r
++ return status;\r
+ }\r
+ \r
+-/* As a special-case, a value of NULL for header_desired will force\r
+- * the entire header to be parsed if it is not parsed already. This is\r
+- * used by the _notmuch_message_file_get_headers_end function.\r
+- * Another special case is the Received: header. For this header we\r
+- * want to concatenate all instances of the header instead of just\r
+- * hashing the first instance as we use this when analyzing the path\r
+- * the mail has taken from sender to recipient.\r
+- */\r
+-const char *\r
+-notmuch_message_file_get_header (notmuch_message_file_t *message,\r
+- const char *header_desired)\r
++notmuch_status_t\r
++_notmuch_message_file_get_mime_message (notmuch_message_file_t *message,\r
++ GMimeMessage **mime_message)\r
+ {\r
+- int contains;\r
+- char *header, *decoded_value, *header_sofar, *combined_header;\r
+- const char *s, *colon;\r
+- int match, newhdr, hdrsofar, is_received;\r
+- static int initialized = 0;\r
++ notmuch_status_t status;\r
+ \r
+- is_received = (strcmp(header_desired,"received") == 0);\r
++ status = _notmuch_message_file_parse (message);\r
++ if (status)\r
++ return status;\r
+ \r
+- if (! initialized) {\r
+- g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);\r
+- initialized = 1;\r
+- }\r
++ *mime_message = message->message;\r
+ \r
+- message->parsing_started = 1;\r
+-\r
+- if (header_desired == NULL)\r
+- contains = 0;\r
+- else\r
+- contains = g_hash_table_lookup_extended (message->headers,\r
+- header_desired, NULL,\r
+- (gpointer *) &decoded_value);\r
+-\r
+- if (contains && decoded_value)\r
+- return decoded_value;\r
+-\r
+- if (message->parsing_finished)\r
+- return "";\r
+-\r
+-#define NEXT_HEADER_LINE(closure) \\r
+- while (1) { \\r
+- ssize_t bytes_read = getline (&message->line, \\r
+- &message->line_size, \\r
+- message->file); \\r
+- if (bytes_read == -1) { \\r
+- message->parsing_finished = 1; \\r
+- break; \\r
+- } \\r
+- if (*message->line == '\n') { \\r
+- message->parsing_finished = 1; \\r
+- break; \\r
+- } \\r
+- if (closure && \\r
+- (*message->line == ' ' || *message->line == '\t')) \\r
+- { \\r
+- copy_header_unfolding ((closure), message->line); \\r
+- } \\r
+- if (*message->line == ' ' || *message->line == '\t') \\r
+- message->header_size += strlen (message->line); \\r
+- else \\r
+- break; \\r
+- }\r
++ return NOTMUCH_STATUS_SUCCESS;\r
++}\r
+ \r
+- if (message->line == NULL)\r
+- NEXT_HEADER_LINE (NULL);\r
++/*\r
++ * Get all instances of a header decoded and concatenated.\r
++ *\r
++ * The result must be freed using g_free().\r
++ *\r
++ * Return NULL on errors, empty string for non-existing headers.\r
++ */\r
++static char *\r
++_notmuch_message_file_get_combined_header (notmuch_message_file_t *message,\r
++ const char *header)\r
++{\r
++ GMimeHeaderList *headers;\r
++ GMimeHeaderIter *iter;\r
++ char *combined = NULL;\r
+ \r
+- while (1) {\r
++ headers = g_mime_object_get_header_list (GMIME_OBJECT (message->message));\r
++ if (! headers)\r
++ return NULL;\r
+ \r
+- if (message->parsing_finished)\r
+- break;\r
++ iter = g_mime_header_iter_new ();\r
++ if (! iter)\r
++ return NULL;\r
+ \r
+- colon = strchr (message->line, ':');\r
++ if (! g_mime_header_list_get_iter (headers, iter))\r
++ goto DONE;\r
+ \r
+- if (colon == NULL) {\r
+- message->broken_headers++;\r
+- /* A simple heuristic for giving up on things that just\r
+- * don't look like mail messages. */\r
+- if (message->broken_headers >= 10 &&\r
+- message->good_headers < 5)\r
+- {\r
+- message->parsing_finished = 1;\r
+- break;\r
+- }\r
+- NEXT_HEADER_LINE (NULL);\r
++ do {\r
++ const char *value;\r
++ char *decoded;\r
++\r
++ if (strcasecmp (g_mime_header_iter_get_name (iter), header) != 0)\r
+ continue;\r
+- }\r
+ \r
+- message->header_size += strlen (message->line);\r
++ /* Note that GMime retains ownership of value... */\r
++ value = g_mime_header_iter_get_value (iter);\r
+ \r
+- message->good_headers++;\r
++ /* ... while decoded needs to be freed with g_free(). */\r
++ decoded = g_mime_utils_header_decode_text (value);\r
++ if (! decoded) {\r
++ if (combined) {\r
++ g_free (combined);\r
++ combined = NULL;\r
++ }\r
++ goto DONE;\r
++ }\r
+ \r
+- header = xstrndup (message->line, colon - message->line);\r
++ if (combined) {\r
++ char *tmp = g_strdup_printf ("%s %s", combined, decoded);\r
++ g_free (decoded);\r
++ g_free (combined);\r
++ if (! tmp) {\r
++ combined = NULL;\r
++ goto DONE;\r
++ }\r
+ \r
+- if (message->restrict_headers &&\r
+- ! g_hash_table_lookup_extended (message->headers,\r
+- header, NULL, NULL))\r
+- {\r
+- free (header);\r
+- NEXT_HEADER_LINE (NULL);\r
+- continue;\r
++ combined = tmp;\r
++ } else {\r
++ combined = decoded;\r
+ }\r
++ } while (g_mime_header_iter_next (iter));\r
+ \r
+- s = colon + 1;\r
+- while (*s == ' ' || *s == '\t')\r
+- s++;\r
++ /* Return empty string for non-existing headers. */\r
++ if (! combined)\r
++ combined = g_strdup ("");\r
+ \r
+- message->value.len = 0;\r
+- copy_header_unfolding (&message->value, s);\r
++ DONE:\r
++ g_mime_header_iter_free (iter);\r
+ \r
+- NEXT_HEADER_LINE (&message->value);\r
+-\r
+- if (header_desired == NULL)\r
+- match = 0;\r
+- else\r
+- match = (strcasecmp (header, header_desired) == 0);\r
+-\r
+- decoded_value = g_mime_utils_header_decode_text (message->value.str);\r
+- header_sofar = (char *)g_hash_table_lookup (message->headers, header);\r
+- /* we treat the Received: header special - we want to concat ALL of \r
+- * the Received: headers we encounter.\r
+- * for everything else we return the first instance of a header */\r
+- if (strcasecmp(header, "received") == 0) {\r
+- if (header_sofar == NULL) {\r
+- /* first Received: header we encountered; just add it */\r
+- g_hash_table_insert (message->headers, header, decoded_value);\r
+- } else {\r
+- /* we need to add the header to those we already collected */\r
+- newhdr = strlen(decoded_value);\r
+- hdrsofar = strlen(header_sofar);\r
+- combined_header = g_malloc(hdrsofar + newhdr + 2);\r
+- strncpy(combined_header,header_sofar,hdrsofar);\r
+- *(combined_header+hdrsofar) = ' ';\r
+- strncpy(combined_header+hdrsofar+1,decoded_value,newhdr+1);\r
+- g_free (decoded_value);\r
+- g_hash_table_insert (message->headers, header, combined_header);\r
+- }\r
+- } else {\r
+- if (header_sofar == NULL) {\r
+- /* Only insert if we don't have a value for this header, yet. */\r
+- g_hash_table_insert (message->headers, header, decoded_value);\r
+- } else {\r
+- free (header);\r
+- g_free (decoded_value);\r
+- decoded_value = header_sofar;\r
+- }\r
+- }\r
+- /* if we found a match we can bail - unless of course we are\r
+- * collecting all the Received: headers */\r
+- if (match && !is_received)\r
+- return decoded_value;\r
+- }\r
++ return combined;\r
++}\r
+ \r
+- if (message->parsing_finished) {\r
+- fclose (message->file);\r
+- message->file = NULL;\r
+- }\r
++const char *\r
++notmuch_message_file_get_header (notmuch_message_file_t *message,\r
++ const char *header)\r
++{\r
++ const char *value;\r
++ char *decoded;\r
+ \r
+- if (message->line)\r
+- free (message->line);\r
+- message->line = NULL;\r
++ if (_notmuch_message_file_parse (message))\r
++ return NULL;\r
+ \r
+- if (message->value.size) {\r
+- free (message->value.str);\r
+- message->value.str = NULL;\r
+- message->value.size = 0;\r
+- message->value.len = 0;\r
++ /* If we have a cached decoded value, use it. */\r
++ value = g_hash_table_lookup (message->headers, header);\r
++ if (value)\r
++ return value;\r
++\r
++ if (strcasecmp (header, "received") == 0) {\r
++ /*\r
++ * The Received: header is special. We concatenate all\r
++ * instances of the header as we use this when analyzing the\r
++ * path the mail has taken from sender to recipient.\r
++ */\r
++ decoded = _notmuch_message_file_get_combined_header (message, header);\r
++ } else {\r
++ value = g_mime_object_get_header (GMIME_OBJECT (message->message),\r
++ header);\r
++ if (value)\r
++ decoded = g_mime_utils_header_decode_text (value);\r
++ else\r
++ decoded = g_strdup ("");\r
+ }\r
+ \r
+- /* For the Received: header we actually might end up here even\r
+- * though we found the header (as we force continued parsing\r
+- * in that case). So let's check if that's the header we were\r
+- * looking for and return the value that we found (if any)\r
+- */\r
+- if (is_received)\r
+- return (char *)g_hash_table_lookup (message->headers, "received");\r
+-\r
+- /* We've parsed all headers and never found the one we're looking\r
+- * for. It's probably just not there, but let's check that we\r
+- * didn't make a mistake preventing us from seeing it. */\r
+- if (message->restrict_headers && header_desired &&\r
+- ! g_hash_table_lookup_extended (message->headers,\r
+- header_desired, NULL, NULL))\r
+- {\r
+- INTERNAL_ERROR ("Attempt to get header \"%s\" which was not\n"\r
+- "included in call to notmuch_message_file_restrict_headers\n",\r
+- header_desired);\r
+- }\r
++ if (! decoded)\r
++ return NULL;\r
++\r
++ /* Cache the decoded value. We also own the strings. */\r
++ g_hash_table_insert (message->headers, xstrdup (header), decoded);\r
+ \r
+- return "";\r
++ return decoded;\r
+ }\r
+diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h\r
+index 59eb2bc285a5..703ae7bb7a01 100644\r
+--- a/lib/notmuch-private.h\r
++++ b/lib/notmuch-private.h\r
+@@ -46,6 +46,8 @@ NOTMUCH_BEGIN_DECLS\r
+ \r
+ #include <talloc.h>\r
+ \r
++#include <gmime/gmime.h>\r
++\r
+ #include "xutil.h"\r
+ #include "error_util.h"\r
+ \r
+@@ -320,13 +322,6 @@ notmuch_message_set_author (notmuch_message_t *message, const char *author);\r
+ const char *\r
+ notmuch_message_get_author (notmuch_message_t *message);\r
+ \r
+-\r
+-/* index.cc */\r
+-\r
+-notmuch_status_t\r
+-_notmuch_message_index_file (notmuch_message_t *message,\r
+- const char *filename);\r
+-\r
+ /* message-file.c */\r
+ \r
+ /* XXX: I haven't decided yet whether these will actually get exported\r
+@@ -352,32 +347,32 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename);\r
+ void\r
+ notmuch_message_file_close (notmuch_message_file_t *message);\r
+ \r
+-/* Restrict 'message' to only save the named headers.\r
++/* Parse the message.\r
+ *\r
+- * When the caller is only interested in a short list of headers,\r
+- * known in advance, calling this function can avoid wasted time and\r
+- * memory parsing/saving header values that will never be needed.\r
++ * This will be done automatically as necessary on other calls\r
++ * depending on it, but an explicit call allows for better error\r
++ * status reporting.\r
++ */\r
++notmuch_status_t\r
++_notmuch_message_file_parse (notmuch_message_file_t *message);\r
++\r
++/* Get the gmime message of a message file.\r
+ *\r
+- * The variable arguments should be a list of const char * with a\r
+- * final '(const char *) NULL' to terminate the list.\r
++ * The message file is parsed as necessary.\r
+ *\r
+- * If this function is called, it must be called before any calls to\r
+- * notmuch_message_get_header for this message.\r
++ * The GMimeMessage* is set to *mime_message on success (which the\r
++ * caller must not unref).\r
+ *\r
+- * After calling this function, if notmuch_message_get_header is\r
+- * called with a header name not in this list, then NULL will be\r
+- * returned even if that header exists in the actual message.\r
++ * XXX: Would be nice to not have to expose GMimeMessage here.\r
+ */\r
+-void\r
+-notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...);\r
+-\r
+-/* Identical to notmuch_message_restrict_headers but accepting a va_list. */\r
+-void\r
+-notmuch_message_file_restrict_headersv (notmuch_message_file_t *message,\r
+- va_list va_headers);\r
++notmuch_status_t\r
++_notmuch_message_file_get_mime_message (notmuch_message_file_t *message,\r
++ GMimeMessage **mime_message);\r
+ \r
+ /* Get the value of the specified header from the message as a UTF-8 string.\r
+ *\r
++ * The message file is parsed as necessary.\r
++ *\r
+ * The header name is case insensitive.\r
+ *\r
+ * The Received: header is special - for it all Received: headers in\r
+@@ -387,13 +382,19 @@ notmuch_message_file_restrict_headersv (notmuch_message_file_t *message,\r
+ * only until the message is closed. The caller should copy it if\r
+ * needing to modify the value or to hold onto it for longer.\r
+ *\r
+- * Returns NULL if the message does not contain a header line matching\r
+- * 'header'.\r
++ * Returns NULL on errors, empty string if the message does not\r
++ * contain a header line matching 'header'.\r
+ */\r
+ const char *\r
+ notmuch_message_file_get_header (notmuch_message_file_t *message,\r
+ const char *header);\r
+ \r
++/* index.cc */\r
++\r
++notmuch_status_t\r
++_notmuch_message_index_file (notmuch_message_t *message,\r
++ notmuch_message_file_t *message_file);\r
++\r
+ /* messages.c */\r
+ \r
+ typedef struct _notmuch_message_node {\r
+-- \r
+1.9.0\r
+\r