From: Austin Clements Date: Sat, 29 Mar 2014 22:11:27 +0000 (+2000) Subject: Re: [PATCH v4] lib: replace the header parser with gmime X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=6a494cac90ad4bce48b5c3e254633103529e5296;p=notmuch-archives.git Re: [PATCH v4] lib: replace the header parser with gmime --- diff --git a/f1/c1267f2c1a0a83c5911d1a00063c9a593f2855 b/f1/c1267f2c1a0a83c5911d1a00063c9a593f2855 new file mode 100644 index 000000000..2a586a376 --- /dev/null +++ b/f1/c1267f2c1a0a83c5911d1a00063c9a593f2855 @@ -0,0 +1,934 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 155FD431FBD + for ; Sat, 29 Mar 2014 15:11:42 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -0.7 +X-Spam-Level: +X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id aeOL7ApPqT40 for ; + Sat, 29 Mar 2014 15:11:34 -0700 (PDT) +Received: from dmz-mailsec-scanner-4.mit.edu (dmz-mailsec-scanner-4.mit.edu + [18.9.25.15]) + (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id 0C884431FBC + for ; Sat, 29 Mar 2014 15:11:33 -0700 (PDT) +X-AuditID: 1209190f-f790b6d000000c3a-ae-5337451385d2 +Received: from mailhub-auth-2.mit.edu ( [18.7.62.36]) + (using TLS with cipher AES256-SHA (256/256 bits)) + (Client did not present a certificate) + by dmz-mailsec-scanner-4.mit.edu (Symantec Messaging Gateway) with SMTP + id 2D.CC.03130.31547335; Sat, 29 Mar 2014 18:11:31 -0400 (EDT) +Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) + by mailhub-auth-2.mit.edu (8.13.8/8.9.2) with ESMTP id s2TMBU8Z019719; + Sat, 29 Mar 2014 18:11:31 -0400 +Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) + (authenticated bits=0) + (User authenticated as amdragon@ATHENA.MIT.EDU) + by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id s2TMBSWx003642 + (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); + Sat, 29 Mar 2014 18:11:29 -0400 +Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80) + (envelope-from ) + id 1WU1TL-0005dw-SW; Sat, 29 Mar 2014 18:11:27 -0400 +Date: Sat, 29 Mar 2014 18:11:27 -0400 +From: Austin Clements +To: Jani Nikula +Subject: Re: [PATCH v4] lib: replace the header parser with gmime +Message-ID: <20140329221127.GH31187@mit.edu> +References: <1395604866-19188-1-git-send-email-jani@nikula.org> +MIME-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +Content-Disposition: inline +In-Reply-To: <1395604866-19188-1-git-send-email-jani@nikula.org> +User-Agent: Mutt/1.5.21 (2010-09-15) +X-Brightmail-Tracker: + H4sIAAAAAAAAA+NgFmplleLIzCtJLcpLzFFi42IRYrdT0RV2NQ82mN5oadE03dni+s2ZzA5M + Hrfuv2b3eLbqFnMAUxSXTUpqTmZZapG+XQJXxsQ721gLjj5irPjZ1cbYwHhgDWMXIyeHhICJ + xIFjV5ghbDGJC/fWs3UxcnEICcxmkji8aBUjhLORUeLUnx3sEM5pJonmTd+ZIJwljBJT2jaz + g/SzCKhK3Fy6nBXEZhPQkNi2fznYDhEBRYnNJ/eD2cwC0hLffjczgdjCAo4Sp26dALN5BXQk + nh6cwgZiCwnYS/w6+pERIi4ocXLmExaIXi2JG/9eAtVzgM1Z/o8DJMwp4CBxbMdqsFZRARWJ + KSe3sU1gFJqFpHsWku5ZCN0LGJlXMcqm5Fbp5iZm5hSnJusWJyfm5aUW6Zro5WaW6KWmlG5i + BIU2pyT/DsZvB5UOMQpwMCrx8M7QMA8WYk0sK67MPcQoycGkJMq70QAoxJeUn1KZkVicEV9U + mpNafIhRgoNZSYR3nx5QjjclsbIqtSgfJiXNwaIkzivPoR0sJJCeWJKanZpakFoEk5Xh4FCS + 4OV1BmoULEpNT61Iy8wpQUgzcXCCDOcBGt4MUsNbXJCYW5yZDpE/xagoJc57yQkoIQCSyCjN + g+uFpZ5XjOJArwjzJoC08wDTFlz3K6DBTECD3YrMQAaXJCKkpBoY5bJT57/njbxhJfT+uGTr + Exufa+/XFc1MWhCy9E9RfNaH0x4fuCRv7Zu978cSvp+cm6/NzrW25F+isMB5H/+uE35POma4 + vrM01DxjpnxNbulRcZ0tMss2xK/jYJ25eUECp9besEP7386avSRbaW6ChqFD++lmKZYTn7b7 + xYt3bD7syywpee5OjRJLcUaioRZzUXEiAGv9IvsYAwAA +Cc: notmuch@notmuchmail.org +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sat, 29 Mar 2014 22:11:42 -0000 + +Various little comments below. Overall this is looking really good. + +Quoth Jani Nikula on Mar 23 at 10:01 pm: +> The notmuch library includes a full blown message header parser. Yet +> the same message headers are parsed by gmime during indexing. Switch +> to gmime parsing completely. +> +> These are the main changes: +> +> * Gmime stops header parsing at the first invalid header, and presumes +> the message body starts from there. The current parser is quite +> liberal in accepting broken headers. The change means we will be +> much pickier about accepting invalid messages. +> +> * The current parser converts tabs used in header folding to +> spaces. Gmime preserve the tabs. Due to a broken python library used +> in mailman, there are plenty of mailing lists that produce headers +> with tabs in header folding, and we'll see plenty of tabs. (This +> change has been mitigated in preparatory patches.) +> +> * For pure header parsing, the current parser is likely faster than +> gmime, which parses the whole message rather than just the +> headers. Since we parse the message and its headers using gmime for +> indexing anyway, this avoids and extra header parsing round when +> adding new messages. In case of duplicate messages, we'll end up +> parsing the full message although just headers would be +> sufficient. All in all this should still speed up 'notmuch new'. +> +> * Calls to notmuch_message_get_header() may be slightly slower than +> previously for headers that are not indexed in the database, due to +> parsing of the whole message. Within the notmuch code base, notmuch +> reply is the only such user. +> +> --- +> +> This is v4 of patches [1] and [2], combining them into one. Patch [3] +> is required to make all the tests pass. +> +> The implementation has been changed considerably. Notably we only +> cache the decoded headers to a hash table if and when they are +> needed. The main reasons for doing caching at all is that gmime +> provides raw headers that need decoding, and to minimize changes +> outside of message-file.c we want to own and track the allocated +> strings instead of pushing the responsibility to callers. +> +> This addresses most comments from Austin. +> +> The diff in message-file.c is rather confusing due to a mix of +> added/removed lines. Looking at the patched file is much more +> pleasant. +> +> BR, +> Jani. +> +> [1] id:bdef45dc21c777130a7ae1fa650f172d8c6eaaf4.1391456555.git.jani@nikula.org +> [2] id:31d785c4a3e4b90862a0fdc545d4e900a4c898e2.1391456555.git.jani@nikula.org +> [3] id:1395247490-19892-1-git-send-email-jani@nikula.org +> --- +> lib/database.cc | 15 +- +> lib/index.cc | 70 +-------- +> lib/message-file.c | 420 ++++++++++++++++++++------------------------------ +> lib/notmuch-private.h | 54 +++---- +> 4 files changed, 208 insertions(+), 351 deletions(-) +> +> diff --git a/lib/database.cc b/lib/database.cc +> index aef748f75b5d..4750f40cf0fb 100644 +> --- a/lib/database.cc +> +++ b/lib/database.cc +> @@ -1971,15 +1971,10 @@ notmuch_database_add_message (notmuch_database_t *notmuch, +> if (ret) +> goto DONE; +> +> - notmuch_message_file_restrict_headers (message_file, +> - "date", +> - "from", +> - "in-reply-to", +> - "message-id", +> - "references", +> - "subject", +> - "to", +> - (char *) NULL); +> + /* Parse message up front to get better error status. */ +> + ret = notmuch_message_file_parse (message_file); +> + if (ret) +> + goto DONE; +> +> try { +> /* Before we do any real work, (especially before doing a +> @@ -2066,7 +2061,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch, +> date = notmuch_message_file_get_header (message_file, "date"); +> _notmuch_message_set_header_values (message, date, from, subject); +> +> - ret = _notmuch_message_index_file (message, filename); +> + ret = _notmuch_message_index_file (message, message_file); +> if (ret) +> goto DONE; +> } else { +> diff --git a/lib/index.cc b/lib/index.cc +> index 78c18cf36d10..46a019325454 100644 +> --- a/lib/index.cc +> +++ b/lib/index.cc +> @@ -425,63 +425,15 @@ _index_mime_part (notmuch_message_t *message, +> +> notmuch_status_t +> _notmuch_message_index_file (notmuch_message_t *message, +> - const char *filename) +> + notmuch_message_file_t *message_file) +> { +> - GMimeStream *stream = NULL; +> - GMimeParser *parser = NULL; +> - GMimeMessage *mime_message = NULL; +> + GMimeMessage *mime_message; +> InternetAddressList *addresses; +> - FILE *file = NULL; +> const char *from, *subject; +> - notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; +> - static int initialized = 0; +> - char from_buf[5]; +> - bool is_mbox = false; +> - static bool mbox_warning = false; +> - +> - if (! initialized) { +> - g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS); +> - initialized = 1; +> - } +> - +> - file = fopen (filename, "r"); +> - if (! file) { +> - fprintf (stderr, "Error opening %s: %s\n", filename, strerror (errno)); +> - ret = NOTMUCH_STATUS_FILE_ERROR; +> - goto DONE; +> - } +> - +> - /* Is this mbox? */ +> - if (fread (from_buf, sizeof (from_buf), 1, file) == 1 && +> - strncmp (from_buf, "From ", 5) == 0) +> - is_mbox = true; +> - rewind (file); +> +> - /* Evil GMime steals my FILE* here so I won't fclose it. */ +> - stream = g_mime_stream_file_new (file); +> - +> - parser = g_mime_parser_new_with_stream (stream); +> - g_mime_parser_set_scan_from (parser, is_mbox); +> - +> - mime_message = g_mime_parser_construct_message (parser); +> - +> - if (is_mbox) { +> - if (!g_mime_parser_eos (parser)) { +> - /* This is a multi-message mbox. */ +> - ret = NOTMUCH_STATUS_FILE_NOT_EMAIL; +> - goto DONE; +> - } +> - /* For historical reasons, we support single-message mboxes, +> - * but this behavior is likely to change in the future, so +> - * warn. */ +> - if (!mbox_warning) { +> - mbox_warning = true; +> - fprintf (stderr, "\ +> -Warning: %s is an mbox containing a single message,\n\ +> -likely caused by misconfigured mail delivery. Support for single-message\n\ +> -mboxes is deprecated and may be removed in the future.\n", filename); +> - } +> - } +> + mime_message = notmuch_message_file_get_mime_message (message_file); +> + if (! mime_message) +> + return NOTMUCH_STATUS_FILE_NOT_EMAIL; + +This could also be because of out-of-memory. +notmuch_message_file_get_mime_message (and notmuch_message_file_parse, +in turn) should probably take a notmuch_status_t *status_ret argument +like other internal functions. + +> +> from = g_mime_message_get_sender (mime_message); +> +> @@ -502,15 +454,5 @@ mboxes is deprecated and may be removed in the future.\n", filename); +> +> _index_mime_part (message, g_mime_message_get_mime_part (mime_message)); +> +> - DONE: +> - if (mime_message) +> - g_object_unref (mime_message); +> - +> - if (parser) +> - g_object_unref (parser); +> - +> - if (stream) +> - g_object_unref (stream); +> - +> - return ret; +> + return NOTMUCH_STATUS_SUCCESS; +> } +> diff --git a/lib/message-file.c b/lib/message-file.c +> index a2850c278b5a..88662608d319 100644 +> --- a/lib/message-file.c +> +++ b/lib/message-file.c +> @@ -26,30 +26,15 @@ +> +> #include /* GHashTable */ +> +> -typedef struct { +> - char *str; +> - size_t size; +> - size_t len; +> -} header_value_closure_t; +> - +> struct _notmuch_message_file { +> /* File object */ +> FILE *file; +> + char *filename; +> +> - /* Header storage */ +> - int restrict_headers; +> + /* Cache for decoded headers */ +> GHashTable *headers; +> - int broken_headers; +> - int good_headers; +> - size_t header_size; /* Length of full message header in bytes. */ +> - +> - /* Parsing state */ +> - char *line; +> - size_t line_size; +> - header_value_closure_t value; +> +> - int parsing_started; +> - int parsing_finished; +> + GMimeMessage *message; +> }; +> +> static int +> @@ -76,15 +61,12 @@ strcase_hash (const void *ptr) +> static int +> _notmuch_message_file_destructor (notmuch_message_file_t *message) +> { +> - if (message->line) +> - free (message->line); +> - +> - if (message->value.size) +> - free (message->value.str); +> - +> if (message->headers) +> g_hash_table_destroy (message->headers); +> +> + if (message->message) +> + g_object_unref (message->message); +> + +> if (message->file) +> fclose (message->file); +> +> @@ -102,20 +84,17 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename) +> if (unlikely (message == NULL)) +> return NULL; +> +> + /* Only needed for error messages during parsing. */ +> + message->filename = talloc_strdup (message, filename); +> + if (message->filename == NULL) +> + goto FAIL; +> + +> talloc_set_destructor (message, _notmuch_message_file_destructor); +> +> message->file = fopen (filename, "r"); +> if (message->file == NULL) +> goto FAIL; +> +> - message->headers = g_hash_table_new_full (strcase_hash, +> - strcase_equal, +> - free, +> - g_free); +> - +> - message->parsing_started = 0; +> - message->parsing_finished = 0; +> - +> return message; +> +> FAIL: +> @@ -137,264 +116,205 @@ notmuch_message_file_close (notmuch_message_file_t *message) +> talloc_free (message); +> } +> +> -void +> -notmuch_message_file_restrict_headersv (notmuch_message_file_t *message, +> - va_list va_headers) +> +notmuch_status_t +> +notmuch_message_file_parse (notmuch_message_file_t *message) +> { +> - char *header; +> + GMimeStream *stream; +> + GMimeParser *parser; +> + notmuch_status_t status = NOTMUCH_STATUS_SUCCESS; +> + static int initialized = 0; +> + char from_buf[5]; +> + notmuch_bool_t is_mbox = FALSE; +> + static notmuch_bool_t mbox_warning = FALSE; +> +> - if (message->parsing_started) +> - INTERNAL_ERROR ("notmuch_message_file_restrict_headers called after parsing has started"); +> + if (message->message) +> + return NOTMUCH_STATUS_SUCCESS; +> +> - while (1) { +> - header = va_arg (va_headers, char*); +> - if (header == NULL) +> - break; +> - g_hash_table_insert (message->headers, +> - xstrdup (header), NULL); +> + if (! initialized) { +> + g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS); +> + initialized = 1; +> } +> +> - message->restrict_headers = 1; +> -} +> - +> -void +> -notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...) +> -{ +> - va_list va_headers; +> - +> - va_start (va_headers, message); +> + message->headers = g_hash_table_new_full (strcase_hash, strcase_equal, +> + free, g_free); +> + if (! message->headers) +> + return NOTMUCH_STATUS_OUT_OF_MEMORY; +> +> - notmuch_message_file_restrict_headersv (message, va_headers); +> -} +> + /* Is this mbox? */ +> + if (fread (from_buf, sizeof (from_buf), 1, message->file) == 1 && +> + strncmp (from_buf, "From ", 5) == 0) +> + is_mbox = TRUE; +> + rewind (message->file); +> +> -static void +> -copy_header_unfolding (header_value_closure_t *value, +> - const char *chunk) +> -{ +> - char *last; +> + stream = g_mime_stream_file_new (message->file); +> +> - if (chunk == NULL) +> - return; +> + /* We'll own and fclose the FILE* ourselves. */ +> + g_mime_stream_file_set_owner (GMIME_STREAM_FILE (stream), FALSE); +> +> - while (*chunk == ' ' || *chunk == '\t') +> - chunk++; +> + parser = g_mime_parser_new_with_stream (stream); +> + g_mime_parser_set_scan_from (parser, is_mbox); +> +> - if (value->len + 1 + strlen (chunk) + 1 > value->size) { +> - unsigned int new_size = value->size; +> - if (value->size == 0) +> - new_size = strlen (chunk) + 1; +> - else +> - while (value->len + 1 + strlen (chunk) + 1 > new_size) +> - new_size *= 2; +> - value->str = xrealloc (value->str, new_size); +> - value->size = new_size; +> + message->message = g_mime_parser_construct_message (parser); +> + if (! message->message) { +> + status = NOTMUCH_STATUS_FILE_NOT_EMAIL; +> + goto DONE; +> } +> +> - last = value->str + value->len; +> - if (value->len) { +> - *last = ' '; +> - last++; +> - value->len++; +> + if (is_mbox) { +> + if (! g_mime_parser_eos (parser)) { +> + /* This is a multi-message mbox. */ +> + status = NOTMUCH_STATUS_FILE_NOT_EMAIL; +> + goto DONE; +> + } +> + /* +> + * For historical reasons, we support single-message mboxes, +> + * but this behavior is likely to change in the future, so +> + * warn. +> + */ +> + if (! mbox_warning) { +> + mbox_warning = TRUE; +> + fprintf (stderr, "\ +> +Warning: %s is an mbox containing a single message,\n\ +> +likely caused by misconfigured mail delivery. Support for single-message\n\ +> +mboxes is deprecated and may be removed in the future.\n", message->filename); + +I still wonder if this is the right place to issue this warning. For +example, doesn't this mean that, when constructing a reply to a single +message mbox, we'll issue this warning? + +Or maybe it's time to drop single message mbox support. We deprecated +it in 0.15.1, which was over a year ago. OTOH, wheezy still has +0.13.2, so there could be people still using mboxes without the +warning. OTOOH, we only kept this around to appear David anyway. So +maybe the answer is to leave this in an not think too hard about it. + +> + } +> } +> +> - strcpy (last, chunk); +> - value->len += strlen (chunk); +> + DONE: +> + g_object_unref (stream); +> + g_object_unref (parser); +> + +> + if (status) { +> + g_hash_table_destroy (message->headers); +> + message->headers = NULL; +> +> - last = value->str + value->len - 1; +> - if (*last == '\n') { +> - *last = '\0'; +> - value->len--; +> + if (message->message) { +> + g_object_unref (message->message); +> + message->message = NULL; +> + } +> + +> + rewind (message->file); +> } +> + +> + return status; +> } +> +> -/* As a special-case, a value of NULL for header_desired will force +> - * the entire header to be parsed if it is not parsed already. This is +> - * used by the _notmuch_message_file_get_headers_end function. +> - * Another special case is the Received: header. For this header we +> - * want to concatenate all instances of the header instead of just +> - * hashing the first instance as we use this when analyzing the path +> - * the mail has taken from sender to recipient. +> - */ +> -const char * +> -notmuch_message_file_get_header (notmuch_message_file_t *message, +> - const char *header_desired) +> +GMimeMessage * +> +notmuch_message_file_get_mime_message (notmuch_message_file_t *message) +> { +> - int contains; +> - char *header, *decoded_value, *header_sofar, *combined_header; +> - const char *s, *colon; +> - int match, newhdr, hdrsofar, is_received; +> - static int initialized = 0; +> + if (notmuch_message_file_parse (message)) +> + return NULL; +> +> - is_received = (strcmp(header_desired,"received") == 0); +> + return message->message; +> +} +> +> - if (! initialized) { +> - g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS); +> - initialized = 1; +> - } +> +/* +> + * Get all instances of a header decoded and concatenated. +> + * +> + * The result must be freed using g_free(). +> + * +> + * Return NULL on errors, empty string for non-existing headers. +> + */ +> +static char * +> +_notmuch_message_file_get_combined_header (notmuch_message_file_t *message, +> + const char *header) +> +{ +> + GMimeHeaderList *headers; +> + GMimeHeaderIter *iter; +> + char *combined = NULL; +> +> - message->parsing_started = 1; +> - +> - if (header_desired == NULL) +> - contains = 0; +> - else +> - contains = g_hash_table_lookup_extended (message->headers, +> - header_desired, NULL, +> - (gpointer *) &decoded_value); +> - +> - if (contains && decoded_value) +> - return decoded_value; +> - +> - if (message->parsing_finished) +> - return ""; +> - +> -#define NEXT_HEADER_LINE(closure) \ +> - while (1) { \ +> - ssize_t bytes_read = getline (&message->line, \ +> - &message->line_size, \ +> - message->file); \ +> - if (bytes_read == -1) { \ +> - message->parsing_finished = 1; \ +> - break; \ +> - } \ +> - if (*message->line == '\n') { \ +> - message->parsing_finished = 1; \ +> - break; \ +> - } \ +> - if (closure && \ +> - (*message->line == ' ' || *message->line == '\t')) \ +> - { \ +> - copy_header_unfolding ((closure), message->line); \ +> - } \ +> - if (*message->line == ' ' || *message->line == '\t') \ +> - message->header_size += strlen (message->line); \ +> - else \ +> - break; \ +> - } +> + headers = g_mime_object_get_header_list (GMIME_OBJECT (message->message)); +> + if (! headers) +> + return NULL; +> +> - if (message->line == NULL) +> - NEXT_HEADER_LINE (NULL); +> + iter = g_mime_header_iter_new (); +> + if (! iter) +> + return NULL; +> +> - while (1) { +> + if (! g_mime_header_list_get_iter (headers, iter)) +> + goto DONE; +> +> - if (message->parsing_finished) +> - break; +> + do { +> + const char *value; + +Incorrect indentation. + +> + char *decoded; +> +> - colon = strchr (message->line, ':'); +> + if (strcasecmp (g_mime_header_iter_get_name (iter), header) != 0) +> + continue; +> +> - if (colon == NULL) { +> - message->broken_headers++; +> - /* A simple heuristic for giving up on things that just +> - * don't look like mail messages. */ +> - if (message->broken_headers >= 10 && +> - message->good_headers < 5) +> - { +> - message->parsing_finished = 1; +> - break; +> + value = g_mime_header_iter_get_value (iter); +> + decoded = g_mime_utils_header_decode_text (value); + +Maybe a comment saying that GMime retains ownership of value, but +decoded must be freed with g_free? GMime is confusing about +ownership, and the fact that value *doesn't* have to be freed (which +is not documented by GMime!) just adds to the confusion. I could see +this being easy to screw up during later code changes. + +> + if (! decoded) { +> + if (combined) { +> + g_free (combined); +> + combined = NULL; +> } +> - NEXT_HEADER_LINE (NULL); +> - continue; +> + goto DONE; +> } +> +> - message->header_size += strlen (message->line); +> + if (combined) { +> + char *tmp = g_strdup_printf ("%s %s", combined, decoded); +> + g_free (decoded); +> + g_free (combined); +> + if (! tmp) { +> + combined = NULL; +> + goto DONE; +> + } +> +> - message->good_headers++; +> + combined = tmp; +> + } else { +> + combined = decoded; +> + } +> + } while (g_mime_header_iter_next (iter)); + +Yet another funny GMime API... Did you consider + +for (; g_mime_header_iter_is_valid (iter); g_mime_header_iter_next (iter)) + +? I realize that the g_mime_header_iter_is_valid isn't actually +necessary because g_mime_header_list_get_iter returns NULL if the +iterator would be empty, but using a for-loop would make it more +obvious that this code is correct. + +> +> - header = xstrndup (message->line, colon - message->line); +> + /* Return empty string for non-existing headers. */ +> + if (! combined) +> + combined = g_strdup (""); +> +> - if (message->restrict_headers && +> - ! g_hash_table_lookup_extended (message->headers, +> - header, NULL, NULL)) +> - { +> - free (header); +> - NEXT_HEADER_LINE (NULL); +> - continue; +> - } +> + DONE: +> + g_mime_header_iter_free (iter); +> +> - s = colon + 1; +> - while (*s == ' ' || *s == '\t') +> - s++; +> + return combined; +> +} + +Oof. That received header had better be worth it! + +> +> - message->value.len = 0; +> - copy_header_unfolding (&message->value, s); +> +/* Return NULL on errors, empty string for non-existing headers. */ + +This may sound strange, but I'd be inclined to omit this comment, +since it made me not realize I should go looking for the much more +thorough comment in notmuch-private.h. + +> +const char * +> +notmuch_message_file_get_header (notmuch_message_file_t *message, +> + const char *header) +> +{ +> + const char *value = NULL; +> + char *decoded; +> + notmuch_bool_t found; +> +> - NEXT_HEADER_LINE (&message->value); +> + if (notmuch_message_file_parse (message)) +> + return NULL; +> +> - if (header_desired == NULL) +> - match = 0; +> + /* If we have a cached decoded value, use it. */ +> + found = g_hash_table_lookup_extended (message->headers, header, +> + NULL, (gpointer *) &value); +> + if (found) +> + return value ? value : ""; + +When would header be found, but value be NULL? (If this can't in fact +happen, how about using g_hash_table_lookup?) + +> + +> + if (strcasecmp (header, "received") == 0) { +> + /* +> + * The Received: header is special. We concatenate all +> + * instances of the header as we use this when analyzing the +> + * path the mail has taken from sender to recipient. +> + */ +> + decoded = _notmuch_message_file_get_combined_header (message, header); +> + } else { +> + value = g_mime_object_get_header (GMIME_OBJECT (message->message), +> + header); +> + if (value) +> + decoded = g_mime_utils_header_decode_text (value); +> else +> - match = (strcasecmp (header, header_desired) == 0); +> - +> - decoded_value = g_mime_utils_header_decode_text (message->value.str); +> - header_sofar = (char *)g_hash_table_lookup (message->headers, header); +> - /* we treat the Received: header special - we want to concat ALL of +> - * the Received: headers we encounter. +> - * for everything else we return the first instance of a header */ +> - if (strcasecmp(header, "received") == 0) { +> - if (header_sofar == NULL) { +> - /* first Received: header we encountered; just add it */ +> - g_hash_table_insert (message->headers, header, decoded_value); +> - } else { +> - /* we need to add the header to those we already collected */ +> - newhdr = strlen(decoded_value); +> - hdrsofar = strlen(header_sofar); +> - combined_header = g_malloc(hdrsofar + newhdr + 2); +> - strncpy(combined_header,header_sofar,hdrsofar); +> - *(combined_header+hdrsofar) = ' '; +> - strncpy(combined_header+hdrsofar+1,decoded_value,newhdr+1); +> - g_free (decoded_value); +> - g_hash_table_insert (message->headers, header, combined_header); +> - } +> - } else { +> - if (header_sofar == NULL) { +> - /* Only insert if we don't have a value for this header, yet. */ +> - g_hash_table_insert (message->headers, header, decoded_value); +> - } else { +> - free (header); +> - g_free (decoded_value); +> - decoded_value = header_sofar; +> - } +> - } +> - /* if we found a match we can bail - unless of course we are +> - * collecting all the Received: headers */ +> - if (match && !is_received) +> - return decoded_value; +> - } +> - +> - if (message->parsing_finished) { +> - fclose (message->file); +> - message->file = NULL; +> + decoded = g_strdup (""); +> } +> +> - if (message->line) +> - free (message->line); +> - message->line = NULL; +> - +> - if (message->value.size) { +> - free (message->value.str); +> - message->value.str = NULL; +> - message->value.size = 0; +> - message->value.len = 0; +> - } +> + if (! decoded) +> + return NULL; +> +> - /* For the Received: header we actually might end up here even +> - * though we found the header (as we force continued parsing +> - * in that case). So let's check if that's the header we were +> - * looking for and return the value that we found (if any) +> - */ +> - if (is_received) +> - return (char *)g_hash_table_lookup (message->headers, "received"); +> - +> - /* We've parsed all headers and never found the one we're looking +> - * for. It's probably just not there, but let's check that we +> - * didn't make a mistake preventing us from seeing it. */ +> - if (message->restrict_headers && header_desired && +> - ! g_hash_table_lookup_extended (message->headers, +> - header_desired, NULL, NULL)) +> - { +> - INTERNAL_ERROR ("Attempt to get header \"%s\" which was not\n" +> - "included in call to notmuch_message_file_restrict_headers\n", +> - header_desired); +> - } +> + /* Cache the decoded value. We also own the strings. */ +> + g_hash_table_insert (message->headers, xstrdup (header), decoded); +> +> - return ""; +> + return decoded; +> } +> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h +> index 59eb2bc285a5..734a4e338554 100644 +> --- a/lib/notmuch-private.h +> +++ b/lib/notmuch-private.h +> @@ -46,6 +46,8 @@ NOTMUCH_BEGIN_DECLS +> +> #include +> +> +#include +> + +> #include "xutil.h" +> #include "error_util.h" +> +> @@ -320,13 +322,6 @@ notmuch_message_set_author (notmuch_message_t *message, const char *author); +> const char * +> notmuch_message_get_author (notmuch_message_t *message); +> +> - +> -/* index.cc */ +> - +> -notmuch_status_t +> -_notmuch_message_index_file (notmuch_message_t *message, +> - const char *filename); +> - +> /* message-file.c */ +> +> /* XXX: I haven't decided yet whether these will actually get exported +> @@ -352,32 +347,31 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename); +> void +> notmuch_message_file_close (notmuch_message_file_t *message); +> +> -/* Restrict 'message' to only save the named headers. +> +/* Parse the message. +> * +> - * When the caller is only interested in a short list of headers, +> - * known in advance, calling this function can avoid wasted time and +> - * memory parsing/saving header values that will never be needed. +> + * This will be done automatically as necessary on other calls +> + * depending on it, but an explicit call allows for better error +> + * status reporting. +> + */ +> +notmuch_status_t +> +notmuch_message_file_parse (notmuch_message_file_t *message); + +Most internal functions have a leading _. (We've diverged from that +in a few places, especially in notmuch_message_file_*, but it would be +nice to not diverge further from that convention.) + +> + +> +/* Get the gmime message of a message file. +> * +> - * The variable arguments should be a list of const char * with a +> - * final '(const char *) NULL' to terminate the list. +> + * The message file is parsed as necessary. +> * +> - * If this function is called, it must be called before any calls to +> - * notmuch_message_get_header for this message. +> + * Returns GMimeMessage* on success (which the caller must not unref), +> + * NULL if the message file parsing fails. +> * +> - * After calling this function, if notmuch_message_get_header is +> - * called with a header name not in this list, then NULL will be +> - * returned even if that header exists in the actual message. +> + * XXX: Would be nice to not have to expose GMimeMessage here. +> */ +> -void +> -notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...); +> - +> -/* Identical to notmuch_message_restrict_headers but accepting a va_list. */ +> -void +> -notmuch_message_file_restrict_headersv (notmuch_message_file_t *message, +> - va_list va_headers); +> +GMimeMessage * +> +notmuch_message_file_get_mime_message (notmuch_message_file_t *message); + +Same comment about _. + +> +> /* Get the value of the specified header from the message as a UTF-8 string. +> * +> + * The message file is parsed as necessary. +> + * +> * The header name is case insensitive. +> * +> * The Received: header is special - for it all Received: headers in +> @@ -387,13 +381,19 @@ notmuch_message_file_restrict_headersv (notmuch_message_file_t *message, +> * only until the message is closed. The caller should copy it if +> * needing to modify the value or to hold onto it for longer. +> * +> - * Returns NULL if the message does not contain a header line matching +> - * 'header'. +> + * Returns NULL on errors, empty string if the message does not +> + * contain a header line matching 'header'. +> */ +> const char * +> notmuch_message_file_get_header (notmuch_message_file_t *message, +> const char *header); +> +> +/* index.cc */ +> + +> +notmuch_status_t +> +_notmuch_message_index_file (notmuch_message_t *message, +> + notmuch_message_file_t *message_file); +> + + +Any particular reason this floated down? (Obviously it's not an +actual problem.) + +> /* messages.c */ +> +> typedef struct _notmuch_message_node {