From: Jani Nikula Date: Sun, 30 Mar 2014 21:10:20 +0000 (+0300) Subject: Re: [PATCH v4] lib: replace the header parser with gmime X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=aaedd54426735f05fe239e48d9ffe8cec0923887;p=notmuch-archives.git Re: [PATCH v4] lib: replace the header parser with gmime --- diff --git a/f0/36410762d64fd36e875eef8d6b2e42439d9698 b/f0/36410762d64fd36e875eef8d6b2e42439d9698 new file mode 100644 index 000000000..04a05e21b --- /dev/null +++ b/f0/36410762d64fd36e875eef8d6b2e42439d9698 @@ -0,0 +1,971 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 4316B431FBC + for ; Sun, 30 Mar 2014 14:10:38 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -0.7 +X-Spam-Level: +X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id X6JfBu+sVYdc for ; + Sun, 30 Mar 2014 14:10:30 -0700 (PDT) +Received: from mail-we0-f175.google.com (mail-we0-f175.google.com + [74.125.82.175]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id 35A3B431FB6 + for ; Sun, 30 Mar 2014 14:10:30 -0700 (PDT) +Received: by mail-we0-f175.google.com with SMTP id q58so3924177wes.20 + for ; Sun, 30 Mar 2014 14:10:26 -0700 (PDT) +X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=1e100.net; s=20130820; + h=x-gm-message-state:from:to:cc:subject:in-reply-to:references + :user-agent:date:message-id:mime-version:content-type; + bh=Kllr2l+SxkOCXuc9h87xVTzIjR//GKogXIFNhgA110I=; + b=HeJn0dTK7twgwaH3SaNRHZEauMYGUM50wNWuVa2uQu6l87mc55mFheVL6rrsfkIHVo + uIHJBHeNJPvZJFgV/iDIDS1cORFBMUQM/SDhYL/od4Pj05iF5231bYVmFiscqX3gWb79 + tmDQV+sYBJqePYjwWX3OrcByi04Wm+KF9zcFmTggM/8oY6M8B+B6ONrRp7NhRGcteX98 + wvnD58gzyf5x94qEUIjXm/uuumqX4CXVxcr3J6tsJtMMaC8udAYALpiDfSZ+bBaSzf4f + ccabJ0AgOaJimWdAxcjqPONp2bvjB4cp79pCJI3OctzD/8HgxmFP212zcmfT1xG7lVEf + Nd/A== +X-Gm-Message-State: + ALoCoQlV/gE6PtA+uBQdqTuVFLM3c2j2vbKJbv+0QHLFjiv0yCoOYlHdXC+pOdYC11LfhXW/YPZU +X-Received: by 10.194.60.37 with SMTP id e5mr9233199wjr.32.1396213825059; + Sun, 30 Mar 2014 14:10:25 -0700 (PDT) +Received: from localhost (dsl-hkibrasgw2-58c36f-91.dhcp.inet.fi. + [88.195.111.91]) + by mx.google.com with ESMTPSA id 44sm28262047eek.30.2014.03.30.14.10.22 + for + (version=TLSv1.2 cipher=RC4-SHA bits=128/128); + Sun, 30 Mar 2014 14:10:23 -0700 (PDT) +From: Jani Nikula +To: Austin Clements +Subject: Re: [PATCH v4] lib: replace the header parser with gmime +In-Reply-To: <20140329221127.GH31187@mit.edu> +References: <1395604866-19188-1-git-send-email-jani@nikula.org> + <20140329221127.GH31187@mit.edu> +User-Agent: Notmuch/0.17+164~gcd5fd5a4837e (http://notmuchmail.org) + Emacs/24.3.1 (x86_64-pc-linux-gnu) +Date: Mon, 31 Mar 2014 00:10:20 +0300 +Message-ID: <87y4zrl7er.fsf@nikula.org> +MIME-Version: 1.0 +Content-Type: text/plain +Cc: notmuch@notmuchmail.org +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sun, 30 Mar 2014 21:10:38 -0000 + +On Sun, 30 Mar 2014, Austin Clements wrote: +> Various little comments below. Overall this is looking really good. +> +> Quoth Jani Nikula on Mar 23 at 10:01 pm: +>> The notmuch library includes a full blown message header parser. Yet +>> the same message headers are parsed by gmime during indexing. Switch +>> to gmime parsing completely. +>> +>> These are the main changes: +>> +>> * Gmime stops header parsing at the first invalid header, and presumes +>> the message body starts from there. The current parser is quite +>> liberal in accepting broken headers. The change means we will be +>> much pickier about accepting invalid messages. +>> +>> * The current parser converts tabs used in header folding to +>> spaces. Gmime preserve the tabs. Due to a broken python library used +>> in mailman, there are plenty of mailing lists that produce headers +>> with tabs in header folding, and we'll see plenty of tabs. (This +>> change has been mitigated in preparatory patches.) +>> +>> * For pure header parsing, the current parser is likely faster than +>> gmime, which parses the whole message rather than just the +>> headers. Since we parse the message and its headers using gmime for +>> indexing anyway, this avoids and extra header parsing round when +>> adding new messages. In case of duplicate messages, we'll end up +>> parsing the full message although just headers would be +>> sufficient. All in all this should still speed up 'notmuch new'. +>> +>> * Calls to notmuch_message_get_header() may be slightly slower than +>> previously for headers that are not indexed in the database, due to +>> parsing of the whole message. Within the notmuch code base, notmuch +>> reply is the only such user. +>> +>> --- +>> +>> This is v4 of patches [1] and [2], combining them into one. Patch [3] +>> is required to make all the tests pass. +>> +>> The implementation has been changed considerably. Notably we only +>> cache the decoded headers to a hash table if and when they are +>> needed. The main reasons for doing caching at all is that gmime +>> provides raw headers that need decoding, and to minimize changes +>> outside of message-file.c we want to own and track the allocated +>> strings instead of pushing the responsibility to callers. +>> +>> This addresses most comments from Austin. +>> +>> The diff in message-file.c is rather confusing due to a mix of +>> added/removed lines. Looking at the patched file is much more +>> pleasant. +>> +>> BR, +>> Jani. +>> +>> [1] id:bdef45dc21c777130a7ae1fa650f172d8c6eaaf4.1391456555.git.jani@nikula.org +>> [2] id:31d785c4a3e4b90862a0fdc545d4e900a4c898e2.1391456555.git.jani@nikula.org +>> [3] id:1395247490-19892-1-git-send-email-jani@nikula.org +>> --- +>> lib/database.cc | 15 +- +>> lib/index.cc | 70 +-------- +>> lib/message-file.c | 420 ++++++++++++++++++++------------------------------ +>> lib/notmuch-private.h | 54 +++---- +>> 4 files changed, 208 insertions(+), 351 deletions(-) +>> +>> diff --git a/lib/database.cc b/lib/database.cc +>> index aef748f75b5d..4750f40cf0fb 100644 +>> --- a/lib/database.cc +>> +++ b/lib/database.cc +>> @@ -1971,15 +1971,10 @@ notmuch_database_add_message (notmuch_database_t *notmuch, +>> if (ret) +>> goto DONE; +>> +>> - notmuch_message_file_restrict_headers (message_file, +>> - "date", +>> - "from", +>> - "in-reply-to", +>> - "message-id", +>> - "references", +>> - "subject", +>> - "to", +>> - (char *) NULL); +>> + /* Parse message up front to get better error status. */ +>> + ret = notmuch_message_file_parse (message_file); +>> + if (ret) +>> + goto DONE; +>> +>> try { +>> /* Before we do any real work, (especially before doing a +>> @@ -2066,7 +2061,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch, +>> date = notmuch_message_file_get_header (message_file, "date"); +>> _notmuch_message_set_header_values (message, date, from, subject); +>> +>> - ret = _notmuch_message_index_file (message, filename); +>> + ret = _notmuch_message_index_file (message, message_file); +>> if (ret) +>> goto DONE; +>> } else { +>> diff --git a/lib/index.cc b/lib/index.cc +>> index 78c18cf36d10..46a019325454 100644 +>> --- a/lib/index.cc +>> +++ b/lib/index.cc +>> @@ -425,63 +425,15 @@ _index_mime_part (notmuch_message_t *message, +>> +>> notmuch_status_t +>> _notmuch_message_index_file (notmuch_message_t *message, +>> - const char *filename) +>> + notmuch_message_file_t *message_file) +>> { +>> - GMimeStream *stream = NULL; +>> - GMimeParser *parser = NULL; +>> - GMimeMessage *mime_message = NULL; +>> + GMimeMessage *mime_message; +>> InternetAddressList *addresses; +>> - FILE *file = NULL; +>> const char *from, *subject; +>> - notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; +>> - static int initialized = 0; +>> - char from_buf[5]; +>> - bool is_mbox = false; +>> - static bool mbox_warning = false; +>> - +>> - if (! initialized) { +>> - g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS); +>> - initialized = 1; +>> - } +>> - +>> - file = fopen (filename, "r"); +>> - if (! file) { +>> - fprintf (stderr, "Error opening %s: %s\n", filename, strerror (errno)); +>> - ret = NOTMUCH_STATUS_FILE_ERROR; +>> - goto DONE; +>> - } +>> - +>> - /* Is this mbox? */ +>> - if (fread (from_buf, sizeof (from_buf), 1, file) == 1 && +>> - strncmp (from_buf, "From ", 5) == 0) +>> - is_mbox = true; +>> - rewind (file); +>> +>> - /* Evil GMime steals my FILE* here so I won't fclose it. */ +>> - stream = g_mime_stream_file_new (file); +>> - +>> - parser = g_mime_parser_new_with_stream (stream); +>> - g_mime_parser_set_scan_from (parser, is_mbox); +>> - +>> - mime_message = g_mime_parser_construct_message (parser); +>> - +>> - if (is_mbox) { +>> - if (!g_mime_parser_eos (parser)) { +>> - /* This is a multi-message mbox. */ +>> - ret = NOTMUCH_STATUS_FILE_NOT_EMAIL; +>> - goto DONE; +>> - } +>> - /* For historical reasons, we support single-message mboxes, +>> - * but this behavior is likely to change in the future, so +>> - * warn. */ +>> - if (!mbox_warning) { +>> - mbox_warning = true; +>> - fprintf (stderr, "\ +>> -Warning: %s is an mbox containing a single message,\n\ +>> -likely caused by misconfigured mail delivery. Support for single-message\n\ +>> -mboxes is deprecated and may be removed in the future.\n", filename); +>> - } +>> - } +>> + mime_message = notmuch_message_file_get_mime_message (message_file); +>> + if (! mime_message) +>> + return NOTMUCH_STATUS_FILE_NOT_EMAIL; +> +> This could also be because of out-of-memory. +> notmuch_message_file_get_mime_message (and notmuch_message_file_parse, +> in turn) should probably take a notmuch_status_t *status_ret argument +> like other internal functions. + +Agreed (though I opted for returning notmuch_status_t and passing +GMimeMessage** as parameter). + +> +>> +>> from = g_mime_message_get_sender (mime_message); +>> +>> @@ -502,15 +454,5 @@ mboxes is deprecated and may be removed in the future.\n", filename); +>> +>> _index_mime_part (message, g_mime_message_get_mime_part (mime_message)); +>> +>> - DONE: +>> - if (mime_message) +>> - g_object_unref (mime_message); +>> - +>> - if (parser) +>> - g_object_unref (parser); +>> - +>> - if (stream) +>> - g_object_unref (stream); +>> - +>> - return ret; +>> + return NOTMUCH_STATUS_SUCCESS; +>> } +>> diff --git a/lib/message-file.c b/lib/message-file.c +>> index a2850c278b5a..88662608d319 100644 +>> --- a/lib/message-file.c +>> +++ b/lib/message-file.c +>> @@ -26,30 +26,15 @@ +>> +>> #include /* GHashTable */ +>> +>> -typedef struct { +>> - char *str; +>> - size_t size; +>> - size_t len; +>> -} header_value_closure_t; +>> - +>> struct _notmuch_message_file { +>> /* File object */ +>> FILE *file; +>> + char *filename; +>> +>> - /* Header storage */ +>> - int restrict_headers; +>> + /* Cache for decoded headers */ +>> GHashTable *headers; +>> - int broken_headers; +>> - int good_headers; +>> - size_t header_size; /* Length of full message header in bytes. */ +>> - +>> - /* Parsing state */ +>> - char *line; +>> - size_t line_size; +>> - header_value_closure_t value; +>> +>> - int parsing_started; +>> - int parsing_finished; +>> + GMimeMessage *message; +>> }; +>> +>> static int +>> @@ -76,15 +61,12 @@ strcase_hash (const void *ptr) +>> static int +>> _notmuch_message_file_destructor (notmuch_message_file_t *message) +>> { +>> - if (message->line) +>> - free (message->line); +>> - +>> - if (message->value.size) +>> - free (message->value.str); +>> - +>> if (message->headers) +>> g_hash_table_destroy (message->headers); +>> +>> + if (message->message) +>> + g_object_unref (message->message); +>> + +>> if (message->file) +>> fclose (message->file); +>> +>> @@ -102,20 +84,17 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename) +>> if (unlikely (message == NULL)) +>> return NULL; +>> +>> + /* Only needed for error messages during parsing. */ +>> + message->filename = talloc_strdup (message, filename); +>> + if (message->filename == NULL) +>> + goto FAIL; +>> + +>> talloc_set_destructor (message, _notmuch_message_file_destructor); +>> +>> message->file = fopen (filename, "r"); +>> if (message->file == NULL) +>> goto FAIL; +>> +>> - message->headers = g_hash_table_new_full (strcase_hash, +>> - strcase_equal, +>> - free, +>> - g_free); +>> - +>> - message->parsing_started = 0; +>> - message->parsing_finished = 0; +>> - +>> return message; +>> +>> FAIL: +>> @@ -137,264 +116,205 @@ notmuch_message_file_close (notmuch_message_file_t *message) +>> talloc_free (message); +>> } +>> +>> -void +>> -notmuch_message_file_restrict_headersv (notmuch_message_file_t *message, +>> - va_list va_headers) +>> +notmuch_status_t +>> +notmuch_message_file_parse (notmuch_message_file_t *message) +>> { +>> - char *header; +>> + GMimeStream *stream; +>> + GMimeParser *parser; +>> + notmuch_status_t status = NOTMUCH_STATUS_SUCCESS; +>> + static int initialized = 0; +>> + char from_buf[5]; +>> + notmuch_bool_t is_mbox = FALSE; +>> + static notmuch_bool_t mbox_warning = FALSE; +>> +>> - if (message->parsing_started) +>> - INTERNAL_ERROR ("notmuch_message_file_restrict_headers called after parsing has started"); +>> + if (message->message) +>> + return NOTMUCH_STATUS_SUCCESS; +>> +>> - while (1) { +>> - header = va_arg (va_headers, char*); +>> - if (header == NULL) +>> - break; +>> - g_hash_table_insert (message->headers, +>> - xstrdup (header), NULL); +>> + if (! initialized) { +>> + g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS); +>> + initialized = 1; +>> } +>> +>> - message->restrict_headers = 1; +>> -} +>> - +>> -void +>> -notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...) +>> -{ +>> - va_list va_headers; +>> - +>> - va_start (va_headers, message); +>> + message->headers = g_hash_table_new_full (strcase_hash, strcase_equal, +>> + free, g_free); +>> + if (! message->headers) +>> + return NOTMUCH_STATUS_OUT_OF_MEMORY; +>> +>> - notmuch_message_file_restrict_headersv (message, va_headers); +>> -} +>> + /* Is this mbox? */ +>> + if (fread (from_buf, sizeof (from_buf), 1, message->file) == 1 && +>> + strncmp (from_buf, "From ", 5) == 0) +>> + is_mbox = TRUE; +>> + rewind (message->file); +>> +>> -static void +>> -copy_header_unfolding (header_value_closure_t *value, +>> - const char *chunk) +>> -{ +>> - char *last; +>> + stream = g_mime_stream_file_new (message->file); +>> +>> - if (chunk == NULL) +>> - return; +>> + /* We'll own and fclose the FILE* ourselves. */ +>> + g_mime_stream_file_set_owner (GMIME_STREAM_FILE (stream), FALSE); +>> +>> - while (*chunk == ' ' || *chunk == '\t') +>> - chunk++; +>> + parser = g_mime_parser_new_with_stream (stream); +>> + g_mime_parser_set_scan_from (parser, is_mbox); +>> +>> - if (value->len + 1 + strlen (chunk) + 1 > value->size) { +>> - unsigned int new_size = value->size; +>> - if (value->size == 0) +>> - new_size = strlen (chunk) + 1; +>> - else +>> - while (value->len + 1 + strlen (chunk) + 1 > new_size) +>> - new_size *= 2; +>> - value->str = xrealloc (value->str, new_size); +>> - value->size = new_size; +>> + message->message = g_mime_parser_construct_message (parser); +>> + if (! message->message) { +>> + status = NOTMUCH_STATUS_FILE_NOT_EMAIL; +>> + goto DONE; +>> } +>> +>> - last = value->str + value->len; +>> - if (value->len) { +>> - *last = ' '; +>> - last++; +>> - value->len++; +>> + if (is_mbox) { +>> + if (! g_mime_parser_eos (parser)) { +>> + /* This is a multi-message mbox. */ +>> + status = NOTMUCH_STATUS_FILE_NOT_EMAIL; +>> + goto DONE; +>> + } +>> + /* +>> + * For historical reasons, we support single-message mboxes, +>> + * but this behavior is likely to change in the future, so +>> + * warn. +>> + */ +>> + if (! mbox_warning) { +>> + mbox_warning = TRUE; +>> + fprintf (stderr, "\ +>> +Warning: %s is an mbox containing a single message,\n\ +>> +likely caused by misconfigured mail delivery. Support for single-message\n\ +>> +mboxes is deprecated and may be removed in the future.\n", message->filename); +> +> I still wonder if this is the right place to issue this warning. For +> example, doesn't this mean that, when constructing a reply to a single +> message mbox, we'll issue this warning? +> +> Or maybe it's time to drop single message mbox support. We deprecated +> it in 0.15.1, which was over a year ago. OTOH, wheezy still has +> 0.13.2, so there could be people still using mboxes without the +> warning. OTOOH, we only kept this around to appear David anyway. So +> maybe the answer is to leave this in an not think too hard about it. + +I'm adding a prep patch to drop mbox support altogether. + +> +>> + } +>> } +>> +>> - strcpy (last, chunk); +>> - value->len += strlen (chunk); +>> + DONE: +>> + g_object_unref (stream); +>> + g_object_unref (parser); +>> + +>> + if (status) { +>> + g_hash_table_destroy (message->headers); +>> + message->headers = NULL; +>> +>> - last = value->str + value->len - 1; +>> - if (*last == '\n') { +>> - *last = '\0'; +>> - value->len--; +>> + if (message->message) { +>> + g_object_unref (message->message); +>> + message->message = NULL; +>> + } +>> + +>> + rewind (message->file); +>> } +>> + +>> + return status; +>> } +>> +>> -/* As a special-case, a value of NULL for header_desired will force +>> - * the entire header to be parsed if it is not parsed already. This is +>> - * used by the _notmuch_message_file_get_headers_end function. +>> - * Another special case is the Received: header. For this header we +>> - * want to concatenate all instances of the header instead of just +>> - * hashing the first instance as we use this when analyzing the path +>> - * the mail has taken from sender to recipient. +>> - */ +>> -const char * +>> -notmuch_message_file_get_header (notmuch_message_file_t *message, +>> - const char *header_desired) +>> +GMimeMessage * +>> +notmuch_message_file_get_mime_message (notmuch_message_file_t *message) +>> { +>> - int contains; +>> - char *header, *decoded_value, *header_sofar, *combined_header; +>> - const char *s, *colon; +>> - int match, newhdr, hdrsofar, is_received; +>> - static int initialized = 0; +>> + if (notmuch_message_file_parse (message)) +>> + return NULL; +>> +>> - is_received = (strcmp(header_desired,"received") == 0); +>> + return message->message; +>> +} +>> +>> - if (! initialized) { +>> - g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS); +>> - initialized = 1; +>> - } +>> +/* +>> + * Get all instances of a header decoded and concatenated. +>> + * +>> + * The result must be freed using g_free(). +>> + * +>> + * Return NULL on errors, empty string for non-existing headers. +>> + */ +>> +static char * +>> +_notmuch_message_file_get_combined_header (notmuch_message_file_t *message, +>> + const char *header) +>> +{ +>> + GMimeHeaderList *headers; +>> + GMimeHeaderIter *iter; +>> + char *combined = NULL; +>> +>> - message->parsing_started = 1; +>> - +>> - if (header_desired == NULL) +>> - contains = 0; +>> - else +>> - contains = g_hash_table_lookup_extended (message->headers, +>> - header_desired, NULL, +>> - (gpointer *) &decoded_value); +>> - +>> - if (contains && decoded_value) +>> - return decoded_value; +>> - +>> - if (message->parsing_finished) +>> - return ""; +>> - +>> -#define NEXT_HEADER_LINE(closure) \ +>> - while (1) { \ +>> - ssize_t bytes_read = getline (&message->line, \ +>> - &message->line_size, \ +>> - message->file); \ +>> - if (bytes_read == -1) { \ +>> - message->parsing_finished = 1; \ +>> - break; \ +>> - } \ +>> - if (*message->line == '\n') { \ +>> - message->parsing_finished = 1; \ +>> - break; \ +>> - } \ +>> - if (closure && \ +>> - (*message->line == ' ' || *message->line == '\t')) \ +>> - { \ +>> - copy_header_unfolding ((closure), message->line); \ +>> - } \ +>> - if (*message->line == ' ' || *message->line == '\t') \ +>> - message->header_size += strlen (message->line); \ +>> - else \ +>> - break; \ +>> - } +>> + headers = g_mime_object_get_header_list (GMIME_OBJECT (message->message)); +>> + if (! headers) +>> + return NULL; +>> +>> - if (message->line == NULL) +>> - NEXT_HEADER_LINE (NULL); +>> + iter = g_mime_header_iter_new (); +>> + if (! iter) +>> + return NULL; +>> +>> - while (1) { +>> + if (! g_mime_header_list_get_iter (headers, iter)) +>> + goto DONE; +>> +>> - if (message->parsing_finished) +>> - break; +>> + do { +>> + const char *value; +> +> Incorrect indentation. + +I wonder what happened there! + +> +>> + char *decoded; +>> +>> - colon = strchr (message->line, ':'); +>> + if (strcasecmp (g_mime_header_iter_get_name (iter), header) != 0) +>> + continue; +>> +>> - if (colon == NULL) { +>> - message->broken_headers++; +>> - /* A simple heuristic for giving up on things that just +>> - * don't look like mail messages. */ +>> - if (message->broken_headers >= 10 && +>> - message->good_headers < 5) +>> - { +>> - message->parsing_finished = 1; +>> - break; +>> + value = g_mime_header_iter_get_value (iter); +>> + decoded = g_mime_utils_header_decode_text (value); +> +> Maybe a comment saying that GMime retains ownership of value, but +> decoded must be freed with g_free? GMime is confusing about +> ownership, and the fact that value *doesn't* have to be freed (which +> is not documented by GMime!) just adds to the confusion. I could see +> this being easy to screw up during later code changes. + +Agreed. + +> +>> + if (! decoded) { +>> + if (combined) { +>> + g_free (combined); +>> + combined = NULL; +>> } +>> - NEXT_HEADER_LINE (NULL); +>> - continue; +>> + goto DONE; +>> } +>> +>> - message->header_size += strlen (message->line); +>> + if (combined) { +>> + char *tmp = g_strdup_printf ("%s %s", combined, decoded); +>> + g_free (decoded); +>> + g_free (combined); +>> + if (! tmp) { +>> + combined = NULL; +>> + goto DONE; +>> + } +>> +>> - message->good_headers++; +>> + combined = tmp; +>> + } else { +>> + combined = decoded; +>> + } +>> + } while (g_mime_header_iter_next (iter)); +> +> Yet another funny GMime API... Did you consider +> +> for (; g_mime_header_iter_is_valid (iter); g_mime_header_iter_next (iter)) +> +> ? I realize that the g_mime_header_iter_is_valid isn't actually +> necessary because g_mime_header_list_get_iter returns NULL if the +> iterator would be empty, but using a for-loop would make it more +> obvious that this code is correct. + +As I mentioned on IRC, that would lead to an infinite loop. Ugh. + +> +>> +>> - header = xstrndup (message->line, colon - message->line); +>> + /* Return empty string for non-existing headers. */ +>> + if (! combined) +>> + combined = g_strdup (""); +>> +>> - if (message->restrict_headers && +>> - ! g_hash_table_lookup_extended (message->headers, +>> - header, NULL, NULL)) +>> - { +>> - free (header); +>> - NEXT_HEADER_LINE (NULL); +>> - continue; +>> - } +>> + DONE: +>> + g_mime_header_iter_free (iter); +>> +>> - s = colon + 1; +>> - while (*s == ' ' || *s == '\t') +>> - s++; +>> + return combined; +>> +} +> +> Oof. That received header had better be worth it! + +I'm not sure it is, but we've got users for it, so... + +> +>> +>> - message->value.len = 0; +>> - copy_header_unfolding (&message->value, s); +>> +/* Return NULL on errors, empty string for non-existing headers. */ +> +> This may sound strange, but I'd be inclined to omit this comment, +> since it made me not realize I should go looking for the much more +> thorough comment in notmuch-private.h. + +Agreed. + +> +>> +const char * +>> +notmuch_message_file_get_header (notmuch_message_file_t *message, +>> + const char *header) +>> +{ +>> + const char *value = NULL; +>> + char *decoded; +>> + notmuch_bool_t found; +>> +>> - NEXT_HEADER_LINE (&message->value); +>> + if (notmuch_message_file_parse (message)) +>> + return NULL; +>> +>> - if (header_desired == NULL) +>> - match = 0; +>> + /* If we have a cached decoded value, use it. */ +>> + found = g_hash_table_lookup_extended (message->headers, header, +>> + NULL, (gpointer *) &value); +>> + if (found) +>> + return value ? value : ""; +> +> When would header be found, but value be NULL? (If this can't in fact +> happen, how about using g_hash_table_lookup?) + +Right. This was a remnant of header restricting which has now been +removed. Switching to g_hash_table_lookup. + +> +>> + +>> + if (strcasecmp (header, "received") == 0) { +>> + /* +>> + * The Received: header is special. We concatenate all +>> + * instances of the header as we use this when analyzing the +>> + * path the mail has taken from sender to recipient. +>> + */ +>> + decoded = _notmuch_message_file_get_combined_header (message, header); +>> + } else { +>> + value = g_mime_object_get_header (GMIME_OBJECT (message->message), +>> + header); +>> + if (value) +>> + decoded = g_mime_utils_header_decode_text (value); +>> else +>> - match = (strcasecmp (header, header_desired) == 0); +>> - +>> - decoded_value = g_mime_utils_header_decode_text (message->value.str); +>> - header_sofar = (char *)g_hash_table_lookup (message->headers, header); +>> - /* we treat the Received: header special - we want to concat ALL of +>> - * the Received: headers we encounter. +>> - * for everything else we return the first instance of a header */ +>> - if (strcasecmp(header, "received") == 0) { +>> - if (header_sofar == NULL) { +>> - /* first Received: header we encountered; just add it */ +>> - g_hash_table_insert (message->headers, header, decoded_value); +>> - } else { +>> - /* we need to add the header to those we already collected */ +>> - newhdr = strlen(decoded_value); +>> - hdrsofar = strlen(header_sofar); +>> - combined_header = g_malloc(hdrsofar + newhdr + 2); +>> - strncpy(combined_header,header_sofar,hdrsofar); +>> - *(combined_header+hdrsofar) = ' '; +>> - strncpy(combined_header+hdrsofar+1,decoded_value,newhdr+1); +>> - g_free (decoded_value); +>> - g_hash_table_insert (message->headers, header, combined_header); +>> - } +>> - } else { +>> - if (header_sofar == NULL) { +>> - /* Only insert if we don't have a value for this header, yet. */ +>> - g_hash_table_insert (message->headers, header, decoded_value); +>> - } else { +>> - free (header); +>> - g_free (decoded_value); +>> - decoded_value = header_sofar; +>> - } +>> - } +>> - /* if we found a match we can bail - unless of course we are +>> - * collecting all the Received: headers */ +>> - if (match && !is_received) +>> - return decoded_value; +>> - } +>> - +>> - if (message->parsing_finished) { +>> - fclose (message->file); +>> - message->file = NULL; +>> + decoded = g_strdup (""); +>> } +>> +>> - if (message->line) +>> - free (message->line); +>> - message->line = NULL; +>> - +>> - if (message->value.size) { +>> - free (message->value.str); +>> - message->value.str = NULL; +>> - message->value.size = 0; +>> - message->value.len = 0; +>> - } +>> + if (! decoded) +>> + return NULL; +>> +>> - /* For the Received: header we actually might end up here even +>> - * though we found the header (as we force continued parsing +>> - * in that case). So let's check if that's the header we were +>> - * looking for and return the value that we found (if any) +>> - */ +>> - if (is_received) +>> - return (char *)g_hash_table_lookup (message->headers, "received"); +>> - +>> - /* We've parsed all headers and never found the one we're looking +>> - * for. It's probably just not there, but let's check that we +>> - * didn't make a mistake preventing us from seeing it. */ +>> - if (message->restrict_headers && header_desired && +>> - ! g_hash_table_lookup_extended (message->headers, +>> - header_desired, NULL, NULL)) +>> - { +>> - INTERNAL_ERROR ("Attempt to get header \"%s\" which was not\n" +>> - "included in call to notmuch_message_file_restrict_headers\n", +>> - header_desired); +>> - } +>> + /* Cache the decoded value. We also own the strings. */ +>> + g_hash_table_insert (message->headers, xstrdup (header), decoded); +>> +>> - return ""; +>> + return decoded; +>> } +>> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h +>> index 59eb2bc285a5..734a4e338554 100644 +>> --- a/lib/notmuch-private.h +>> +++ b/lib/notmuch-private.h +>> @@ -46,6 +46,8 @@ NOTMUCH_BEGIN_DECLS +>> +>> #include +>> +>> +#include +>> + +>> #include "xutil.h" +>> #include "error_util.h" +>> +>> @@ -320,13 +322,6 @@ notmuch_message_set_author (notmuch_message_t *message, const char *author); +>> const char * +>> notmuch_message_get_author (notmuch_message_t *message); +>> +>> - +>> -/* index.cc */ +>> - +>> -notmuch_status_t +>> -_notmuch_message_index_file (notmuch_message_t *message, +>> - const char *filename); +>> - +>> /* message-file.c */ +>> +>> /* XXX: I haven't decided yet whether these will actually get exported +>> @@ -352,32 +347,31 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename); +>> void +>> notmuch_message_file_close (notmuch_message_file_t *message); +>> +>> -/* Restrict 'message' to only save the named headers. +>> +/* Parse the message. +>> * +>> - * When the caller is only interested in a short list of headers, +>> - * known in advance, calling this function can avoid wasted time and +>> - * memory parsing/saving header values that will never be needed. +>> + * This will be done automatically as necessary on other calls +>> + * depending on it, but an explicit call allows for better error +>> + * status reporting. +>> + */ +>> +notmuch_status_t +>> +notmuch_message_file_parse (notmuch_message_file_t *message); +> +> Most internal functions have a leading _. (We've diverged from that +> in a few places, especially in notmuch_message_file_*, but it would be +> nice to not diverge further from that convention.) + +Fixed. + +> +>> + +>> +/* Get the gmime message of a message file. +>> * +>> - * The variable arguments should be a list of const char * with a +>> - * final '(const char *) NULL' to terminate the list. +>> + * The message file is parsed as necessary. +>> * +>> - * If this function is called, it must be called before any calls to +>> - * notmuch_message_get_header for this message. +>> + * Returns GMimeMessage* on success (which the caller must not unref), +>> + * NULL if the message file parsing fails. +>> * +>> - * After calling this function, if notmuch_message_get_header is +>> - * called with a header name not in this list, then NULL will be +>> - * returned even if that header exists in the actual message. +>> + * XXX: Would be nice to not have to expose GMimeMessage here. +>> */ +>> -void +>> -notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...); +>> - +>> -/* Identical to notmuch_message_restrict_headers but accepting a va_list. */ +>> -void +>> -notmuch_message_file_restrict_headersv (notmuch_message_file_t *message, +>> - va_list va_headers); +>> +GMimeMessage * +>> +notmuch_message_file_get_mime_message (notmuch_message_file_t *message); +> +> Same comment about _. + +Ditto. + +> +>> +>> /* Get the value of the specified header from the message as a UTF-8 string. +>> * +>> + * The message file is parsed as necessary. +>> + * +>> * The header name is case insensitive. +>> * +>> * The Received: header is special - for it all Received: headers in +>> @@ -387,13 +381,19 @@ notmuch_message_file_restrict_headersv (notmuch_message_file_t *message, +>> * only until the message is closed. The caller should copy it if +>> * needing to modify the value or to hold onto it for longer. +>> * +>> - * Returns NULL if the message does not contain a header line matching +>> - * 'header'. +>> + * Returns NULL on errors, empty string if the message does not +>> + * contain a header line matching 'header'. +>> */ +>> const char * +>> notmuch_message_file_get_header (notmuch_message_file_t *message, +>> const char *header); +>> +>> +/* index.cc */ +>> + +>> +notmuch_status_t +>> +_notmuch_message_index_file (notmuch_message_t *message, +>> + notmuch_message_file_t *message_file); +>> + +> +> Any particular reason this floated down? (Obviously it's not an +> actual problem.) + +It needs the typedef for notmuch_message_file_t. Previous versions of +the patch moved the typedef, but this keeps the typedef next to other +stuff from message-file.c. + + +Thanks for the review. + +BR, +Jani. + + + +> +>> /* messages.c */ +>> +>> typedef struct _notmuch_message_node {