Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 4316B431FBC for ; Sun, 30 Mar 2014 14:10:38 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id X6JfBu+sVYdc for ; Sun, 30 Mar 2014 14:10:30 -0700 (PDT) Received: from mail-we0-f175.google.com (mail-we0-f175.google.com [74.125.82.175]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id 35A3B431FB6 for ; Sun, 30 Mar 2014 14:10:30 -0700 (PDT) Received: by mail-we0-f175.google.com with SMTP id q58so3924177wes.20 for ; Sun, 30 Mar 2014 14:10:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references :user-agent:date:message-id:mime-version:content-type; bh=Kllr2l+SxkOCXuc9h87xVTzIjR//GKogXIFNhgA110I=; b=HeJn0dTK7twgwaH3SaNRHZEauMYGUM50wNWuVa2uQu6l87mc55mFheVL6rrsfkIHVo uIHJBHeNJPvZJFgV/iDIDS1cORFBMUQM/SDhYL/od4Pj05iF5231bYVmFiscqX3gWb79 tmDQV+sYBJqePYjwWX3OrcByi04Wm+KF9zcFmTggM/8oY6M8B+B6ONrRp7NhRGcteX98 wvnD58gzyf5x94qEUIjXm/uuumqX4CXVxcr3J6tsJtMMaC8udAYALpiDfSZ+bBaSzf4f ccabJ0AgOaJimWdAxcjqPONp2bvjB4cp79pCJI3OctzD/8HgxmFP212zcmfT1xG7lVEf Nd/A== X-Gm-Message-State: ALoCoQlV/gE6PtA+uBQdqTuVFLM3c2j2vbKJbv+0QHLFjiv0yCoOYlHdXC+pOdYC11LfhXW/YPZU X-Received: by 10.194.60.37 with SMTP id e5mr9233199wjr.32.1396213825059; Sun, 30 Mar 2014 14:10:25 -0700 (PDT) Received: from localhost (dsl-hkibrasgw2-58c36f-91.dhcp.inet.fi. [88.195.111.91]) by mx.google.com with ESMTPSA id 44sm28262047eek.30.2014.03.30.14.10.22 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sun, 30 Mar 2014 14:10:23 -0700 (PDT) From: Jani Nikula To: Austin Clements Subject: Re: [PATCH v4] lib: replace the header parser with gmime In-Reply-To: <20140329221127.GH31187@mit.edu> References: <1395604866-19188-1-git-send-email-jani@nikula.org> <20140329221127.GH31187@mit.edu> User-Agent: Notmuch/0.17+164~gcd5fd5a4837e (http://notmuchmail.org) Emacs/24.3.1 (x86_64-pc-linux-gnu) Date: Mon, 31 Mar 2014 00:10:20 +0300 Message-ID: <87y4zrl7er.fsf@nikula.org> MIME-Version: 1.0 Content-Type: text/plain Cc: notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Mar 2014 21:10:38 -0000 On Sun, 30 Mar 2014, Austin Clements wrote: > Various little comments below. Overall this is looking really good. > > Quoth Jani Nikula on Mar 23 at 10:01 pm: >> The notmuch library includes a full blown message header parser. Yet >> the same message headers are parsed by gmime during indexing. Switch >> to gmime parsing completely. >> >> These are the main changes: >> >> * Gmime stops header parsing at the first invalid header, and presumes >> the message body starts from there. The current parser is quite >> liberal in accepting broken headers. The change means we will be >> much pickier about accepting invalid messages. >> >> * The current parser converts tabs used in header folding to >> spaces. Gmime preserve the tabs. Due to a broken python library used >> in mailman, there are plenty of mailing lists that produce headers >> with tabs in header folding, and we'll see plenty of tabs. (This >> change has been mitigated in preparatory patches.) >> >> * For pure header parsing, the current parser is likely faster than >> gmime, which parses the whole message rather than just the >> headers. Since we parse the message and its headers using gmime for >> indexing anyway, this avoids and extra header parsing round when >> adding new messages. In case of duplicate messages, we'll end up >> parsing the full message although just headers would be >> sufficient. All in all this should still speed up 'notmuch new'. >> >> * Calls to notmuch_message_get_header() may be slightly slower than >> previously for headers that are not indexed in the database, due to >> parsing of the whole message. Within the notmuch code base, notmuch >> reply is the only such user. >> >> --- >> >> This is v4 of patches [1] and [2], combining them into one. Patch [3] >> is required to make all the tests pass. >> >> The implementation has been changed considerably. Notably we only >> cache the decoded headers to a hash table if and when they are >> needed. The main reasons for doing caching at all is that gmime >> provides raw headers that need decoding, and to minimize changes >> outside of message-file.c we want to own and track the allocated >> strings instead of pushing the responsibility to callers. >> >> This addresses most comments from Austin. >> >> The diff in message-file.c is rather confusing due to a mix of >> added/removed lines. Looking at the patched file is much more >> pleasant. >> >> BR, >> Jani. >> >> [1] id:bdef45dc21c777130a7ae1fa650f172d8c6eaaf4.1391456555.git.jani@nikula.org >> [2] id:31d785c4a3e4b90862a0fdc545d4e900a4c898e2.1391456555.git.jani@nikula.org >> [3] id:1395247490-19892-1-git-send-email-jani@nikula.org >> --- >> lib/database.cc | 15 +- >> lib/index.cc | 70 +-------- >> lib/message-file.c | 420 ++++++++++++++++++++------------------------------ >> lib/notmuch-private.h | 54 +++---- >> 4 files changed, 208 insertions(+), 351 deletions(-) >> >> diff --git a/lib/database.cc b/lib/database.cc >> index aef748f75b5d..4750f40cf0fb 100644 >> --- a/lib/database.cc >> +++ b/lib/database.cc >> @@ -1971,15 +1971,10 @@ notmuch_database_add_message (notmuch_database_t *notmuch, >> if (ret) >> goto DONE; >> >> - notmuch_message_file_restrict_headers (message_file, >> - "date", >> - "from", >> - "in-reply-to", >> - "message-id", >> - "references", >> - "subject", >> - "to", >> - (char *) NULL); >> + /* Parse message up front to get better error status. */ >> + ret = notmuch_message_file_parse (message_file); >> + if (ret) >> + goto DONE; >> >> try { >> /* Before we do any real work, (especially before doing a >> @@ -2066,7 +2061,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch, >> date = notmuch_message_file_get_header (message_file, "date"); >> _notmuch_message_set_header_values (message, date, from, subject); >> >> - ret = _notmuch_message_index_file (message, filename); >> + ret = _notmuch_message_index_file (message, message_file); >> if (ret) >> goto DONE; >> } else { >> diff --git a/lib/index.cc b/lib/index.cc >> index 78c18cf36d10..46a019325454 100644 >> --- a/lib/index.cc >> +++ b/lib/index.cc >> @@ -425,63 +425,15 @@ _index_mime_part (notmuch_message_t *message, >> >> notmuch_status_t >> _notmuch_message_index_file (notmuch_message_t *message, >> - const char *filename) >> + notmuch_message_file_t *message_file) >> { >> - GMimeStream *stream = NULL; >> - GMimeParser *parser = NULL; >> - GMimeMessage *mime_message = NULL; >> + GMimeMessage *mime_message; >> InternetAddressList *addresses; >> - FILE *file = NULL; >> const char *from, *subject; >> - notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; >> - static int initialized = 0; >> - char from_buf[5]; >> - bool is_mbox = false; >> - static bool mbox_warning = false; >> - >> - if (! initialized) { >> - g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS); >> - initialized = 1; >> - } >> - >> - file = fopen (filename, "r"); >> - if (! file) { >> - fprintf (stderr, "Error opening %s: %s\n", filename, strerror (errno)); >> - ret = NOTMUCH_STATUS_FILE_ERROR; >> - goto DONE; >> - } >> - >> - /* Is this mbox? */ >> - if (fread (from_buf, sizeof (from_buf), 1, file) == 1 && >> - strncmp (from_buf, "From ", 5) == 0) >> - is_mbox = true; >> - rewind (file); >> >> - /* Evil GMime steals my FILE* here so I won't fclose it. */ >> - stream = g_mime_stream_file_new (file); >> - >> - parser = g_mime_parser_new_with_stream (stream); >> - g_mime_parser_set_scan_from (parser, is_mbox); >> - >> - mime_message = g_mime_parser_construct_message (parser); >> - >> - if (is_mbox) { >> - if (!g_mime_parser_eos (parser)) { >> - /* This is a multi-message mbox. */ >> - ret = NOTMUCH_STATUS_FILE_NOT_EMAIL; >> - goto DONE; >> - } >> - /* For historical reasons, we support single-message mboxes, >> - * but this behavior is likely to change in the future, so >> - * warn. */ >> - if (!mbox_warning) { >> - mbox_warning = true; >> - fprintf (stderr, "\ >> -Warning: %s is an mbox containing a single message,\n\ >> -likely caused by misconfigured mail delivery. Support for single-message\n\ >> -mboxes is deprecated and may be removed in the future.\n", filename); >> - } >> - } >> + mime_message = notmuch_message_file_get_mime_message (message_file); >> + if (! mime_message) >> + return NOTMUCH_STATUS_FILE_NOT_EMAIL; > > This could also be because of out-of-memory. > notmuch_message_file_get_mime_message (and notmuch_message_file_parse, > in turn) should probably take a notmuch_status_t *status_ret argument > like other internal functions. Agreed (though I opted for returning notmuch_status_t and passing GMimeMessage** as parameter). > >> >> from = g_mime_message_get_sender (mime_message); >> >> @@ -502,15 +454,5 @@ mboxes is deprecated and may be removed in the future.\n", filename); >> >> _index_mime_part (message, g_mime_message_get_mime_part (mime_message)); >> >> - DONE: >> - if (mime_message) >> - g_object_unref (mime_message); >> - >> - if (parser) >> - g_object_unref (parser); >> - >> - if (stream) >> - g_object_unref (stream); >> - >> - return ret; >> + return NOTMUCH_STATUS_SUCCESS; >> } >> diff --git a/lib/message-file.c b/lib/message-file.c >> index a2850c278b5a..88662608d319 100644 >> --- a/lib/message-file.c >> +++ b/lib/message-file.c >> @@ -26,30 +26,15 @@ >> >> #include /* GHashTable */ >> >> -typedef struct { >> - char *str; >> - size_t size; >> - size_t len; >> -} header_value_closure_t; >> - >> struct _notmuch_message_file { >> /* File object */ >> FILE *file; >> + char *filename; >> >> - /* Header storage */ >> - int restrict_headers; >> + /* Cache for decoded headers */ >> GHashTable *headers; >> - int broken_headers; >> - int good_headers; >> - size_t header_size; /* Length of full message header in bytes. */ >> - >> - /* Parsing state */ >> - char *line; >> - size_t line_size; >> - header_value_closure_t value; >> >> - int parsing_started; >> - int parsing_finished; >> + GMimeMessage *message; >> }; >> >> static int >> @@ -76,15 +61,12 @@ strcase_hash (const void *ptr) >> static int >> _notmuch_message_file_destructor (notmuch_message_file_t *message) >> { >> - if (message->line) >> - free (message->line); >> - >> - if (message->value.size) >> - free (message->value.str); >> - >> if (message->headers) >> g_hash_table_destroy (message->headers); >> >> + if (message->message) >> + g_object_unref (message->message); >> + >> if (message->file) >> fclose (message->file); >> >> @@ -102,20 +84,17 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename) >> if (unlikely (message == NULL)) >> return NULL; >> >> + /* Only needed for error messages during parsing. */ >> + message->filename = talloc_strdup (message, filename); >> + if (message->filename == NULL) >> + goto FAIL; >> + >> talloc_set_destructor (message, _notmuch_message_file_destructor); >> >> message->file = fopen (filename, "r"); >> if (message->file == NULL) >> goto FAIL; >> >> - message->headers = g_hash_table_new_full (strcase_hash, >> - strcase_equal, >> - free, >> - g_free); >> - >> - message->parsing_started = 0; >> - message->parsing_finished = 0; >> - >> return message; >> >> FAIL: >> @@ -137,264 +116,205 @@ notmuch_message_file_close (notmuch_message_file_t *message) >> talloc_free (message); >> } >> >> -void >> -notmuch_message_file_restrict_headersv (notmuch_message_file_t *message, >> - va_list va_headers) >> +notmuch_status_t >> +notmuch_message_file_parse (notmuch_message_file_t *message) >> { >> - char *header; >> + GMimeStream *stream; >> + GMimeParser *parser; >> + notmuch_status_t status = NOTMUCH_STATUS_SUCCESS; >> + static int initialized = 0; >> + char from_buf[5]; >> + notmuch_bool_t is_mbox = FALSE; >> + static notmuch_bool_t mbox_warning = FALSE; >> >> - if (message->parsing_started) >> - INTERNAL_ERROR ("notmuch_message_file_restrict_headers called after parsing has started"); >> + if (message->message) >> + return NOTMUCH_STATUS_SUCCESS; >> >> - while (1) { >> - header = va_arg (va_headers, char*); >> - if (header == NULL) >> - break; >> - g_hash_table_insert (message->headers, >> - xstrdup (header), NULL); >> + if (! initialized) { >> + g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS); >> + initialized = 1; >> } >> >> - message->restrict_headers = 1; >> -} >> - >> -void >> -notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...) >> -{ >> - va_list va_headers; >> - >> - va_start (va_headers, message); >> + message->headers = g_hash_table_new_full (strcase_hash, strcase_equal, >> + free, g_free); >> + if (! message->headers) >> + return NOTMUCH_STATUS_OUT_OF_MEMORY; >> >> - notmuch_message_file_restrict_headersv (message, va_headers); >> -} >> + /* Is this mbox? */ >> + if (fread (from_buf, sizeof (from_buf), 1, message->file) == 1 && >> + strncmp (from_buf, "From ", 5) == 0) >> + is_mbox = TRUE; >> + rewind (message->file); >> >> -static void >> -copy_header_unfolding (header_value_closure_t *value, >> - const char *chunk) >> -{ >> - char *last; >> + stream = g_mime_stream_file_new (message->file); >> >> - if (chunk == NULL) >> - return; >> + /* We'll own and fclose the FILE* ourselves. */ >> + g_mime_stream_file_set_owner (GMIME_STREAM_FILE (stream), FALSE); >> >> - while (*chunk == ' ' || *chunk == '\t') >> - chunk++; >> + parser = g_mime_parser_new_with_stream (stream); >> + g_mime_parser_set_scan_from (parser, is_mbox); >> >> - if (value->len + 1 + strlen (chunk) + 1 > value->size) { >> - unsigned int new_size = value->size; >> - if (value->size == 0) >> - new_size = strlen (chunk) + 1; >> - else >> - while (value->len + 1 + strlen (chunk) + 1 > new_size) >> - new_size *= 2; >> - value->str = xrealloc (value->str, new_size); >> - value->size = new_size; >> + message->message = g_mime_parser_construct_message (parser); >> + if (! message->message) { >> + status = NOTMUCH_STATUS_FILE_NOT_EMAIL; >> + goto DONE; >> } >> >> - last = value->str + value->len; >> - if (value->len) { >> - *last = ' '; >> - last++; >> - value->len++; >> + if (is_mbox) { >> + if (! g_mime_parser_eos (parser)) { >> + /* This is a multi-message mbox. */ >> + status = NOTMUCH_STATUS_FILE_NOT_EMAIL; >> + goto DONE; >> + } >> + /* >> + * For historical reasons, we support single-message mboxes, >> + * but this behavior is likely to change in the future, so >> + * warn. >> + */ >> + if (! mbox_warning) { >> + mbox_warning = TRUE; >> + fprintf (stderr, "\ >> +Warning: %s is an mbox containing a single message,\n\ >> +likely caused by misconfigured mail delivery. Support for single-message\n\ >> +mboxes is deprecated and may be removed in the future.\n", message->filename); > > I still wonder if this is the right place to issue this warning. For > example, doesn't this mean that, when constructing a reply to a single > message mbox, we'll issue this warning? > > Or maybe it's time to drop single message mbox support. We deprecated > it in 0.15.1, which was over a year ago. OTOH, wheezy still has > 0.13.2, so there could be people still using mboxes without the > warning. OTOOH, we only kept this around to appear David anyway. So > maybe the answer is to leave this in an not think too hard about it. I'm adding a prep patch to drop mbox support altogether. > >> + } >> } >> >> - strcpy (last, chunk); >> - value->len += strlen (chunk); >> + DONE: >> + g_object_unref (stream); >> + g_object_unref (parser); >> + >> + if (status) { >> + g_hash_table_destroy (message->headers); >> + message->headers = NULL; >> >> - last = value->str + value->len - 1; >> - if (*last == '\n') { >> - *last = '\0'; >> - value->len--; >> + if (message->message) { >> + g_object_unref (message->message); >> + message->message = NULL; >> + } >> + >> + rewind (message->file); >> } >> + >> + return status; >> } >> >> -/* As a special-case, a value of NULL for header_desired will force >> - * the entire header to be parsed if it is not parsed already. This is >> - * used by the _notmuch_message_file_get_headers_end function. >> - * Another special case is the Received: header. For this header we >> - * want to concatenate all instances of the header instead of just >> - * hashing the first instance as we use this when analyzing the path >> - * the mail has taken from sender to recipient. >> - */ >> -const char * >> -notmuch_message_file_get_header (notmuch_message_file_t *message, >> - const char *header_desired) >> +GMimeMessage * >> +notmuch_message_file_get_mime_message (notmuch_message_file_t *message) >> { >> - int contains; >> - char *header, *decoded_value, *header_sofar, *combined_header; >> - const char *s, *colon; >> - int match, newhdr, hdrsofar, is_received; >> - static int initialized = 0; >> + if (notmuch_message_file_parse (message)) >> + return NULL; >> >> - is_received = (strcmp(header_desired,"received") == 0); >> + return message->message; >> +} >> >> - if (! initialized) { >> - g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS); >> - initialized = 1; >> - } >> +/* >> + * Get all instances of a header decoded and concatenated. >> + * >> + * The result must be freed using g_free(). >> + * >> + * Return NULL on errors, empty string for non-existing headers. >> + */ >> +static char * >> +_notmuch_message_file_get_combined_header (notmuch_message_file_t *message, >> + const char *header) >> +{ >> + GMimeHeaderList *headers; >> + GMimeHeaderIter *iter; >> + char *combined = NULL; >> >> - message->parsing_started = 1; >> - >> - if (header_desired == NULL) >> - contains = 0; >> - else >> - contains = g_hash_table_lookup_extended (message->headers, >> - header_desired, NULL, >> - (gpointer *) &decoded_value); >> - >> - if (contains && decoded_value) >> - return decoded_value; >> - >> - if (message->parsing_finished) >> - return ""; >> - >> -#define NEXT_HEADER_LINE(closure) \ >> - while (1) { \ >> - ssize_t bytes_read = getline (&message->line, \ >> - &message->line_size, \ >> - message->file); \ >> - if (bytes_read == -1) { \ >> - message->parsing_finished = 1; \ >> - break; \ >> - } \ >> - if (*message->line == '\n') { \ >> - message->parsing_finished = 1; \ >> - break; \ >> - } \ >> - if (closure && \ >> - (*message->line == ' ' || *message->line == '\t')) \ >> - { \ >> - copy_header_unfolding ((closure), message->line); \ >> - } \ >> - if (*message->line == ' ' || *message->line == '\t') \ >> - message->header_size += strlen (message->line); \ >> - else \ >> - break; \ >> - } >> + headers = g_mime_object_get_header_list (GMIME_OBJECT (message->message)); >> + if (! headers) >> + return NULL; >> >> - if (message->line == NULL) >> - NEXT_HEADER_LINE (NULL); >> + iter = g_mime_header_iter_new (); >> + if (! iter) >> + return NULL; >> >> - while (1) { >> + if (! g_mime_header_list_get_iter (headers, iter)) >> + goto DONE; >> >> - if (message->parsing_finished) >> - break; >> + do { >> + const char *value; > > Incorrect indentation. I wonder what happened there! > >> + char *decoded; >> >> - colon = strchr (message->line, ':'); >> + if (strcasecmp (g_mime_header_iter_get_name (iter), header) != 0) >> + continue; >> >> - if (colon == NULL) { >> - message->broken_headers++; >> - /* A simple heuristic for giving up on things that just >> - * don't look like mail messages. */ >> - if (message->broken_headers >= 10 && >> - message->good_headers < 5) >> - { >> - message->parsing_finished = 1; >> - break; >> + value = g_mime_header_iter_get_value (iter); >> + decoded = g_mime_utils_header_decode_text (value); > > Maybe a comment saying that GMime retains ownership of value, but > decoded must be freed with g_free? GMime is confusing about > ownership, and the fact that value *doesn't* have to be freed (which > is not documented by GMime!) just adds to the confusion. I could see > this being easy to screw up during later code changes. Agreed. > >> + if (! decoded) { >> + if (combined) { >> + g_free (combined); >> + combined = NULL; >> } >> - NEXT_HEADER_LINE (NULL); >> - continue; >> + goto DONE; >> } >> >> - message->header_size += strlen (message->line); >> + if (combined) { >> + char *tmp = g_strdup_printf ("%s %s", combined, decoded); >> + g_free (decoded); >> + g_free (combined); >> + if (! tmp) { >> + combined = NULL; >> + goto DONE; >> + } >> >> - message->good_headers++; >> + combined = tmp; >> + } else { >> + combined = decoded; >> + } >> + } while (g_mime_header_iter_next (iter)); > > Yet another funny GMime API... Did you consider > > for (; g_mime_header_iter_is_valid (iter); g_mime_header_iter_next (iter)) > > ? I realize that the g_mime_header_iter_is_valid isn't actually > necessary because g_mime_header_list_get_iter returns NULL if the > iterator would be empty, but using a for-loop would make it more > obvious that this code is correct. As I mentioned on IRC, that would lead to an infinite loop. Ugh. > >> >> - header = xstrndup (message->line, colon - message->line); >> + /* Return empty string for non-existing headers. */ >> + if (! combined) >> + combined = g_strdup (""); >> >> - if (message->restrict_headers && >> - ! g_hash_table_lookup_extended (message->headers, >> - header, NULL, NULL)) >> - { >> - free (header); >> - NEXT_HEADER_LINE (NULL); >> - continue; >> - } >> + DONE: >> + g_mime_header_iter_free (iter); >> >> - s = colon + 1; >> - while (*s == ' ' || *s == '\t') >> - s++; >> + return combined; >> +} > > Oof. That received header had better be worth it! I'm not sure it is, but we've got users for it, so... > >> >> - message->value.len = 0; >> - copy_header_unfolding (&message->value, s); >> +/* Return NULL on errors, empty string for non-existing headers. */ > > This may sound strange, but I'd be inclined to omit this comment, > since it made me not realize I should go looking for the much more > thorough comment in notmuch-private.h. Agreed. > >> +const char * >> +notmuch_message_file_get_header (notmuch_message_file_t *message, >> + const char *header) >> +{ >> + const char *value = NULL; >> + char *decoded; >> + notmuch_bool_t found; >> >> - NEXT_HEADER_LINE (&message->value); >> + if (notmuch_message_file_parse (message)) >> + return NULL; >> >> - if (header_desired == NULL) >> - match = 0; >> + /* If we have a cached decoded value, use it. */ >> + found = g_hash_table_lookup_extended (message->headers, header, >> + NULL, (gpointer *) &value); >> + if (found) >> + return value ? value : ""; > > When would header be found, but value be NULL? (If this can't in fact > happen, how about using g_hash_table_lookup?) Right. This was a remnant of header restricting which has now been removed. Switching to g_hash_table_lookup. > >> + >> + if (strcasecmp (header, "received") == 0) { >> + /* >> + * The Received: header is special. We concatenate all >> + * instances of the header as we use this when analyzing the >> + * path the mail has taken from sender to recipient. >> + */ >> + decoded = _notmuch_message_file_get_combined_header (message, header); >> + } else { >> + value = g_mime_object_get_header (GMIME_OBJECT (message->message), >> + header); >> + if (value) >> + decoded = g_mime_utils_header_decode_text (value); >> else >> - match = (strcasecmp (header, header_desired) == 0); >> - >> - decoded_value = g_mime_utils_header_decode_text (message->value.str); >> - header_sofar = (char *)g_hash_table_lookup (message->headers, header); >> - /* we treat the Received: header special - we want to concat ALL of >> - * the Received: headers we encounter. >> - * for everything else we return the first instance of a header */ >> - if (strcasecmp(header, "received") == 0) { >> - if (header_sofar == NULL) { >> - /* first Received: header we encountered; just add it */ >> - g_hash_table_insert (message->headers, header, decoded_value); >> - } else { >> - /* we need to add the header to those we already collected */ >> - newhdr = strlen(decoded_value); >> - hdrsofar = strlen(header_sofar); >> - combined_header = g_malloc(hdrsofar + newhdr + 2); >> - strncpy(combined_header,header_sofar,hdrsofar); >> - *(combined_header+hdrsofar) = ' '; >> - strncpy(combined_header+hdrsofar+1,decoded_value,newhdr+1); >> - g_free (decoded_value); >> - g_hash_table_insert (message->headers, header, combined_header); >> - } >> - } else { >> - if (header_sofar == NULL) { >> - /* Only insert if we don't have a value for this header, yet. */ >> - g_hash_table_insert (message->headers, header, decoded_value); >> - } else { >> - free (header); >> - g_free (decoded_value); >> - decoded_value = header_sofar; >> - } >> - } >> - /* if we found a match we can bail - unless of course we are >> - * collecting all the Received: headers */ >> - if (match && !is_received) >> - return decoded_value; >> - } >> - >> - if (message->parsing_finished) { >> - fclose (message->file); >> - message->file = NULL; >> + decoded = g_strdup (""); >> } >> >> - if (message->line) >> - free (message->line); >> - message->line = NULL; >> - >> - if (message->value.size) { >> - free (message->value.str); >> - message->value.str = NULL; >> - message->value.size = 0; >> - message->value.len = 0; >> - } >> + if (! decoded) >> + return NULL; >> >> - /* For the Received: header we actually might end up here even >> - * though we found the header (as we force continued parsing >> - * in that case). So let's check if that's the header we were >> - * looking for and return the value that we found (if any) >> - */ >> - if (is_received) >> - return (char *)g_hash_table_lookup (message->headers, "received"); >> - >> - /* We've parsed all headers and never found the one we're looking >> - * for. It's probably just not there, but let's check that we >> - * didn't make a mistake preventing us from seeing it. */ >> - if (message->restrict_headers && header_desired && >> - ! g_hash_table_lookup_extended (message->headers, >> - header_desired, NULL, NULL)) >> - { >> - INTERNAL_ERROR ("Attempt to get header \"%s\" which was not\n" >> - "included in call to notmuch_message_file_restrict_headers\n", >> - header_desired); >> - } >> + /* Cache the decoded value. We also own the strings. */ >> + g_hash_table_insert (message->headers, xstrdup (header), decoded); >> >> - return ""; >> + return decoded; >> } >> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h >> index 59eb2bc285a5..734a4e338554 100644 >> --- a/lib/notmuch-private.h >> +++ b/lib/notmuch-private.h >> @@ -46,6 +46,8 @@ NOTMUCH_BEGIN_DECLS >> >> #include >> >> +#include >> + >> #include "xutil.h" >> #include "error_util.h" >> >> @@ -320,13 +322,6 @@ notmuch_message_set_author (notmuch_message_t *message, const char *author); >> const char * >> notmuch_message_get_author (notmuch_message_t *message); >> >> - >> -/* index.cc */ >> - >> -notmuch_status_t >> -_notmuch_message_index_file (notmuch_message_t *message, >> - const char *filename); >> - >> /* message-file.c */ >> >> /* XXX: I haven't decided yet whether these will actually get exported >> @@ -352,32 +347,31 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename); >> void >> notmuch_message_file_close (notmuch_message_file_t *message); >> >> -/* Restrict 'message' to only save the named headers. >> +/* Parse the message. >> * >> - * When the caller is only interested in a short list of headers, >> - * known in advance, calling this function can avoid wasted time and >> - * memory parsing/saving header values that will never be needed. >> + * This will be done automatically as necessary on other calls >> + * depending on it, but an explicit call allows for better error >> + * status reporting. >> + */ >> +notmuch_status_t >> +notmuch_message_file_parse (notmuch_message_file_t *message); > > Most internal functions have a leading _. (We've diverged from that > in a few places, especially in notmuch_message_file_*, but it would be > nice to not diverge further from that convention.) Fixed. > >> + >> +/* Get the gmime message of a message file. >> * >> - * The variable arguments should be a list of const char * with a >> - * final '(const char *) NULL' to terminate the list. >> + * The message file is parsed as necessary. >> * >> - * If this function is called, it must be called before any calls to >> - * notmuch_message_get_header for this message. >> + * Returns GMimeMessage* on success (which the caller must not unref), >> + * NULL if the message file parsing fails. >> * >> - * After calling this function, if notmuch_message_get_header is >> - * called with a header name not in this list, then NULL will be >> - * returned even if that header exists in the actual message. >> + * XXX: Would be nice to not have to expose GMimeMessage here. >> */ >> -void >> -notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...); >> - >> -/* Identical to notmuch_message_restrict_headers but accepting a va_list. */ >> -void >> -notmuch_message_file_restrict_headersv (notmuch_message_file_t *message, >> - va_list va_headers); >> +GMimeMessage * >> +notmuch_message_file_get_mime_message (notmuch_message_file_t *message); > > Same comment about _. Ditto. > >> >> /* Get the value of the specified header from the message as a UTF-8 string. >> * >> + * The message file is parsed as necessary. >> + * >> * The header name is case insensitive. >> * >> * The Received: header is special - for it all Received: headers in >> @@ -387,13 +381,19 @@ notmuch_message_file_restrict_headersv (notmuch_message_file_t *message, >> * only until the message is closed. The caller should copy it if >> * needing to modify the value or to hold onto it for longer. >> * >> - * Returns NULL if the message does not contain a header line matching >> - * 'header'. >> + * Returns NULL on errors, empty string if the message does not >> + * contain a header line matching 'header'. >> */ >> const char * >> notmuch_message_file_get_header (notmuch_message_file_t *message, >> const char *header); >> >> +/* index.cc */ >> + >> +notmuch_status_t >> +_notmuch_message_index_file (notmuch_message_t *message, >> + notmuch_message_file_t *message_file); >> + > > Any particular reason this floated down? (Obviously it's not an > actual problem.) It needs the typedef for notmuch_message_file_t. Previous versions of the patch moved the typedef, but this keeps the typedef next to other stuff from message-file.c. Thanks for the review. BR, Jani. > >> /* messages.c */ >> >> typedef struct _notmuch_message_node {