Re: [PATCH v3 6/6] lib: parse messages only once
author Austin Clements <amdragon@MIT.EDU>
Mon, 3 Feb 2014 21:40:03 +0000 (16:40 -0500)
committer W. Trevor King <wking@tremily.us>
Fri, 7 Nov 2014 17:59:41 +0000 (09:59 -0800)
1b/a8755bcb92fbba185c5888815b91b7956bdec8 [new file with mode: 0644]

diff --git a/1b/a8755bcb92fbba185c5888815b91b7956bdec8 b/1b/a8755bcb92fbba185c5888815b91b7956bdec8
new file mode 100644
index 0000000..5173408
--- /dev/null
@@ -0,0 +1,275 @@
+Return-Path: <amdragon@mit.edu>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+       by olra.theworths.org (Postfix) with ESMTP id 1B257431FBC\r
+       for <notmuch@notmuchmail.org>; Mon,  3 Feb 2014 13:40:16 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+       tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+       by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+       with ESMTP id KlM0d8QK2WRE for <notmuch@notmuchmail.org>;\r
+       Mon,  3 Feb 2014 13:40:08 -0800 (PST)\r
+Received: from dmz-mailsec-scanner-6.mit.edu (dmz-mailsec-scanner-6.mit.edu\r
+       [18.7.68.35])\r
+       (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))\r
+       (No client certificate requested)\r
+       by olra.theworths.org (Postfix) with ESMTPS id 30F17431FAF\r
+       for <notmuch@notmuchmail.org>; Mon,  3 Feb 2014 13:40:08 -0800 (PST)\r
+X-AuditID: 12074423-f79726d000000cc9-1c-52f00cb7a523\r
+Received: from mailhub-auth-3.mit.edu ( [18.9.21.43])\r
+       (using TLS with cipher AES256-SHA (256/256 bits))\r
+       (Client did not present a certificate)\r
+       by dmz-mailsec-scanner-6.mit.edu (Symantec Messaging Gateway) with SMTP\r
+       id 62.FD.03273.7BC00F25; Mon,  3 Feb 2014 16:40:07 -0500 (EST)\r
+Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11])\r
+       by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id s13Le62l031338; \r
+       Mon, 3 Feb 2014 16:40:07 -0500\r
+Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])\r
+       (authenticated bits=0)\r
+       (User authenticated as amdragon@ATHENA.MIT.EDU)\r
+       by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id s13Le4qh015564\r
+       (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT);\r
+       Mon, 3 Feb 2014 16:40:06 -0500\r
+Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80)\r
+       (envelope-from <amdragon@mit.edu>)\r
+       id 1WARFM-0002hk-66; Mon, 03 Feb 2014 16:40:04 -0500\r
+Date: Mon, 3 Feb 2014 16:40:03 -0500\r
+From: Austin Clements <amdragon@MIT.EDU>\r
+To: Jani Nikula <jani@nikula.org>\r
+Subject: Re: [PATCH v3 6/6] lib: parse messages only once\r
+Message-ID: <20140203214003.GN4375@mit.edu>\r
+References: <cover.1391456555.git.jani@nikula.org>\r
+       <31d785c4a3e4b90862a0fdc545d4e900a4c898e2.1391456555.git.jani@nikula.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=us-ascii\r
+Content-Disposition: inline\r
+In-Reply-To:\r
+ <31d785c4a3e4b90862a0fdc545d4e900a4c898e2.1391456555.git.jani@nikula.org>\r
+User-Agent: Mutt/1.5.21 (2010-09-15)\r
+X-Brightmail-Tracker:\r
+ H4sIAAAAAAAAA+NgFmpkleLIzCtJLcpLzFFi42IR4hTV1t3O8yHIYOZCcYum6c4W12/OZHZg\r
+       8rh1/zW7x7NVt5gDmKK4bFJSczLLUov07RK4MnY93MlccMOoonnTa7YGxifqXYycHBICJhJ7\r
+       FnxhhrDFJC7cW8/WxcjFISQwm0li85rlYAkhgQ2MEn8PZkEkTjFJHGq+xA7hLGGUWDTxFytI\r
+       FYuAisSW3xvAOtgENCS27V/OCGKLCChKbD65H8xmFpCW+Pa7mamLkYNDWMBS4kCTPUiYV0Bb\r
+       4sqmJiaIZXUSR+bMYoOIC0qcnPmEBaJVS+LGv5dgrSBjlv/jAAlzCoRJzJwOsVUU6IIpJ7ex\r
+       TWAUmoWkexaS7lkI3QsYmVcxyqbkVunmJmbmFKcm6xYnJ+blpRbpmunlZpbopaaUbmIEhTS7\r
+       i/IOxj8HlQ4xCnAwKvHwdux9FyTEmlhWXJl7iFGSg0lJlFef4UOQEF9SfkplRmJxRnxRaU5q\r
+       8SFGCQ5mJRFev0/vg4R4UxIrq1KL8mFS0hwsSuK8iTPeBAkJpCeWpGanphakFsFkZTg4lCR4\r
+       OYGxKyRYlJqeWpGWmVOCkGbi4AQZzgM0XAqkhre4IDG3ODMdIn+KUVFKnPcnN1BCACSRUZoH\r
+       1wtLOa8YxYFeEeZlBWnnAaYruO5XQIOZgAavcwW5urgkESEl1cC4x/DzAf9ji2fHJX9XTu2R\r
+       DX5+/+Ehu5VajAz+OkVipxv1Fr9KMLQ26VTKtokrmVl8NqzYZI/swXtrvhfGrp/iY/ZEU0Pm\r
+       4PrH8+KfXck1kLodO093BodEXmTJhA6HmMWV294Xr5ZepPBx5vWFabd7il0/6BVNvR0r1hrM\r
+       tVrlcnifYPP08wJKLMUZiYZazEXFiQA8LkdPFAMAAA==\r
+Cc: notmuch@notmuchmail.org\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+       <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Mon, 03 Feb 2014 21:40:16 -0000\r
+\r
+Quoth Jani Nikula on Feb 03 at  9:51 pm:\r
+> Use the previously parsed gmime message for indexing instead of\r
+> running an extra parsing pass.\r
+> \r
+> After this change, we'll only do unnecessary parsing of the message\r
+> body for duplicates and non-messages. For regular non-duplicate\r
+> messages, we have now shaved off an extra header parsing round during\r
+> indexing.\r
+> ---\r
+>  lib/database.cc       |  2 +-\r
+>  lib/index.cc          | 59 ++++++---------------------------------------------\r
+>  lib/message-file.c    |  9 ++++++++\r
+>  lib/notmuch-private.h | 16 ++++++++++++--\r
+>  4 files changed, 30 insertions(+), 56 deletions(-)\r
+> \r
+> diff --git a/lib/database.cc b/lib/database.cc\r
+> index d1bea88..3a29fe7 100644\r
+> --- a/lib/database.cc\r
+> +++ b/lib/database.cc\r
+> @@ -2029,7 +2029,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,\r
+>          date = notmuch_message_file_get_header (message_file, "date");\r
+>          _notmuch_message_set_header_values (message, date, from, subject);\r
+>  \r
+> -        ret = _notmuch_message_index_file (message, filename);\r
+> +        ret = _notmuch_message_index_file (message, message_file);\r
+>          if (ret)\r
+>              goto DONE;\r
+>      } else {\r
+> diff --git a/lib/index.cc b/lib/index.cc\r
+> index 976e49f..71397da 100644\r
+> --- a/lib/index.cc\r
+> +++ b/lib/index.cc\r
+> @@ -425,52 +425,15 @@ _index_mime_part (notmuch_message_t *message,\r
+>  \r
+>  notmuch_status_t\r
+>  _notmuch_message_index_file (notmuch_message_t *message,\r
+> -                         const char *filename)\r
+> +                         notmuch_message_file_t *message_file)\r
+>  {\r
+> -    GMimeStream *stream = NULL;\r
+> -    GMimeParser *parser = NULL;\r
+> -    GMimeMessage *mime_message = NULL;\r
+> +    GMimeMessage *mime_message;\r
+>      InternetAddressList *addresses;\r
+> -    FILE *file = NULL;\r
+>      const char *from, *subject;\r
+> -    notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;\r
+> -    static int initialized = 0;\r
+> -    char from_buf[5];\r
+> -    bool is_mbox = false;\r
+> -\r
+> -    if (! initialized) {\r
+> -    g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);\r
+> -    initialized = 1;\r
+> -    }\r
+> -\r
+> -    file = fopen (filename, "r");\r
+> -    if (! file) {\r
+> -    fprintf (stderr, "Error opening %s: %s\n", filename, strerror (errno));\r
+> -    ret = NOTMUCH_STATUS_FILE_ERROR;\r
+> -    goto DONE;\r
+> -    }\r
+> -\r
+> -    /* Is this mbox? */\r
+> -    if (fread (from_buf, sizeof (from_buf), 1, file) == 1 &&\r
+> -    strncmp (from_buf, "From ", 5) == 0)\r
+> -    is_mbox = true;\r
+> -    rewind (file);\r
+> -\r
+> -    /* Evil GMime steals my FILE* here so I won't fclose it. */\r
+> -    stream = g_mime_stream_file_new (file);\r
+> -\r
+> -    parser = g_mime_parser_new_with_stream (stream);\r
+> -    g_mime_parser_set_scan_from (parser, is_mbox);\r
+>  \r
+> -    mime_message = g_mime_parser_construct_message (parser);\r
+> -\r
+> -    if (is_mbox) {\r
+> -    if (!g_mime_parser_eos (parser)) {\r
+> -        /* This is a multi-message mbox. */\r
+> -        ret = NOTMUCH_STATUS_FILE_NOT_EMAIL;\r
+> -        goto DONE;\r
+> -    }\r
+> -    }\r
+> +    mime_message = notmuch_message_file_get_mime_message (message_file);\r
+> +    if (! mime_message)\r
+> +    return NOTMUCH_STATUS_FILE_NOT_EMAIL; /* more like internal error */\r
+\r
+Are there situations other than forgetting to call\r
+notmuch_message_file_parse that could cause this?  (Speaking of which,\r
+where is notmuch_message_file_parse called?)\r
+\r
+>  \r
+>      from = g_mime_message_get_sender (mime_message);\r
+>  \r
+> @@ -491,15 +454,5 @@ _notmuch_message_index_file (notmuch_message_t *message,\r
+>  \r
+>      _index_mime_part (message, g_mime_message_get_mime_part (mime_message));\r
+>  \r
+> -  DONE:\r
+> -    if (mime_message)\r
+> -    g_object_unref (mime_message);\r
+> -\r
+> -    if (parser)\r
+> -    g_object_unref (parser);\r
+> -\r
+> -    if (stream)\r
+> -    g_object_unref (stream);\r
+> -\r
+> -    return ret;\r
+> +    return NOTMUCH_STATUS_SUCCESS;\r
+>  }\r
+> diff --git a/lib/message-file.c b/lib/message-file.c\r
+> index 33f6468..99e1dc8 100644\r
+> --- a/lib/message-file.c\r
+> +++ b/lib/message-file.c\r
+> @@ -250,6 +250,15 @@ mboxes is deprecated and may be removed in the future.\n", message->filename);\r
+>      return NOTMUCH_STATUS_SUCCESS;\r
+>  }\r
+>  \r
+> +GMimeMessage *\r
+> +notmuch_message_file_get_mime_message (notmuch_message_file_t *message)\r
+> +{\r
+> +    if (! message->parsed)\r
+> +    return NULL;\r
+\r
+This seems like another good opportunity to call the parser lazily and\r
+hide notmuch_message_file_parse from the caller, rather than requiring\r
+the caller to implement a particular call sequence (which I wasn't\r
+even able to find above).  This might also clean up the error handling\r
+in the call to notmuch_message_file_get_mime_message above.\r
+\r
+> +\r
+> +    return message->message;\r
+> +}\r
+> +\r
+>  /* return NULL on errors, empty string for non-existing headers */\r
+>  const char *\r
+>  notmuch_message_file_get_header (notmuch_message_file_t *message,\r
+> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h\r
+> index 7277df1..7559521 100644\r
+> --- a/lib/notmuch-private.h\r
+> +++ b/lib/notmuch-private.h\r
+> @@ -46,6 +46,8 @@ NOTMUCH_BEGIN_DECLS\r
+>  \r
+>  #include <talloc.h>\r
+>  \r
+> +#include <gmime/gmime.h>\r
+> +\r
+>  #include "xutil.h"\r
+>  #include "error_util.h"\r
+>  \r
+> @@ -320,9 +322,11 @@ notmuch_message_get_author (notmuch_message_t *message);\r
+>  \r
+>  /* index.cc */\r
+>  \r
+> +typedef struct _notmuch_message_file notmuch_message_file_t;\r
+> +\r
+>  notmuch_status_t\r
+>  _notmuch_message_index_file (notmuch_message_t *message,\r
+> -                         const char *filename);\r
+> +                         notmuch_message_file_t *message_file);\r
+>  \r
+>  /* message-file.c */\r
+>  \r
+> @@ -330,7 +334,6 @@ _notmuch_message_index_file (notmuch_message_t *message,\r
+>   * into the public interface in notmuch.h\r
+>   */\r
+>  \r
+> -typedef struct _notmuch_message_file notmuch_message_file_t;\r
+>  \r
+>  /* Open a file containing a single email message.\r
+>   *\r
+> @@ -377,6 +380,15 @@ void\r
+>  notmuch_message_file_restrict_headersv (notmuch_message_file_t *message,\r
+>                                      va_list va_headers);\r
+>  \r
+> +/* Get the gmime message of a parsed message file.\r
+> + *\r
+> + * Returns NULL if the message file has not been parsed.\r
+> + *\r
+> + * XXX: Would be nice to not have to expose GMimeMessage here.\r
+\r
+Maybe just forward-declare struct GMimeMessage?  Then you also\r
+wouldn't need to add the gmime #include.\r
+\r
+> + */\r
+> +GMimeMessage *\r
+> +notmuch_message_file_get_mime_message (notmuch_message_file_t *message);\r
+> +\r
+>  /* Get the value of the specified header from the message as a UTF-8 string.\r
+>   *\r
+>   * The header name is case insensitive.\r
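
A minimal sketch of the lazy-parsing accessor suggested in the review above, in plain C. Everything here is a hypothetical stand-in: message_file_t, status_t, message_file_parse and the parse_calls counter are invented for illustration and are not notmuch's or GMime's actual definitions, and the string merely stands in for a GMimeMessage. The point is only the shape: the accessor triggers the parse itself on first use and passes the parser's status back, so callers need no particular call sequence and do not have to guess why they got NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; not notmuch's notmuch_status_t. */
typedef enum { STATUS_SUCCESS = 0, STATUS_FILE_NOT_EMAIL } status_t;

/* Hypothetical message-file object; not notmuch's struct. */
typedef struct {
    const char *filename;  /* path given when the file was opened */
    bool parsed;           /* true once a parse has succeeded */
    const char *message;   /* stand-in for the parsed GMimeMessage */
    int parse_calls;       /* sketch-only counter of real parse attempts */
} message_file_t;

/* Stand-in parser: succeeds for any non-empty filename.
 * Returns immediately if the file has already been parsed. */
status_t
message_file_parse (message_file_t *mf)
{
    if (mf->parsed)
        return STATUS_SUCCESS;

    mf->parse_calls++;
    if (mf->filename == NULL || mf->filename[0] == '\0')
        return STATUS_FILE_NOT_EMAIL;

    mf->message = "parsed";
    mf->parsed = true;
    return STATUS_SUCCESS;
}

/* Lazy accessor: parses on first use and reports the parser's status,
 * so a failure is distinguishable from "caller forgot to parse".
 * On error, *out is set to NULL. */
status_t
message_file_get_mime_message (message_file_t *mf, const char **out)
{
    status_t ret;

    *out = NULL;
    if (! mf->parsed) {
        ret = message_file_parse (mf);
        if (ret)
            return ret;
    }

    *out = mf->message;
    return STATUS_SUCCESS;
}
```

With this shape, a caller such as the indexing code could propagate the returned status directly instead of mapping NULL to a generic error; whether that fits the rest of the series is a judgment call for the patch author.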