Re: [PATCH 1/2] Convert non-UTF-8 parts to UTF-8 before indexing them
author Austin Clements <amdragon@MIT.EDU>
Sat, 25 Feb 2012 04:33:12 +0000 (23:33 -0500)
committer W. Trevor King <wking@tremily.us>
Fri, 7 Nov 2014 17:44:56 +0000 (09:44 -0800)
98/5c9bc43a5add49f6fce2c7a61477d444a41c18 [new file with mode: 0644]

diff --git a/98/5c9bc43a5add49f6fce2c7a61477d444a41c18 b/98/5c9bc43a5add49f6fce2c7a61477d444a41c18
new file mode 100644
index 0000000..35a1b90
--- /dev/null
@@ -0,0 +1,123 @@
+Return-Path: <amdragon@mit.edu>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+       by olra.theworths.org (Postfix) with ESMTP id E8602431FBD\r
+       for <notmuch@notmuchmail.org>; Sat, 25 Feb 2012 21:18:19 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+       tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+       by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+       with ESMTP id k9bubM7SlLN4 for <notmuch@notmuchmail.org>;\r
+       Sat, 25 Feb 2012 21:18:19 -0800 (PST)\r
+Received: from dmz-mailsec-scanner-3.mit.edu (DMZ-MAILSEC-SCANNER-3.MIT.EDU\r
+       [18.9.25.14])\r
+       by olra.theworths.org (Postfix) with ESMTP id EE42A431FAE\r
+       for <notmuch@notmuchmail.org>; Sat, 25 Feb 2012 21:18:18 -0800 (PST)\r
+X-AuditID: 1209190e-b7f7c6d0000008c3-87-4f48648c7ae1\r
+Received: from mailhub-auth-4.mit.edu ( [18.7.62.39])\r
+       by dmz-mailsec-scanner-3.mit.edu (Symantec Messaging Gateway) with SMTP\r
+       id F9.88.02243.C84684F4; Fri, 24 Feb 2012 23:33:16 -0500 (EST)\r
+Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103])\r
+       by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id q1P4XFM3026899; \r
+       Fri, 24 Feb 2012 23:33:15 -0500\r
+Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])\r
+       (authenticated bits=0)\r
+       (User authenticated as amdragon@ATHENA.MIT.EDU)\r
+       by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id q1P4XCQo025930\r
+       (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT);\r
+       Fri, 24 Feb 2012 23:33:14 -0500 (EST)\r
+Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.77)\r
+       (envelope-from <amdragon@mit.edu>)\r
+       id 1S19Jo-00013e-Hc; Fri, 24 Feb 2012 23:33:12 -0500\r
+Date: Fri, 24 Feb 2012 23:33:12 -0500\r
+From: Austin Clements <amdragon@MIT.EDU>\r
+To: Michal Sojka <sojkam1@fel.cvut.cz>\r
+Subject: Re: [PATCH 1/2] Convert non-UTF-8 parts to UTF-8 before indexing them\r
+Message-ID: <20120225043312.GL30513@mit.edu>\r
+References: <1330043595-22054-1-git-send-email-sojkam1@fel.cvut.cz>\r
+       <1330068983-4483-1-git-send-email-sojkam1@fel.cvut.cz>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=us-ascii\r
+Content-Disposition: inline\r
+In-Reply-To: <1330068983-4483-1-git-send-email-sojkam1@fel.cvut.cz>\r
+User-Agent: Mutt/1.5.21 (2010-09-15)\r
+X-Brightmail-Tracker:\r
+ H4sIAAAAAAAAA+NgFmpileLIzCtJLcpLzFFi42IRYrdT1+1J8fA32P9SzOL6zZnMFjevTmJz\r
+       YPL48ucDq8ezVbeYA5iiuGxSUnMyy1KL9O0SuDLeT53HXLCHr+LMpd2MDYzbuLsYOTkkBEwk\r
+       Dt3cwwZhi0lcuLceyObiEBLYxyjx8et2JpCEkMAGRonla4IgEieZJH4c2M0O4SxhlLh98Q6Q\r
+       w8HBIqAqceJWEkgDm4CGxLb9yxlBbBEBNYnuBSvANjALSEt8+90MNlRYwE9i5cN7YHFeAR2J\r
+       plV9zBDLaiQezpzJBBEXlDg58wkLRK+WxI1/L5lAVoHMWf6PAyTMKeAsse7fL7AxogIqElNO\r
+       bmObwCg0C0n3LCTdsxC6FzAyr2KUTcmt0s1NzMwpTk3WLU5OzMtLLdI11svNLNFLTSndxAgK\r
+       ak5Jvh2MXw8qHWIU4GBU4uFl3uLuL8SaWFZcmXuIUZKDSUmUtz7Zw1+ILyk/pTIjsTgjvqg0\r
+       J7X4EKMEB7OSCK8dG1CONyWxsiq1KB8mJc3BoiTOq6b1zk9IID2xJDU7NbUgtQgmK8PBoSTB\r
+       2wwyVLAoNT21Ii0zpwQhzcTBCTKcB2j4IZAa3uKCxNzizHSI/ClGRSlx3pkgCQGQREZpHlwv\r
+       LOm8YhQHekWYdz5IFQ8wYcF1vwIazAQ02P6vK8jgkkSElFQDY1vhl5bN59Yc2FPiuKzWch1/\r
+       2wHvTzM073/vdDI6k8Zw4zLrv1sxf6q3qSus/bT9Tu3nnGNWmVEm0d/cDi2Q99hizfops3li\r
+       VYEly52Cg5dkn/Cn/hZQW2K6/3Ewm97O4o3qLBNPSESvUH5m+fF75lM2t8VePOuXPsvcaLT9\r
+       1ZQg+7z0trtVHEosxRmJhlrMRcWJAFGIVhcVAwAA\r
+Cc: notmuch@notmuchmail.org\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+       <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Sun, 26 Feb 2012 05:18:20 -0000\r
+\r
+LGTM.  I'm assuming this interacts with the uuencoding filter in the\r
+right order (I don't see how any other order could be correct), but\r
+don't actually know.\r
+\r
+Quoth Michal Sojka on Feb 24 at  8:36 am:\r
+> This fixes a bug that prevented searching for non-ASCII words in such\r
+> parts. The code here was copied from show_text_part_content(), because\r
+> the show command already does the needed conversion when showing the\r
+> message.\r
+> ---\r
+>  lib/index.cc |   15 +++++++++++++++\r
+>  1 files changed, 15 insertions(+), 0 deletions(-)\r
+> \r
+> diff --git a/lib/index.cc b/lib/index.cc\r
+> index d8f8b2b..e377732 100644\r
+> --- a/lib/index.cc\r
+> +++ b/lib/index.cc\r
+> @@ -315,6 +315,7 @@ _index_mime_part (notmuch_message_t *message,\r
+>      GByteArray *byte_array;\r
+>      GMimeContentDisposition *disposition;\r
+>      char *body;\r
+> +    const char *charset;\r
+>  \r
+>      if (! part) {\r
+>      fprintf (stderr, "Warning: Not indexing empty mime part.\n");\r
+> @@ -390,6 +391,20 @@ _index_mime_part (notmuch_message_t *message,\r
+>      g_mime_stream_filter_add (GMIME_STREAM_FILTER (filter),\r
+>                            discard_uuencode_filter);\r
+>  \r
+> +    charset = g_mime_object_get_content_type_parameter (part, "charset");\r
+> +    if (charset) {\r
+> +    GMimeFilter *charset_filter;\r
+> +    charset_filter = g_mime_filter_charset_new (charset, "UTF-8");\r
+> +    /* This result can be NULL for things like "unknown-8bit".\r
+> +     * Don't set a NULL filter as that makes GMime print\r
+> +     * annoying assertion-failure messages on stderr. */\r
+> +    if (charset_filter) {\r
+> +        g_mime_stream_filter_add (GMIME_STREAM_FILTER (filter),\r
+> +                                  charset_filter);\r
+> +        g_object_unref (charset_filter);\r
+> +    }\r
+> +    }\r
+> +\r
+>      wrapper = g_mime_part_get_content_object (GMIME_PART (part));\r
+>      if (wrapper)\r
+>      g_mime_data_wrapper_write_to_stream (wrapper, filter);\r
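
The quoted patch converts a part's declared charset to UTF-8 before the text reaches the indexer, and skips the conversion when no converter can be created (for example for "unknown-8bit"). The same pattern can be sketched outside GMime. This is only an illustration, not notmuch's code: it calls POSIX iconv directly instead of g_mime_filter_charset_new, and to_utf8 is a hypothetical helper name that does not appear in the patch.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of notmuch or GMime): convert `len`
 * bytes of `in` from `charset` to UTF-8.
 *
 * Returns a malloc'd, NUL-terminated UTF-8 string, or NULL when the
 * charset is unknown to iconv or the input is not valid in that
 * charset.  The NULL return mirrors the patch's check for a NULL
 * charset filter: the caller decides what to do with unconverted
 * text instead of pushing it through a broken converter. */
static char *
to_utf8 (const char *charset, const char *in, size_t len)
{
    iconv_t cd = iconv_open ("UTF-8", charset);
    if (cd == (iconv_t) -1)
        return NULL;            /* e.g. "unknown-8bit" */

    /* Assumption: 4 output bytes per input byte is a generous upper
     * bound for converting common mail charsets to UTF-8. */
    size_t out_size = len * 4 + 1;
    char *out = malloc (out_size);
    if (! out) {
        iconv_close (cd);
        return NULL;
    }

    char *inp = (char *) in;    /* iconv takes char **, but does not modify the input */
    char *outp = out;
    size_t in_left = len;
    size_t out_left = out_size - 1;

    size_t r = iconv (cd, &inp, &in_left, &outp, &out_left);
    iconv_close (cd);

    if (r == (size_t) -1) {     /* invalid or truncated input sequence */
        free (out);
        return NULL;
    }

    *outp = '\0';
    return out;
}
```

A caller would pass the value of the part's "charset" parameter and index the returned string when it is non-NULL, freeing it afterwards. The patch itself attaches the conversion as a stream filter instead, so the text is converted as it streams through rather than in one buffer.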