From: Jani Nikula
Date: Fri, 9 Aug 2013 18:04:47 +0000 (+0200)
Subject: Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
To: stedfast@comcast.net, Daniel Kahn Gillmor
Cc: Eric Abrahamsen, Notmuch Mail

On Fri, 09 Aug 2013, stedfast@comcast.net wrote:
> Hi guys,
>
> (I'm the author of GMime for those that don't know)
>
> I just came across the notmuch thread (with the referenced Subject)
> but unfortunately am not subscribed to the mailing list and so am
> unable to reply to the list (hopefully no one minds me emailing them
> directly!).
> I wanted to reach out and offer a possible solution to the problem
> being discussed.

Thanks for your mail; hopefully you don't mind me replying to the list!

> Passing the GMIME_ENABLE_RFC2047_WORKAROUNDS flag to g_mime_init()
> *should* solve the decoding problem mentioned in the thread. This flag
> should be safe to pass into g_mime_init() without any bad side effects
> and my unit tests do test that code-path.

Many thanks, this solves my issue with the subject lines.

This is the quick patch I tried:

diff --git a/notmuch.c b/notmuch.c
index 78d29a8..7300c21 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -264,7 +264,7 @@ main (int argc, char *argv[])
 
     local = talloc_new (NULL);
 
-    g_mime_init (0);
+    g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);
 #if !GLIB_CHECK_VERSION(2, 35, 1)
     g_type_init ();
 #endif

We'll need to look into using this in the lib too.

BR,
Jani.

> I took a look at gmime-filter-headers.[c,h] as well and I suspect that
> it was written back when GMime brokenly did not guarantee UTF-8
> decoded strings from functions like g_mime_message_get_subject() and
> the like. This was fixed a while back. From a quick grep of the
> ChangeLog it looks like this was probably fixed in 2.5.9 or so (but
> possibly as late as 2.6.3 as there were some other charset rfc2047
> decoder fixes around then).
>
> I know for sure that the 2.4.x series didn't guarantee UTF-8-safe
> strings, but it's been the goal of 2.6.x to make that guarantee (minus
> any bugs that may exist, but if you find any cases of that, let me
> know!)
>
> (Note: raw header values from g_mime_object_get_header() are not
> guaranteed to be UTF-8 but if you call
> g_mime_utils_header_decode_text/phrase() on them, the results are
> guaranteed to be valid UTF-8)
>
> Hope that helps,
>
> Jeff