From: Jani Nikula
To: stedfast@comcast.net, Daniel Kahn Gillmor
Cc: Eric Abrahamsen , Notmuch Mail
Subject: Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
Date: Fri, 09 Aug 2013 20:04:47 +0200
Message-ID: <87bo56viyo.fsf@nikula.org>
In-Reply-To: <289881190.1977918.1376058260231.JavaMail.root@sz0152a.westchester.pa.mail.comcast.net>
References: <289881190.1977918.1376058260231.JavaMail.root@sz0152a.westchester.pa.mail.comcast.net>
List-Id: "Use and development of the notmuch mail system."

On Fri, 09 Aug 2013, stedfast@comcast.net wrote:
> Hi guys,
>
> ( I'm the author of GMime for those that don't know)
>
> I just came across the notmuch thread (with the referenced Subject)
> but unfortunately am not subscribed to the mailing list and so am
> unable to reply to the list (hopefully no one minds me emailing them
> directly!). I wanted to reach out and offer a possible solution to the
> problem being discussed.

Thanks for your mail; hopefully you don't mind me replying to the list!

> Passing the GMIME_ENABLE_RFC2047_WORKAROUNDS flag to g_mime_init()
> *should* solve the decoding problem mentioned in the thread. This flag
> should be safe to pass into g_mime_init() without any bad side effects
> and my unit tests do test that code-path.

Many thanks, this solves my issue with the subject lines.
This is the quick patch I tried:

diff --git a/notmuch.c b/notmuch.c
index 78d29a8..7300c21 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -264,7 +264,7 @@ main (int argc, char *argv[])
 
     local = talloc_new (NULL);
 
-    g_mime_init (0);
+    g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);
 #if !GLIB_CHECK_VERSION(2, 35, 1)
     g_type_init ();
 #endif

We'll need to look into using this in the lib too.

BR,
Jani.

> I took a look at gmime-filter-headers.[c,h] as well and I suspect that
> it was written back when GMime brokenly did not guarantee UTF-8
> decoded strings from functions like g_mime_message_get_subject() and
> the like. This was fixed a while back. From a quick grep of the
> ChangeLog it looks like this was probably fixed in 2.5.9 or so (but
> possibly as late as 2.6.3 as there were some other charset rfc2047
> decoder fixes around then).
>
> I know for sure that the 2.4.x series didn't guarantee UTF-8-safe
> strings, but it's been the goal of 2.6.x to make that guarantee (minus
> any bugs that may exist, but if you find any cases of that, let me
> know!)
>
> (Note: raw header values from g_mime_object_get_header() are not
> guaranteed to be UTF-8 but if you call
> g_mime_utils_header_decode_text/phrase() on them, the results are
> guaranteed to be valid UTF-8)
>
> Hope that helps,
>
> Jeff
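
A side note on the last quoted point, that raw header values are not guaranteed to be UTF-8: the sketch below shows what "valid UTF-8" means at the byte level. It is only an illustration. The helper name is_valid_utf8 is hypothetical and is not part of GMime or notmuch; real code would more likely rely on the decoded strings from g_mime_utils_header_decode_text(), or on GLib's g_utf8_validate().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a GMime or notmuch function): returns true if
 * the first len bytes of s are well-formed UTF-8. Overlong encodings,
 * UTF-16 surrogates and code points above U+10FFFF are rejected. */
bool
is_valid_utf8 (const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        size_t need;        /* continuation bytes expected after c */
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest code point legal for this length */

        if (c < 0x80) {     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;   /* stray continuation or invalid lead byte */
        }

        if (len - i - 1 < need)
            return false;   /* sequence truncated at end of input */

        for (size_t k = 1; k <= need; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;   /* overlong, out of range, or surrogate */

        i += need + 1;
    }
    return true;
}
```

With this, a name containing a Latin-1 encoded "é" (the single byte 0xE9) is rejected, while the same name with the UTF-8 sequence 0xC3 0xA9 is accepted. A raw, undecoded header can carry either kind of byte sequence.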