From: Jani Nikula
Date: Fri, 26 Jul 2013 10:16:21 +0000 (+0200)
Subject: Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=ed5051314eadbbfa05e6f276cc6da77094de1419;p=notmuch-archives.git

Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
---

diff --git a/02/f813aa37b62f1682e07404f6b28d94f6435b22 b/02/f813aa37b62f1682e07404f6b28d94f6435b22
new file mode 100644
index 000000000..39dd38d07
--- /dev/null
+++ b/02/f813aa37b62f1682e07404f6b28d94f6435b22
@@ -0,0 +1,105 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id 1B94E431FAF
+ for ; Fri, 26 Jul 2013 03:16:38 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: -0.7
+X-Spam-Level:
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5
+ tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id p-HSk9R8iZ0o for ;
+ Fri, 26 Jul 2013 03:16:30 -0700 (PDT)
+Received: from mail-we0-f170.google.com (mail-we0-f170.google.com
+ [74.125.82.170]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
+ (No client certificate requested)
+ by olra.theworths.org (Postfix) with ESMTPS id 9D6B9431FAE
+ for ; Fri, 26 Jul 2013 03:16:30 -0700 (PDT)
+Received: by mail-we0-f170.google.com with SMTP id w60so1708236wes.29
+ for ; Fri, 26 Jul 2013 03:16:28 -0700 (PDT)
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
+ d=google.com; s=20120113;
+ h=from:to:subject:in-reply-to:references:user-agent:date:message-id
+ :mime-version:content-type:x-gm-message-state;
+ bh=CLwDSrMRVNzv0didh7VStheMYFLoQtswHRk+bC5WZVc=;
+ b=CAC60za2kOcTGaAEIkiBRr8d2KN6z1hXJrLxQW5DEL1LnydJhth/SS6Fjjt6RO/8ec
+ Rj521i6PRvxs3MeKOFFMkMGbhCPsxyaxu+r6GE0NvzsDljjXinJtUHbUqdpjmCbWUQDF
+ WsZhc8wokuw7sGcCuW9xp0UBYDVtYtSRPou0LKniOFD256B3O4mkFYmbm27/kKrOJ8ja
+ orVzU56R+gU5VaYeUriaUgeXFv2SxZV0ZmZDOmYSHbg1mEAmG5Df8WlwBk5xjd5hD9Us
+ 2gKp+/XlP0jLKaE059+SO3FtnhzBHDk9DDoU++Ad39P9MHy6644hJ6SmsP6olbXOQDL/
+ /bsQ==
+X-Received: by 10.180.38.45 with SMTP id d13mr5117651wik.62.1374833786960;
+ Fri, 26 Jul 2013 03:16:26 -0700 (PDT)
+Received: from localhost ([2001:4b98:dc0:43:216:3eff:fe1b:25f3])
+ by mx.google.com with ESMTPSA id u9sm3616142wif.6.2013.07.26.03.16.24
+ for
+ (version=TLSv1.1 cipher=RC4-SHA bits=128/128);
+ Fri, 26 Jul 2013 03:16:25 -0700 (PDT)
+From: Jani Nikula
+To: David Bremner ,
+ Franz Fellner , notmuch@notmuchmail.org
+Subject: Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
+In-Reply-To: <87y58xv71x.fsf@zancas.localnet>
+References: <08cb1dcd-c5db-4e33-8b09-7730cb3d59a2@gmail.com>
+ <871u6psjwr.fsf@ericabrahamsen.net>
+ <5712cc41-d0ce-4ed3-af1c-37cf639dd9c0@gmail.com>
+ <87y58xv71x.fsf@zancas.localnet>
+User-Agent: Notmuch/0.15.2+177~gb1ba76c (http://notmuchmail.org) Emacs/23.2.1
+ (x86_64-pc-linux-gnu)
+Date: Fri, 26 Jul 2013 12:16:21 +0200
+Message-ID: <87d2q5wrre.fsf@nikula.org>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+X-Gm-Message-State:
+ ALoCoQko29M9Ro43HU2VDrllEyIxRDIGbadpeKM5xJcRvqsopQ3n5ZvT06DzGXNoFV+oWOldg6V2
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Fri, 26 Jul 2013 10:16:38 -0000
+
+On Tue, 23 Jul 2013, David Bremner wrote:
+> Franz Fellner writes:
+>
+>>
+>> OK, thx. So every app needs to get patched to display those strings
+>> properly? Any chance this could be done directly in libnotmuch? I
+>> grepped for "2047" inside the "emacs" subtree, but found nothing (had
+>> the hope for a comment for the workaround). Would be interesting to
+>> see how this is done, so I can at least try to create a patch (though
+>> my ruby is quite basic).
+>
+> In general notmuch relies on libgmime for rfc2047 parsing. I'm not sure
+> of all the details now, but some of the filtering does happen in the
+> CLI, not the lib. You could start by looking at
+> gmime-filter-headers.[ch] in the top directory.
+
+I'm experiencing a similar problem with the Subject: headers in bugzilla
+mail. Per RFC 2047,
+
+ Ordinary ASCII text and 'encoded-word's may appear together in the
+ same header field. However, an 'encoded-word' that appears in a
+ header field defined as '*text' MUST be separated from any adjacent
+ 'encoded-word' or 'text' by 'linear-white-space'.
+
+In the problematic mails, the encoded-word begins immediately after
+preceding text, i.e. without linear-white-space. Manually adding that
+space in the message file makes the subject display as expected.
+
+The decoding is done in the cli using g_mime_message_get_subject(). I'm
+not sure if there's much that can be done about it within notmuch.
+
+BR,
+Jani.
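
Editorial sketch (not part of the archived message): the manual fix described above, adding the missing space before an encoded-word, could be automated roughly as follows. This is only an illustration under stated assumptions: the helper name separate_encoded_words is hypothetical, the regular expression is a simplification of the RFC 2047 grammar, and nothing here reflects how notmuch or GMime actually handle such headers. It only covers the case from the mail (an encoded-word glued to what precedes it), not text glued on after an encoded-word.

```python
import re

# Simplified RFC 2047 encoded-word pattern: =?charset?encoding?text?=
# Assumption: a strict parser applies tighter token rules and a length limit.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def separate_encoded_words(header_value: str) -> str:
    """Insert a space before any encoded-word glued to preceding text.

    Hypothetical helper for illustration; not part of notmuch or GMime.
    """
    pieces = []
    last_end = 0
    for match in ENCODED_WORD.finditer(header_value):
        start = match.start()
        # Copy the plain text between the previous match and this one.
        pieces.append(header_value[last_end:start])
        # An encoded-word at the start or after whitespace is already valid.
        if start > 0 and not header_value[start - 1].isspace():
            pieces.append(" ")
        pieces.append(match.group(0))
        last_end = match.end()
    # Copy whatever follows the last encoded-word unchanged.
    pieces.append(header_value[last_end:])
    return "".join(pieces)
```

A repaired value could then be handed to a strict RFC 2047 decoder; a value that is already well-formed passes through unchanged.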