From d208a67915210693cc3447fbc0dcfde113d0dac8 Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Sat, 18 Oct 2014 12:11:56 +0300 Subject: [PATCH] Re: Tabulation in multiline headers --- 09/1008e43aef2e96ffdee710fc5c89cf2decb3b0 | 128 ++++++++++++++++++++++ 1 file changed, 128 insertions(+) create mode 100644 09/1008e43aef2e96ffdee710fc5c89cf2decb3b0 diff --git a/09/1008e43aef2e96ffdee710fc5c89cf2decb3b0 b/09/1008e43aef2e96ffdee710fc5c89cf2decb3b0 new file mode 100644 index 000000000..e6841d656 --- /dev/null +++ b/09/1008e43aef2e96ffdee710fc5c89cf2decb3b0 @@ -0,0 +1,128 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id C7558431FB6 + for ; Sat, 18 Oct 2014 02:12:10 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -0.7 +X-Spam-Level: +X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id pn-ld9NsCCBr for ; + Sat, 18 Oct 2014 02:12:02 -0700 (PDT) +Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com + [209.85.212.178]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id 524FF431FBC + for ; Sat, 18 Oct 2014 02:12:02 -0700 (PDT) +Received: by mail-wi0-f178.google.com with SMTP id h11so3178340wiw.5 + for ; Sat, 18 Oct 2014 02:11:59 -0700 (PDT) +X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=1e100.net; s=20130820; + h=x-gm-message-state:from:to:cc:subject:in-reply-to:references + :user-agent:date:message-id:mime-version:content-type; + bh=iCmOE9vH003xLPt36pWG76Ge41o0e9DYGVdJe4BsUhs=; + b=TLVRMXYHGSY8QB8o8S6HzF6cQczxCDCkobBAg78RVcxs5M8aajzP29FJ/vBnxezW+t + 
0WnzNzQ4lhRV8vkkpeiZbibJRC+L/proIxEoGvyiy/2N1UiDUFwvLAfoCPnY/cuKqVE5 + 4sgC+sseHKzKFHKDM9VuJv4xyYm6wfvWIkgLT6+KiOQ3615oZ+bGo0Gs0yoGoa8OqtvH + n1N9WpGDRg2np9mm4G+pTIDf+SUCFrdHPNjk0asGqMDEk645+Pxux66a299u+w47lS9r + loHVv5XLOZcf9FfXCFQ0jeVRM/pyy0xJiE4TBoRcfA9ZJ18DiyhbC4HPJ7u3gPrLgf69 + 1dLw== +X-Gm-Message-State: + ALoCoQlYAN6uqjDgxI3bzs5JaYHDYe1VrECdrm5+QTcBkmxGM1Qf2Bp/n6yxwKvdCw6R22iHO83R +X-Received: by 10.180.14.231 with SMTP id s7mr4975919wic.0.1413623519809; + Sat, 18 Oct 2014 02:11:59 -0700 (PDT) +Received: from localhost (mobile-internet-5d6ad2-138.dhcp.inet.fi. + [93.106.210.138]) + by mx.google.com with ESMTPSA id yr9sm4550840wjc.31.2014.10.18.02.11.58 + for + (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); + Sat, 18 Oct 2014 02:11:58 -0700 (PDT) +From: Jani Nikula +To: sshilovsky@gmail.com, Jameson Graef Rollins +Subject: Re: Tabulation in multiline headers +In-Reply-To: + +References: + + <87vbnime8n.fsf@servo.finestructure.net> + +User-Agent: Notmuch/0.18.1+65~g9f0f30f (http://notmuchmail.org) Emacs/24.3.1 + (x86_64-pc-linux-gnu) +Date: Sat, 18 Oct 2014 12:11:56 +0300 +Message-ID: <87ppdp3foj.fsf@nikula.org> +MIME-Version: 1.0 +Content-Type: text/plain +Cc: notmuch@notmuchmail.org +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sat, 18 Oct 2014 09:12:10 -0000 + +On Sat, 18 Oct 2014, Sergei Shilovsky wrote: +>> Hi, Sergei. I'm not clear on where exactly you are seeing a problem +>> with this tab in the subject line. Is it showing up somewhere you think +>> it shouldn't? +> +> It is shown in e.g. `notmuch show` as well as +> 'notmuch_message_get_header(m, "subject")` +> +>> I'm not sure libnotmuch should be doing any scrubbing of the message +>> contents. The emacs UI does seem to replace the tab with a space, +>> though. 
Maybe other MUAs should be doing the same? +> +> My point is that this tabulation character does not relate to the +> contents of the header (this might be arguable though) and libnotmuch +> should return the contents, not its representation on file system. + +This is folding and unfolding of long header fields in action, described +in [1]. In short, folding happens by inserting CRLF before any WSP, and +unfolding happens by removing any CRLF immediately followed by WSP. The +WSP is preserved unchanged through folding and unfolding. The TAB is not +part of the multiple line representation, it's part of the unfolded +content. + +If my memory serves me right, many problems lead back to an +interpretation of [2] that you could insert extra WSP while folding. Due +to this interpretation, many agents replace the WSP following a CRLF +with a single space while unfolding. And presumably because of this, +buggy folding in a Python email package that replaces WSP by a TAB while +folding went unnoticed. This problem, in turn, has been literally spread +wide by Mailman 2 through its use of said email package. In practice it +follows that a perfectly good message will have folding WSP replaced by +TAB when it gets transmitted through Mailman 2. Again, this is all from +memory, [citation needed] etc. + +Notmuch is not free of a history of its own when it comes to header +unfolding. For historical reasons, we used two header parsers until +recently. One from gmime, and one of our own. After all of the above, it +shouldn't surprise the reader that the parsers treated folding WSP +differently! Our own parser replaced folding WSP with a single space, +while gmime respects the RFC. Starting from 0.18 we only use gmime to +parse headers, which means we're at least consistent, but, by the GIGO +principle, we may see more folding TABs. + +I do not think we should work around header folding problems in the lib, +and I'm not sure about the cli either.
We should consider replacing TABs +with spaces in notmuch-emacs though (I personally use a +notmuch-show-markup-headers-hook that does that). + +HTH, +Jani. + + +[1] https://tools.ietf.org/html/rfc5322#section-2.2.3 +[2] https://tools.ietf.org/html/rfc822#section-3.1 -- 2.26.2
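
The two unfolding behaviours described in the message can be sketched in a
few lines of Python. This is only an illustration of the rule quoted from
[1], not code from notmuch, gmime, or Mailman; the function names
unfold_strict and unfold_lenient and the sample header are made up for the
example.

```python
import re


def unfold_strict(raw: str) -> str:
    """Unfold as described in RFC 5322 section 2.2.3.

    Only the CRLF that is immediately followed by WSP is removed;
    the WSP itself (space or TAB) is kept unchanged.
    """
    return re.sub(r"\r\n(?=[ \t])", "", raw)


def unfold_lenient(raw: str) -> str:
    """Unfold the way many agents reportedly do (hypothetical helper).

    The CRLF and the run of WSP that follows it are replaced
    by a single space.
    """
    return re.sub(r"\r\n[ \t]+", " ", raw)


# A header that was folded with a TAB as the folding whitespace.
folded = "Subject: Re: Tabulation in\r\n\tmultiline headers"

print(repr(unfold_strict(folded)))   # the TAB survives unfolding
print(repr(unfold_lenient(folded)))  # the TAB becomes a single space
```

With the strict variant the TAB ends up in the unfolded value, which matches
what the original poster sees in `notmuch show`. The lenient variant hides
it, which is roughly what replacing TABs with spaces in a front end would
achieve.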