From: Jani Nikula
To: sshilovsky@gmail.com, Jameson Graef Rollins
Cc: notmuch@notmuchmail.org
Subject: Re: Tabulation in multiline headers
References: <87vbnime8n.fsf@servo.finestructure.net>
User-Agent: Notmuch/0.18.1+65~g9f0f30f (http://notmuchmail.org) Emacs/24.3.1 (x86_64-pc-linux-gnu)
Date: Sat, 18 Oct 2014 12:11:56 +0300
Message-ID: <87ppdp3foj.fsf@nikula.org>

On Sat, 18 Oct 2014, Sergei Shilovsky wrote:
>> Hi, Sergei. I'm not clear on where exactly you are seeing a problem
>> with this tab in the subject line. Is it showing up somewhere you think
>> it shouldn't?
>
> It is shown in e.g. `notmuch show` as well as
> `notmuch_message_get_header(m, "subject")`
>
>> I'm not sure libnotmuch should be doing any scrubbing of the message
>> contents. The emacs UI does seem to replace the tab with a space,
>> though. Maybe other MUAs should be doing the same?
>
> My point is that this tabulation character does not relate to the
> contents of the header (this might be arguable though) and libnotmuch
> should return the contents, not its representation on file system.

This is folding and unfolding of long header fields in action, as
described in [1]. In short, folding happens by inserting a CRLF before
any WSP (whitespace, i.e. a space or a horizontal tab), and unfolding
happens by removing any CRLF that is immediately followed by WSP. The
WSP itself is preserved unchanged through folding and unfolding. The TAB
is not part of the multiple-line representation; it is part of the
unfolded content.

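To make the rule concrete, here is a minimal Python sketch under a
simplified model of RFC 5322 section 2.2.3. The functions `fold`,
`unfold` and `lenient_unfold` are hypothetical helpers written for
illustration; they are not taken from gmime, libnotmuch or Python's
email package, and the 78-character default width is borrowed from the
RFC's recommended line length. The sketch shows that a TAB in the
unfolded value survives a fold/unfold round trip, while
`lenient_unfold` models the single-space interpretation for contrast.

```python
import re

# Hypothetical helpers for illustration only: a simplified model of
# RFC 5322 section 2.2.3, not the code of gmime, libnotmuch or Python's
# email package.


def fold(value: str, width: int = 78) -> str:
    """Fold a header line by inserting CRLF before whitespace (WSP).

    The WSP itself is kept and becomes the first character of the
    continuation line, so nothing is added to or removed from the value.
    """
    # Split just before the first WSP of each whitespace run; every token
    # after the first therefore starts with the WSP that preceded it.
    tokens = [t for t in re.split(r"(?<![ \t])(?=[ \t])", value) if t]
    lines = [""]
    for token in tokens:
        if lines[-1] and len(lines[-1]) + len(token) > width:
            lines.append(token)  # CRLF goes before the WSP; the WSP is kept
        else:
            lines[-1] += token
    return "\r\n".join(lines)


def unfold(folded: str) -> str:
    """RFC 5322 unfolding: remove any CRLF immediately followed by WSP."""
    return re.sub(r"\r\n(?=[ \t])", "", folded)


def lenient_unfold(folded: str) -> str:
    """For contrast: collapse CRLF plus the following WSP into one space.

    This models the looser interpretation, under which a TAB used as
    folding whitespace is silently turned into a space.
    """
    return re.sub(r"\r\n[ \t]+", " ", folded)


if __name__ == "__main__":
    header = "Subject: Tabulation in\tmultiline headers"
    folded = fold(header, width=20)
    print(repr(folded))
    # 'Subject: Tabulation\r\n in\tmultiline\r\n headers'
    assert unfold(folded) == header  # the TAB survives the round trip
    print(repr(unfold("Subject: foo\r\n\tbar")))
    # 'Subject: foo\tbar'
    print(repr(lenient_unfold("Subject: foo\r\n\tbar")))
    # 'Subject: foo bar'
```

The only difference between `unfold` and `lenient_unfold` is whether
the WSP after the CRLF is kept or replaced, which is where header
parsers can start to disagree.
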
If my memory serves me right, many problems lead back to an
interpretation of [2] that you may insert extra WSP while folding. Due
to this interpretation, many agents replace the WSP following a CRLF
with a single space while unfolding. Presumably because of this, buggy
folding in a Python email package, which replaces WSP with a TAB while
folding, went unnoticed. This problem, in turn, has been spread far and
wide by Mailman 2 through its use of that email package. In practice,
this means that a perfectly good message will have its folding WSP
replaced by a TAB when it is transmitted through Mailman 2. Again, this
is all from memory, [citation needed] etc.

Notmuch has a history of its own when it comes to header unfolding. For
historical reasons, until recently we used two header parsers: one from
gmime, and one of our own. After all of the above, it shouldn't surprise
the reader that the parsers treated folding WSP differently! Our own
parser replaced folding WSP with a single space, while gmime respects
the RFC. Starting from 0.18, we use only gmime to parse headers. This
means we're at least consistent, but by the GIGO (garbage in, garbage
out) principle, we may see more folding TABs.

I do not think we should work around header folding problems in the
lib, and I'm not sure about the CLI either. We should consider replacing
TABs with spaces in notmuch-emacs, though (I personally use a
notmuch-show-markup-headers-hook that does that).

HTH,
Jani.

[1] https://tools.ietf.org/html/rfc5322#section-2.2.3
[2] https://tools.ietf.org/html/rfc822#section-3.1