From: James Vasile
To: notmuch
Subject: [PATCH] Remove/replace vertical whitespace in subject header field body.
Date: Wed, 16 Mar 2011 21:44:28 -0400
Message-ID: <87ipvifrlv.fsf@softwarefreedom.org>

RFC 822 specifies that headers are one-liners of ASCII:

> The field-body may be composed of any ASCII characters, except CR or
> LF.
> (While CR and/or LF may be present in the actual text, they are
> removed by the action of unfolding the field.)

RFC 5335 allows UTF-8 in header field bodies. As I read the docs,
though, the RFC 822 requirement that they end up as one-liners still
applies. RFC 5322 describes folding and unfolding as follows:

> Each header field is logically a single line of characters comprising
> the field name, the colon, and the field body.  For convenience
> however, and to deal with the 998/78 character limitations per line,
> the field body portion of a header field can be split into a
> multiple-line representation; this is called "folding".  The general
> rule is that wherever this specification allows for folding white
> space (not simply WSP characters), a CRLF may be inserted before any
> WSP.
...
> The process of moving from this folded multiple-line representation of
> a header field to its single line representation is called
> "unfolding".  Unfolding is accomplished by simply removing any CRLF
> that is immediately followed by WSP.

Again, unfolded subjects should be one-liners.

An email sent to me from pingg.com (I think it's a pretentious version
of evite) came with a subject of
"=?utf-8?Q?bring_small_items_for_a_pi=C3=B1ata=21=21=21=21=0A?=".
"notmuch search" displays this as "Subject: bring small items for a
piñata!!!!" with a \n at the end. This befuddles the emacs UI ("Error:
Unexpected output from notmuch search:"). I've attached an email that
reproduces the error.

I don't think ending the subject with a UTF-8-encoded 0x0A followed by
the usual CRLF is RFC-compliant. Still, notmuch should surely follow the
deplorable "accept liberally/emit conservatively" doctrine.

Here is a patch that trims leading and trailing whitespace from subjects
and replaces internal whitespace other than spaces and horizontal tabs
(LF, VT, FF, CR) with spaces. It fixes the problem described in this
message.
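The following is a minimal C sketch of the RFC 5322 unfolding rule quoted above. unfold_header() is a hypothetical helper written for illustration. It is not a notmuch or GMime function. It works on the raw, still-encoded header text and drops every CRLF that is immediately followed by a space or horizontal tab, while keeping the WSP itself.

The sketch also shows why unfolding cannot help here. Inside an RFC 2047 encoded word, "=0A" is just three ASCII characters, so unfolding leaves it alone. It only becomes an LF once the word is decoded.

```c
#include <assert.h>  /* used by the usage checks */
#include <string.h>  /* strcmp in the usage checks */

/* Hypothetical helper, not a notmuch or GMime API: unfold a raw header
 * in place by removing every CRLF that is immediately followed by WSP
 * (space or horizontal tab), as RFC 5322 describes.  The WSP itself is
 * kept; a CRLF that is not followed by WSP is left alone. */
static void
unfold_header (char *s)
{
    char *src = s;
    char *dst = s;

    while (*src) {
        /* Short-circuit evaluation keeps us from reading past the NUL. */
        if (src[0] == '\r' && src[1] == '\n'
            && (src[2] == ' ' || src[2] == '\t')) {
            src += 2;  /* skip CR and LF; the WSP is copied next round */
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

For example, consider a subject folded between two encoded words, "...items?=\r\n =?utf-8?Q?for_a_pi=C3=B1ata=0A?=". Unfolding joins it into a single line, and the encoded "=0A" survives untouched.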
---
 lib/thread.cc |   36 ++++++++++++++++++++++++++++++++----
 1 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/lib/thread.cc b/lib/thread.cc
index 5190a66..7a816ea 100644
--- a/lib/thread.cc
+++ b/lib/thread.cc
@@ -266,6 +266,34 @@ _thread_add_message (notmuch_thread_t *thread,
     }
 }
 
+/* Remove leading/trailing whitespace and replace internal vertical
+ * whitespace with spaces.
+ */
+static char *
+rectify_whitespace (char *str)
+{
+    char *last;
+    char *curr;
+
+    while (isspace ((unsigned char) *str))
+	str++;
+    if (*str == 0)
+	return str;
+
+    last = str + strlen(str) - 1;
+    while (last > str && isspace ((unsigned char) *last))
+	last--;
+    last[1] = '\0';	/* actually drop the trailing whitespace */
+
+    curr = str;
+    do
+	if ((*curr >= 10) && (*curr <= 13))
+	    *curr = 32; //space
+    while (curr++ < last);
+
+    return str;
+}
+
 static void
 _thread_set_subject_from_message (notmuch_thread_t *thread,
 				  notmuch_message_t *message)
@@ -282,11 +310,11 @@ _thread_set_subject_from_message (notmuch_thread_t *thread,
 	(strncasecmp (subject, "Vs: ", 4) == 0) ||
 	(strncasecmp (subject, "Sv: ", 4) == 0)) {
 
-	cleaned_subject = talloc_strndup (thread,
-					  subject + 4,
-					  strlen(subject) - 4);
+	cleaned_subject = rectify_whitespace(talloc_strndup (thread,
+							     subject + 4,
+							     strlen(subject) - 4));
     } else {
-	cleaned_subject = talloc_strdup (thread, subject);
+	cleaned_subject = rectify_whitespace(talloc_strdup (thread, subject));
     }
 
     if (thread->subject)
-- 
1.7.2.3

Attachment: malformed_subject

Date: Fri, 11 Mar 2011 18:40:00 +0000
From: "redacted" <host@invite.pingg.com>
To: redacted@example.com
Message-Id: <20110311183749.526771.31453.9841841@sender.pingg.com>
Subject: =?utf-8?Q?bring_small_items_for_a_pi=C3=B1ata=21=21=21=21=0A?=
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Ignore this.
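The whitespace handling can be checked outside notmuch with the sketch below. It decodes the encoded text of the pingg.com subject, bring_small_items_for_a_pi=C3=B1ata=21=21=21=21=0A, and then applies a standalone version of the trim-and-replace logic.

- **q_decode()** is a hypothetical helper. It is a bare RFC 2047 "Q" decoder that assumes the =?charset?Q? ... ?= wrapper has already been stripped and that the input is well formed. In notmuch, the subject reaches _thread_set_subject_from_message already decoded, which is how the trailing \n gets into the search output.
- **rectify_whitespace()** here follows the patch. It is written with a for loop, and the isspace() arguments are cast to unsigned char. UTF-8 bytes such as 0xC3 are negative when char is signed, and passing them to isspace() that way is undefined behaviour.

```c
#include <assert.h>  /* used by the usage checks */
#include <ctype.h>   /* isspace, isxdigit */
#include <stdlib.h>  /* strtol */
#include <string.h>  /* strlen, strcmp */

/* Hypothetical helper: decode the encoded-text part of an RFC 2047 "Q"
 * encoded word in place ("_" -> space, "=XX" -> octet 0xXX).  It assumes
 * the =?charset?Q? ... ?= wrapper has already been stripped. */
static void
q_decode (char *s)
{
    char *src = s;
    char *dst = s;

    while (*src) {
        if (*src == '_') {
            *dst++ = ' ';
            src++;
        } else if (src[0] == '=' && isxdigit ((unsigned char) src[1])
                   && isxdigit ((unsigned char) src[2])) {
            char hex[3] = { src[1], src[2], '\0' };
            *dst++ = (char) strtol (hex, NULL, 16);
            src += 3;
        } else {
            *dst++ = *src++;  /* literal character (or a stray '=') */
        }
    }
    *dst = '\0';
}

/* Standalone version of the patch's approach: skip leading whitespace,
 * cut trailing whitespace, then turn LF, VT, FF and CR ('\n' through
 * '\r' in ASCII) into spaces.  Horizontal tabs are kept.  Returns a
 * pointer into the same buffer. */
static char *
rectify_whitespace (char *str)
{
    char *last;
    char *curr;

    while (isspace ((unsigned char) *str))
        str++;
    if (*str == '\0')
        return str;

    last = str + strlen (str) - 1;
    while (last > str && isspace ((unsigned char) *last))
        last--;
    last[1] = '\0';  /* terminate right after the last non-space byte */

    for (curr = str; curr <= last; curr++)
        if (*curr >= '\n' && *curr <= '\r')
            *curr = ' ';

    return str;
}
```

After q_decode(), the subject ends in "\n". Passing it through rectify_whitespace() yields the one-liner "bring small items for a piñata!!!!". The early return means an all-whitespace subject comes back as an empty string rather than being scanned.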