From: Carl Worth
To: Michal Sojka , Igor Shenderovich , notmuch@notmuchmail.org
Subject: Re: utf-8 in author field
In-Reply-To: <87vdanrmfo.fsf@steelpick.2x.cz>
Message-ID: <87pquspm6c.fsf@yoom.home.cworth.org>
Date: Fri, 29 Oct 2010 17:19:07 -0700

On Mon, 17 May 2010 09:56:27 +0200, Michal Sojka wrote:
> On Fri, 14 May 2010, Igor Shenderovich wrote:
> > What should one do to see the true list of authors?
>
> I encounter the same when headers are not encoded properly according to
> RFC 2047. I commonly see violations of section 5, paragraph (3), which
> says "An 'encoded-word' MUST NOT appear within a 'quoted-string'", that
> is, when the encoded word is enclosed in double quotes. I guess the
> "problem" is not specific to notmuch: all users of the GMime library
> must be affected.

Thanks for that explanation, Michal.

Igor, does that explanation seem correct for the situation you have?

> I use the following patch for notmuch to sanitize headers from a popular
> mailing list server in the Czech Republic:

Obviously that patch is a little too specific to be considered for
upstream notmuch. But I'm curious to know whether there's anything
general that we could do in notmuch.

My guess is that the best we could do is come up with some heuristics
for recognizing a non-RFC-compliant header here and munging it. Those
heuristics could then misfire on RFC-compliant messages that
intentionally include a string of characters matching the heuristic.
That would presumably be rare, but not impossible, so perhaps we would
then need some configuration.

Anyway, if one of you could send an example of a misbehaving message,
I might like to look at it and perhaps add it to the test suite to see
if there's anything we can safely do about it.

-Carl

-- 
carl.d.worth@intel.com
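The kind of heuristic discussed in this thread could look something like the following minimal Python sketch. It is not notmuch code, and the names `ENCODED_WORD`, `QUOTED_ENCODED_WORDS` and `unquote_encoded_words` are hypothetical. The assumed rule is deliberately conservative: a quoted-string is unquoted only when its entire content is one or more RFC 2047 encoded-words, so ordinary quoted display names such as "Plain Name" are left alone.

```python
import re

# Hypothetical, simplified pattern for one RFC 2047 encoded-word:
#   =?charset?B|Q?encoded-text?=
# charset and encoded-text are only checked for containing no '?' or whitespace.
ENCODED_WORD = r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?="

# A quoted-string whose whole content is one or more encoded-words separated
# by whitespace, e.g. "=?utf-8?Q?Ji=C5=99=C3=AD?=". This is the form that
# RFC 2047 section 5, paragraph (3) forbids.
QUOTED_ENCODED_WORDS = re.compile(
    r'"(\s*' + ENCODED_WORD + r"(?:\s+" + ENCODED_WORD + r')*\s*)"'
)


def unquote_encoded_words(header_value: str) -> str:
    """Heuristically drop the double quotes around quoted-strings that
    contain nothing but encoded-words, so that a strict RFC 2047 decoder
    will decode them.

    Caveat: a compliant header that deliberately quotes literal text
    shaped like an encoded-word is rewritten too (the false positive
    mentioned in the thread).
    """
    return QUOTED_ENCODED_WORDS.sub(lambda m: m.group(1).strip(), header_value)


if __name__ == "__main__":
    broken = '"=?utf-8?Q?Ji=C5=99=C3=AD?=" <jiri@example.com>'
    print(unquote_encoded_words(broken))
    # prints: =?utf-8?Q?Ji=C5=99=C3=AD?= <jiri@example.com>
    print(unquote_encoded_words('"Plain Name" <plain@example.com>'))
    # prints: "Plain Name" <plain@example.com>
```

Running such a pass before handing the header to the decoder keeps the fix independent of any particular decoding library. A quoted-string that mixes plain text with an encoded-word (for example `"Dr. =?utf-8?Q?x?="`) is intentionally left untouched by this sketch, trading some coverage for fewer false positives.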