From: W. Trevor King
Date: Fri, 7 Nov 2014 08:12:58 +0000 (-0800)
Subject: ssoma-mda: Fix RFC-2047 decoding for partially-encoded strings
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=0975a37a18d69e73149c860a8ad6da471787a60a;p=ssoma-mda.git

ssoma-mda: Fix RFC-2047 decoding for partially-encoded strings

Avoid:

  TypeError: sequence item 0: expected str instance, bytes found

When an RFC-2047-encoded string contains both an unencoded and an
encoded section.  For example:

  >>> import email.header
  >>> email.header.decode_header('Keld =?ISO-8859-1?Q?J=F8rn_Simonsen?=')
  [(b'Keld ', None), (b'J\xf8rn Simonsen', 'iso-8859-1')]

returns the decoded string in bytes but no charset information for the
first chunk.  I'm not sure what the default charset for header values
is, but RFC 2047 sets itself up to deal with non-ASCII header values
[1], so I'm guessing it's ASCII ;).

[1]: http://tools.ietf.org/html/rfc2047#section-1
---

diff --git a/ssoma-mda b/ssoma-mda
index d3b5406..abbe7bc 100755
--- a/ssoma-mda
+++ b/ssoma-mda
@@ -227,9 +227,13 @@ def _decode_header(string):
     'hello'
     >>> _decode_header(string='=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=')
     'Keld Jørn Simonsen'
+    >>> _decode_header(string='Keld =?ISO-8859-1?Q?J=F8rn_Simonsen?=')
+    'Keld Jørn Simonsen'
     """
     chunks = []
     for decoded, charset in _email_header.decode_header(string):
+        if isinstance(decoded, bytes) and not charset:
+            charset = 'ASCII'
         if charset:
             decoded = str(decoded, charset)
         chunks.append(decoded)
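
For readers who want to try the change outside of ssoma-mda, here is a minimal standalone sketch of the patched loop. The function name `decode_header_sketch` is hypothetical, and the final `''.join(chunks)` is an assumption inferred from the TypeError quoted in the commit message, since the hunk does not show how the chunks are combined. The ASCII fallback is the commit's own guess, not something the standard library prescribes.

```python
import email.header


def decode_header_sketch(string):
    """Decode an RFC 2047 header value into a single str.

    Hypothetical standalone version of the patched _decode_header loop.
    """
    chunks = []
    for decoded, charset in email.header.decode_header(string):
        # For a partially-encoded value, the unencoded section comes
        # back as bytes with charset None.  Following the commit's
        # guess, treat those bytes as ASCII.
        if isinstance(decoded, bytes) and not charset:
            charset = 'ASCII'
        if charset:
            decoded = str(decoded, charset)
        chunks.append(decoded)
    # Assumption: the chunks are joined with an empty separator, which
    # is where a leftover bytes chunk would have raised TypeError.
    return ''.join(chunks)


if __name__ == '__main__':
    print(decode_header_sketch('Keld =?ISO-8859-1?Q?J=F8rn_Simonsen?='))
```

A wholly unencoded value such as 'hello' comes back from `email.header.decode_header` as a str with no charset, so it skips both branches and is appended unchanged.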