From: W. Trevor King
Date: Sat, 16 Feb 2013 14:06:38 +0000 (-0500)
Subject: email: Attempt .as_string() if BytesGenerator.flatten() fails
X-Git-Tag: v3.2~5
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=1c97270ee6d5d0cdb79cd025c17689fd342890bd;p=rss2email.git

email: Attempt .as_string() if BytesGenerator.flatten() fails

Before converting to BytesGenerator in 8a907f9 (email: Fix _flatten()
implementation for non-ASCII bodies, 2013-01-23), we used to flatten
emails with message.as_string(). BytesGenerator should be the more
robust approach, but it is, unfortunately, broken with respect to
Unicode payloads [1,2,3,4]. This makes the use-8bit setting pretty
useless.

Until we find a clean fix for BytesGenerator, fall back on the earlier
.as_string() approach where possible. We check the feasibility of the
fallback by performing a quasi-round-trip and comparing a message
recovered from the byte-encoded form with the original message. If
the recovered version does not match the original message, we reraise
the BytesGenerator.flatten() error. This fallback should work for any
charset whose mapping for ASCII characters is a no-op.

One benefit of this altered approach is that we no longer need to
encode the payload when we set it up in get_message(). This "Unicode
inside--encode on output" approach doesn't smell as much as the old
approach ;).

The new fallback will probably die screaming if you try to flatten a
multipart message, but we don't do that in rss2email.

Hopefully, the upstream issues with the email library will be sorted
out in the near future...

[1]: http://thread.gmane.org/gmane.comp.python.general/725425
[2]: http://bugs.python.org/issue16324
[3]: http://bugs.python.org/issue12553
[4]: http://bugs.python.org/issue12552#msg140294

Signed-off-by: W. Trevor King
---

diff --git a/rss2email/email.py b/rss2email/email.py
index cbf3d9f..eade609 100644
--- a/rss2email/email.py
+++ b/rss2email/email.py
@@ -19,6 +19,7 @@
 """Email message generation and dispatching
 """
 
+import email as _email
 from email.charset import Charset as _Charset
 import email.encoders as _email_encoders
 from email.generator import BytesGenerator as _BytesGenerator
@@ -122,7 +123,7 @@ def get_message(sender, recipient, subject, body, content_type,
         del message['Content-Transfer-Encoding']
         charset = _Charset(body_encoding)
         charset.body_encoding = _email_encoders.encode_7or8bit
-        message.set_payload(body.encode(body_encoding), charset=charset)
+        message.set_payload(body, charset=charset)
     if extra_headers:
         for key,value in extra_headers.items():
             encoding = guess_encoding(value, encodings)
@@ -231,8 +232,21 @@ def _flatten(message):
     """
     bytesio = _io.BytesIO()
     generator = _BytesGenerator(bytesio)  # use policies for Python >=3.3
-    generator.flatten(message)
-    return bytesio.getvalue()
+    try:
+        generator.flatten(message)
+    except UnicodeEncodeError as e:
+        # HACK: work around deficiencies in BytesGenerator
+        _LOG.warning(e)
+        b = message.as_string().encode(str(message.get_charset()))
+        m = _email.message_from_bytes(b)
+        if not m:
+            raise
+        body = str(m.get_payload(decode=True), str(m.get_charsets()[0]))
+        if (dict(m) == dict(message) and body == message.get_payload()):
+            return b
+        raise
+    else:
+        return bytesio.getvalue()
 
 def sendmail_send(sender, recipient, message, config=None, section='DEFAULT'):
     if config is None:
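
The quasi-round-trip check described in the commit message can be tried in isolation. The following is only a minimal sketch, not the code from this patch: the helper name flatten_via_string() and its explicit body argument are hypothetical. It compares the recovered body against the original text instead of message.get_payload(), on the assumption that a current Python may store the payload transfer-encoded (base64 for a default utf-8 charset) once set_payload() is given a charset.

```python
import email
from email.message import Message


def flatten_via_string(message, body, encoding="utf-8"):
    """Flatten `message` via as_string() and verify a quasi-round-trip.

    `body` is the original, unencoded text that was handed to
    set_payload().  (Hypothetical helper, for illustration only.)
    """
    # Encode the string form; this is only safe when the charset maps
    # ASCII characters to themselves.
    raw = message.as_string().encode(encoding)
    # Recover a message from the bytes and compare headers and body.
    recovered = email.message_from_bytes(raw)
    recovered_body = recovered.get_payload(decode=True).decode(encoding)
    if dict(recovered) != dict(message) or recovered_body != body:
        raise ValueError("round-trip mismatch; refusing to use fallback")
    return raw


body = "Caf\u00e9 au lait"
msg = Message()
msg["Subject"] = "demo"
msg.set_payload(body, charset="utf-8")
raw = flatten_via_string(msg, body)
```

As with the fallback in the patch, this sketch only handles single-part messages; get_payload(decode=True) returns None for a multipart message, so the decode step would fail there.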