W. Trevor King [Mon, 10 Jun 2013 20:25:27 +0000 (16:25 -0400)]
test/test.py: Update clean_result() to normalize user agents
Catch up with the USER_AGENT changes from 3f9adb5 (feed: Add the
digest setting for multi-entry email, 2013-04-13).
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 9 Jun 2013 21:26:31 +0000 (17:26 -0400)]
CHANGELOG: Document HTML syntax fix
Documenting:
4aa7f1d Fixed syntactical error when generating HTML mails
The commit by Dennis Keitzel fixed a typo in the HTML which dated back
to 00e2eecc (Spread cmd_run() logic out into Feed methods,
2012-10-04).
Signed-off-by: W. Trevor King <wking@tremily.us>
Dennis Keitzel [Sat, 8 Jun 2013 09:32:55 +0000 (11:32 +0200)]
Fixed syntactical error when generating HTML mails
Signed-off-by: Dennis Keitzel <github@pinshot.net>
W. Trevor King [Wed, 5 Jun 2013 22:13:14 +0000 (18:13 -0400)]
Bump to version 3.5
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 5 Jun 2013 22:09:52 +0000 (18:09 -0400)]
CHANGELOG: Mention the digest addition
Documenting:
92c0e76 Merge branch 'digest'
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 5 Jun 2013 22:08:47 +0000 (18:08 -0400)]
Merge branch 'digest'
* digest:
feed: Add the digest-post-process setting
feed: Add the digest setting for multi-entry email
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 11:57:50 +0000 (07:57 -0400)]
command: Add newlines to OPML export
No semantic change, but it makes the exported data easier for humans
to read.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 11:37:33 +0000 (07:37 -0400)]
command: Use feed names in OPML 'text' attributes
Instead of writing the URL as the 'text' attribute and ignoring it on
read, we now use the attribute to store the feed name. This avoids
auto-generated feed names on import. From the OPML 2.0 spec [1]:
Subscription lists
...
Required attributes: type, text, xmlUrl. For outline elements whose
type is rss, the text attribute should initially be the top-level
title element in the feed being pointed to, however since it is
user-editable, processors should not depend on it always containing
the title of the feed. xmlUrl is the http address of the feed.
We are not following the 'should' recommendation, but since we have
user-generated titles, I believe that the new usage is appropriate.
It's certainly closer to spec than storing a URL in 'text' :p.
[1]: http://dev.opml.org/spec2.html
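The attribute layout described above might look roughly like the sketch below. The `outline_element()` helper is hypothetical and is not taken from rss2email's export code; only `quoteattr` comes from the standard library.

```python
from xml.sax.saxutils import quoteattr


def outline_element(name, url):
    """Format one OPML <outline> with the feed name in 'text'.

    Hypothetical helper for illustration; the real export code may
    differ.
    """
    return '<outline type="rss" text={} xmlUrl={} />'.format(
        quoteattr(name), quoteattr(url))


print(outline_element('xkcd', 'http://xkcd.com/atom.xml'))
```

Using `quoteattr` keeps user-chosen feed names safe even if they contain quotes or ampersands.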
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 11:03:32 +0000 (07:03 -0400)]
command: Fix opmlexport crash due to orphaned feed data
When you remove a feed from your config file by hand, you might leave
the dynamic 'seen' data in the JSON data file by accident. If you
have such orphan data, the feed is loaded by Feeds._load_feeds() with
the default configuration (since you removed the config file entry).
This can lead to opmlexport errors like:
Traceback (most recent call last):
File "./r2e", line 5, in <module>
rss2email.main.run()
File "/.../rss2email/rss2email/main.py", line 163, in run
args.func(feeds=feeds, args=args)
File "/.../rss2email/rss2email/command.py", line 157, in opmlexport
url = _saxutils.escape(feed.url)
File "/usr/lib64/python3.2/xml/sax/saxutils.py", line 34, in escape
data = data.replace("&", "&amp;")
AttributeError: 'NoneType' object has no attribute 'replace'
because the feeds lack the per-feed 'url' setting that had been
defined in the config file. With this commit, opmlexport drops these
URL-less feeds, instead of choking to death trying to format them ;).
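A minimal sketch of the guard, assuming feeds are objects with a `url` attribute that is `None` for orphaned data. `FakeFeed` and `exportable_urls()` are stand-ins invented for this example, not rss2email names.

```python
from xml.sax.saxutils import escape


def exportable_urls(feeds):
    """Return escaped URLs, skipping feeds whose URL is unset (None)."""
    return [escape(feed.url) for feed in feeds if feed.url is not None]


class FakeFeed:
    """Stand-in for a feed; only the 'url' attribute matters here."""
    def __init__(self, url):
        self.url = url


feeds = [FakeFeed('http://example.com/?a=1&b=2'), FakeFeed(None)]
print(exportable_urls(feeds))
```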
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 10:18:22 +0000 (06:18 -0400)]
README: Link to Fedora package
Also remove 'Linux' from distribution names. The goal is to point
folks using $DISTRO to their package, not to give details about the
technical underpinnings of a given distribution.
I've sorted the distributions by packaging format: deb, rpm, ebuild,
and Makefile.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 08:38:07 +0000 (04:38 -0400)]
config: Replace Config._setup() with Config.setup_html2text()
Since 7bbbf62 (Setup html2text in Config._setup(), 2012-10-04), the
html2text configuration options have only been referenced from
Config._setup(), and that method was never called. With this commit,
we rename the method to setup_html2text(), and add a new
Feed._html2text() which invokes the setup before calling
html2text.html2text().
This caused a fair amount of churn in the expected test results, as
previously ignored default values for html2text kicked in. I also
added a test exercising the non-default values (allthingsrss/3), and
it looks like everything works as expected.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 07:47:22 +0000 (03:47 -0400)]
Bump to version 3.4
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 11 May 2013 10:52:12 +0000 (06:52 -0400)]
email: Pass 'config' and 'section' through from send() to *_send()
Otherwise only the default configuration will be used, which is almost
certainly not what the user wants.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 12:38:01 +0000 (08:38 -0400)]
CHANGELOG: Update with summary of IMAP delivery addition
Documenting:
a7f2222 Merge branch 'imap'
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 12:33:09 +0000 (08:33 -0400)]
Merge branch 'imap'
* imap:
email: small fixes for using imap as a backend
email: Stub out send_imap()
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 12:26:01 +0000 (08:26 -0400)]
email: Remove debugging logging from _decode_header()
This was accidentally committed in e08e198 (email: Decode headers when
checking .as_string() flatten fallback, 2013-02-17).
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 09:36:25 +0000 (05:36 -0400)]
feed: Add the digest-post-process setting
For users that want to manipulate the multi-entry message. For
example, if the stock `digest for <feed.name>` title doesn't cut it
for you, you can now use a post-processing hook to set the title
however you like.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 13 Apr 2013 22:05:03 +0000 (18:05 -0400)]
feed: Add the digest setting for multi-entry email
For high-volume feeds, some users want to receive a single email per
Feed.run() instead of a separate email for each new entry in the feed.
If you enable the new digest setting, the per-entry messages are
packed into a single multipart/digest message instead of being mailed
individually. The MIME details for digests are spelled out in RFC
2046 [1].
Peripheral changes:
* Added rss2email.feed._USER_AGENT, to get version information into
the User-Agent message headers and to avoid repeating myself.
* Normalize multipart MIME boundaries for easier testing of
multipart/digest messages.
[1]: http://tools.ietf.org/html/rfc2046#section-5.1.5
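For readers unfamiliar with multipart/digest, here is a rough sketch of the packing step using only the standard email package. `build_digest()` and its arguments are hypothetical names for this example and do not reflect the actual Feed implementation.

```python
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_digest(entries, subject):
    """Pack (title, body) pairs into one multipart/digest message."""
    digest = MIMEMultipart('digest')
    digest['Subject'] = subject
    for title, body in entries:
        part = MIMEText(body)
        part['Subject'] = title
        # Each digest part is a complete message/rfc822 message.
        digest.attach(MIMEMessage(part))
    return digest


digest = build_digest(
    [('First entry', 'Hello'), ('Second entry', 'World')],
    subject='digest for example-feed')
print(digest.get_content_type())
```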
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 09:16:02 +0000 (05:16 -0400)]
Run update-copyright.py
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 09:14:21 +0000 (05:14 -0400)]
.update-copyright.conf: Use aliases to remove Aaron Swartz's email
Not much use in sending email to the deceased :(.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 08:58:28 +0000 (04:58 -0400)]
CHANGELOG: Update with summary of post-process addition
Documenting:
3a331d6 Merge branch 'post-process'
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 08:58:06 +0000 (04:58 -0400)]
Merge branch 'post-process'
* post-process:
rss2email/post_process/downcase.py: Move my test hook into Arun's directory
post_process: add documentation and a prettify example
Add configurable post-process hooks
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 08:39:33 +0000 (04:39 -0400)]
rss2email/post_process/downcase.py: Move my test hook into Arun's directory
All the built-in hooks should live in the same sub-package. The
`post_process` name Arun used is more descriptive than my `hook`, so
move my downcase code there.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 19 Apr 2013 11:38:28 +0000 (07:38 -0400)]
README: Update the link to the NetBSD package
Although they're still packaging rss2email-2.71nb2. I'll see if I can
dig up a maintainer to ping about that.
Signed-off-by: W. Trevor King <wking@tremily.us>
Arun Persaud [Mon, 15 Apr 2013 21:49:52 +0000 (14:49 -0700)]
post_process: add documentation and a prettify example
* also mention it in the README file.
* package the filter via setup.py
Signed-off-by: Arun Persaud <apersaud@lbl.gov>
W. Trevor King [Mon, 15 Apr 2013 23:20:51 +0000 (19:20 -0400)]
r2e.1: Properly escape an ellipsis in the sample configuration
Following the example from nroff.1, which uses:
.RI [ file\~ .\|.\|.]
For reasons that I haven't bothered to track down, the ellipsis isn't
rendered correctly when it occurs at the beginning of a line (even
with the `\|` separators). After adding some leading whitespace,
everything seems to be working fine.
Reported-by: Matěj Cepl <mcepl@redhat.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 13 Apr 2013 23:30:28 +0000 (19:30 -0400)]
Bump to version 3.3
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 13 Apr 2013 23:27:30 +0000 (19:27 -0400)]
CHANGELOG: Update with summary of <table> removal
Documenting:
d293ab8 feed: Remove <table> elements from HTML mail
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 22 Jan 2013 16:59:36 +0000 (11:59 -0500)]
Add configurable post-process hooks
On Mon, Jan 21, 2013 at 11:16:58PM -0800, Arun Persaud wrote:
> but I was wondering if there is any chance to add some hooks, so
> that the user can modify the feed before it gets send, something
> that takes the url, uid, and other interesting information and
> returns the body of the feed that should get emailed.
This is not quite what he asked for (e.g., I don't pass the URL
explicitly, the hook should return the full message instead of just
payload, ...), but I think it gets the job done.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 9 Apr 2013 19:30:11 +0000 (15:30 -0400)]
feed: Remove <table> elements from HTML mail
These were not semantically correct ;). Based on a patch by Rui Carmo
[1].
[1]: https://github.com/rcarmo/rss2email/commit/
2a015bce9d701035b9af874bd56c46f92382e668
Based-on-patch-by: Rui Carmo <rui.carmo@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 5 Apr 2013 23:12:00 +0000 (19:12 -0400)]
README: Bump example snapshot versions from 2.71 to 3.2
Now that we have releases in the 3.x line, we should be pointing users
in that direction. These version numbers should probably be bumped
with each release :(.
Signed-off-by: W. Trevor King <wking@tremily.us>
Arun Persaud [Fri, 5 Apr 2013 19:04:35 +0000 (12:04 -0700)]
email: small fixes for using imap as a backend
* fixed two typos in "def send"
* removed some unnecessary calls to imap.connect and
imap.close (which seems to be only needed in case you open
a mailbox, which we don't)
Signed-off-by: Arun Persaud <apersaud@lbl.gov>
W. Trevor King [Thu, 28 Mar 2013 10:51:31 +0000 (06:51 -0400)]
email: Stub out send_imap()
Arun Persaud suggested IMAP as an additional email delivery mechanism.
The benefit of using IMAP over SMTP is that you can set the target
mailbox directly (instead of filtering the incoming mail with procmail
or a similar external tool). This commit restructures the 'send'
configuration to support IMAP output with a configurable mailbox.
That means you can do something like:
[DEFAULT]
email-protocol: imap
imap-auth: True
imap-username: myname
imap-password: mypass
imap-server: imap.yourisp.net
imap-port: 993
imap-ssl: True
[feed.rss2email]
url = http://www.allthingsrss.com/rss2email/feed/
imap-mailbox = rss2email
[feed.xkcd]
url = http://xkcd.com/atom.xml
imap-mailbox = xkcd
For non-IMAP users, note that the boolean `use-smtp` configuration
variable is gone, replaced by the more flexible `email-protocol`.
You'll want to replace:
use-smtp = False
with:
email-protocol = sendmail
and replace:
use-smtp = True
with:
email-protocol = smtp
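If you would rather script the translation than edit by hand, something along these lines should work. This is a sketch under the assumption that the old setting lives in `[DEFAULT]`; `migrate_use_smtp()` is a hypothetical helper, not part of rss2email.

```python
import configparser

OLD_CONFIG = """
[DEFAULT]
use-smtp = True
"""


def migrate_use_smtp(config):
    """Replace a legacy use-smtp boolean with email-protocol in place."""
    defaults = config['DEFAULT']
    if 'use-smtp' in defaults:
        use_smtp = defaults.getboolean('use-smtp')
        defaults['email-protocol'] = 'smtp' if use_smtp else 'sendmail'
        del defaults['use-smtp']
    return config


config = configparser.ConfigParser()
config.read_string(OLD_CONFIG)
migrate_use_smtp(config)
print(config['DEFAULT']['email-protocol'])
```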
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 28 Mar 2013 10:58:40 +0000 (06:58 -0400)]
CHANGELOG: Update with summaries of recent changes
This documents the following changes:
5fd97a2 Add George Saunders to __contributors__
bd7b7ca error: Don't explicitly store server in SMTPAuthenticationError
3f5df62 feed: Add herror to _SOCKET_ERRORS and remove reason handling
c39625e error: Fix ProcessingError message and logging
a3719f8 feed: Catch parsing errors during html2text
a88738f error: Fix super calls for SMTPAuthenticationError, etc.
80a8edf email: Change stray SMTP_SERVER to server
a66dd58 email: Remove explicit ehlo() call
c9f5681 feed: Streamline rel-via title extraction in
_process_entry_content()
aa8675d feed: Drop Google Reader rel-via manipulation
a226ef6 error: Fix inheritance typos for HTTPError (ProcessingError ->
FeedError)
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 28 Mar 2013 10:53:22 +0000 (06:53 -0400)]
Add George Saunders to __contributors__
For his 80a8edf (email: Change stray SMTP_SERVER to server,
2013-03-18).
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 28 Mar 2013 01:17:03 +0000 (21:17 -0400)]
error: Don't explicitly store server in SMTPAuthenticationError
It's already being stored by SMTPConnectionError, which is called via
super().
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 21 Mar 2013 15:07:57 +0000 (11:07 -0400)]
feed: Add herror to _SOCKET_ERRORS and remove reason handling
We don't log the reason, so trying to extract it just gives room for
errors to creep in. Luckily, we'll be able to drop the whole
_SOCKET_ERRORS thing when we move to Python >= 3.3, because following
PEP 3151 the socket errors became subclasses of OSError.
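The PEP 3151 hierarchy can be checked directly on Python >= 3.3. The sketch below is only an illustration; `is_socket_error()` is a hypothetical helper showing what a single `OSError` check could replace.

```python
import socket

# On Python >= 3.3 (PEP 3151) these are all OSError subclasses, so a
# single "except OSError" clause covers them.
for error in (socket.error, socket.herror, socket.gaierror, socket.timeout):
    print(error.__name__, issubclass(error, OSError))


def is_socket_error(exception):
    """Return True for any OSError, which includes socket failures."""
    return isinstance(exception, OSError)
```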
Reported-by: Matt Bordignon <matthew@bordignons.net>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 20 Mar 2013 09:40:32 +0000 (05:40 -0400)]
error: Fix ProcessingError message and logging
We can't check if message is None if message wasn't an argument to
__init__(). Also:
* import sys for sys.version
* explicitly format strings passed to _LOG.warning(), otherwise you'll
get the following:
>>> LOG.warning('abc', 'def')
Traceback (most recent call last):
...
TypeError: not all arguments converted during string formatting
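The underlying problem is plain %-formatting with more arguments than placeholders. A small sketch, with a hypothetical `format_warning()` helper and logger name, shows formatting the string explicitly before handing it to the logger:

```python
import logging

LOG = logging.getLogger('example')  # hypothetical logger name


def format_warning(template, *args):
    """Format eagerly so a template/argument mismatch fails loudly here."""
    return template % args


# One placeholder, one argument: formats cleanly.
LOG.warning(format_warning('feed %s failed', 'xkcd'))

# No placeholder for the argument: raises the TypeError shown above.
try:
    format_warning('abc', 'def')
except TypeError as error:
    print(error)
```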
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 20 Mar 2013 09:27:03 +0000 (05:27 -0400)]
feed: Catch parsing errors during html2text
This avoids crashing with:
Traceback (most recent call last):
...
File ".../rss2email/feed.py", line 732, in _process_entry_content
lines = [_html2text.html2text(content['value'])]
...
File "/usr/lib/python3.2/html/parser.py", line 149, in error
raise HTMLParseError(message, self.getpos())
html.parser.HTMLParseError: EOF in middle of construct, at line 1, column 262
The troublesome feed was:
$ wget -S http://www.cell.com/rssFeed/biophysj/rss.NewIssueAndArticles.xml
--2013-03-20 05:22:08-- http://www.cell.com/rssFeed/biophysj/rss.NewIssueAndArticles.xml
Resolving www.cell.com... 145.36.42.28
Connecting to www.cell.com|145.36.42.28|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 20 Mar 2013 09:23:19 GMT
Server: IBM_HTTP_Server
Last-Modified: Tue, 19 Mar 2013 22:00:04 GMT
Accept-Ranges: bytes
Content-Length: 15362
Vary: Accept-Encoding
Keep-Alive: timeout=10, max=100
Connection: Keep-Alive
Content-Type: text/xml
Length: 15362 (15K) [text/xml]
Saving to: ‘rss.NewIssueAndArticles.xml’
100%[======================================>] 15,362 94.1KB/s in 0.2s
2013-03-20 05:22:08 (94.1 KB/s) - ‘rss.NewIssueAndArticles.xml’ saved [15362/15362]
which contained the poorly split summary:
<item>
<title>Synergistic Insertion of Antimicrobial Magainin-Family Peptides in Membranes Depends on the Lipid Spontaneous Curvature</title>
<link>http://www.cell.com/biophysj/abstract/S0006-3495(13)00153-7</link>
<description>Erik Strandberg, Jonathan Zerweck, Parvesh Wadhwani, Anne S. Ulrich. PGLa and magainin 2 (MAG2) are amphiphilic antimicrobial peptides from frog skin with known synergistic activity. The orientation of the two helices in membranes was studied using solid-state <sup....</description>
<pubDate>Tue, 19 Mar 2013 00:00:00 GMT</pubDate>
<guid>http://www.cell.com/biophysj/abstract/S0006-3495(13)00153-7</guid>
<dc:date>2013-03-19T00:00:00Z</dc:date>
</item>
The '<sup....' in the description broke the parser.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 19 Mar 2013 23:58:09 +0000 (19:58 -0400)]
error: Fix super calls for SMTPAuthenticationError, etc.
Fix copy/paste errors where super() calls used the wrong class name
(it should match the class in which the method is defined) for:
* SMTPAuthenticationError.__init__
* ProcessingError.__init__
* OPMLReadError.__init__
Reported-by: Matt Bordignon <matthew@bordignons.net>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 19 Mar 2013 00:31:46 +0000 (20:31 -0400)]
Merge remote-tracking branch 'alienacorn/master'
* alienacorn/master:
email: Change stray SMTP_SERVER to server
Signed-off-by: W. Trevor King <wking@tremily.us>
George Saunders [Mon, 18 Mar 2013 23:20:09 +0000 (23:20 +0000)]
email: Change stray SMTP_SERVER to server
This fixes the error below that occurred upon sending a message by SMTP.
NameError: global name 'SMTP_SERVER' is not defined.
Signed-off-by: George Saunders <georgesaunders@gmail.com>
W. Trevor King [Mon, 18 Mar 2013 10:47:46 +0000 (06:47 -0400)]
email: Remove explicit ehlo() call
It won't work before we've connected to the server:
Traceback (most recent call last):
...
File ".\rss2email\email.py", line 145, in smtp_send
smtp.ehlo()
...
smtplib.SMTPServerDisconnected: please run connect() first
That makes sense ;). If we want to call ehlo(), we should certainly
do it after the .connect() call succeeds. Looking at the docs [1], I
don't think we need to call it at all:
Unless you wish to use has_extn() before sending mail, it should not
be necessary to call this method explicitly. It will be implicitly
called by sendmail() when necessary.
We don't use has_extn(), and the EHLO should happen implicitly in
starttls [2]:
If there has been no previous EHLO or HELO command this session,
this method tries ESMTP EHLO first.
and send_message [3]:
This is a convenience method for calling sendmail()...
via sendmail [4]:
If there has been no previous EHLO or HELO command this session,
this method tries ESMTP EHLO first.
[1]: http://docs.python.org/3/library/smtplib.html#smtplib.SMTP.ehlo
[2]: http://docs.python.org/3/library/smtplib.html#smtplib.SMTP.starttls
[3]: http://docs.python.org/3/library/smtplib.html#smtplib.SMTP.send_message
[4]: http://docs.python.org/3/library/smtplib.html#smtplib.SMTP.sendmail
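The failure is easy to reproduce without a mail server, since an SMTP object created without a host never connects. This sketch passes `local_hostname` only to avoid a hostname lookup; it is not how rss2email builds its connection.

```python
import smtplib

# No host is given, so the constructor does not open a connection.
smtp = smtplib.SMTP(local_hostname='localhost')
try:
    smtp.ehlo()
except smtplib.SMTPServerDisconnected as error:
    print(error)
```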
Reported-by: Matt Bordignon <matthew@bordignons.net>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 15 Mar 2013 11:44:02 +0000 (07:44 -0400)]
feed: Streamline rel-via title extraction in _process_entry_content()
No functional change here, this just tightens up the Python code for
clarity.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 15 Mar 2013 11:37:25 +0000 (07:37 -0400)]
feed: Drop Google Reader rel-via manipulation
Google Reader will be retired on 2013-07-01 [1], so we should be able to
drop the special rel-via handling it got starting with eb04d97 (Bump
to version 2.67, 2010-09-21).
[1]: http://googlereader.blogspot.com/2013/03/powering-down-google-reader.html
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 15 Mar 2013 10:43:29 +0000 (06:43 -0400)]
error: Fix inheritance typos for HTTPError (ProcessingError -> FeedError)
Avoid:
Traceback (most recent call last):
...
File ".../rss2email/feed.py", line 338, in _check_for_errors
raise _error.HTTPError(status=status, feed=self)
File ".../rss2email/error.py", line 166, in __init__
super(FeedError, self).__init__(feed=feed, message=message)
TypeError: __init__() got an unexpected keyword argument 'feed'
HTTPErrors occur when we're fetching a feed, so we don't have the
parsed feed instance required by ProcessingErrors. We should use the
more general FeedError instead, and use the HTTPError class name in
the super() call.
Both of these changes come from sloppy copy-paste errors while stubbing
out lots of similar error classes during the huge 066602ef (rss2email:
split massive package into modules, 2012-11-13).
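To see why the class name in super() matters, consider this simplified sketch. The class names mirror the ones in the traceback, but the bodies are stand-ins written for this example, not the code in rss2email/error.py.

```python
class RSS2EmailError(Exception):
    def __init__(self, message):
        super(RSS2EmailError, self).__init__(message)


class FeedError(RSS2EmailError):
    def __init__(self, feed, message=None):
        if message is None:
            message = 'error with feed {}'.format(feed)
        super(FeedError, self).__init__(message=message)
        self.feed = feed


class HTTPError(FeedError):
    def __init__(self, status, feed):
        message = 'HTTP status {} fetching feed {}'.format(status, feed)
        # Name the class being defined.  Writing super(FeedError, self)
        # here would skip FeedError.__init__ and pass 'feed' to
        # RSS2EmailError.__init__, which does not accept it.
        super(HTTPError, self).__init__(feed=feed, message=message)
        self.status = status


error = HTTPError(status=404, feed='xkcd')
print(error)
```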
Reported-by: Matěj Cepl <mcepl@redhat.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 13 Mar 2013 13:50:37 +0000 (09:50 -0400)]
Bump to version 3.2
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 13 Mar 2013 13:46:38 +0000 (09:46 -0400)]
CHANGELOG: Update with summaries of recent changes
This documents the following changes:
* f01eac2 (email: Make path to sendmail configurable, 2013-02-15)
* 1c97270 (email: Attempt .as_string() if BytesGenerator.flatten()
  fails, 2013-02-16)
* e08e198 (email: Decode headers when checking .as_string() flatten
  fallback, 2013-02-17)
* 3adef87 (config: Use extended interpolation in Config, 2013-02-21)
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 2 Mar 2013 19:40:25 +0000 (14:40 -0500)]
README: Link to the new openSUSE package
On Sat, Mar 02, 2013 at 08:54:11AM -0800, Arun Persaud wrote:
> I managed to get rss2email 3.1 packaged for opensuse. It's available at
>
> http://download.opensuse.org/repositories/server:/mail/
>
> for 12.2, 12.3 and Factory in case you want to mention it on the project
> page. It doesn't build on older distros, mostly due to missing
> dependencies, but on 12.1 the build fails with
>
> ...
>
> The package is developed at:
>
> https://build.opensuse.org/package/show?package=rss2email&project=server%3Amail
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 21 Feb 2013 11:11:19 +0000 (06:11 -0500)]
config: Use extended interpolation in Config
This avoids triggering accidental interpolation errors when your URL
contains percent signs (e.g. %2F). Curly braces, on the other hand,
will never appear in an encoded URL. From RFC 1738:
Unsafe:
... Other characters are unsafe because gateways and other transport
agents are known to sometimes modify such characters. These
characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".
All unsafe characters must always be encoded within a URL.
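The difference between the two interpolation styles can be seen with a throwaway config. The section name and URL below are made up for the example.

```python
import configparser

CONFIG = """
[feed.example]
url = http://example.com/feed?path=a%2Fb
"""

basic = configparser.ConfigParser()  # BasicInterpolation by default
basic.read_string(CONFIG)
try:
    basic['feed.example']['url']
except configparser.InterpolationSyntaxError as error:
    print('basic interpolation failed:', error)

# Extended interpolation uses ${...}, so a bare '%' is just a character.
extended = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation())
extended.read_string(CONFIG)
print(extended['feed.example']['url'])
```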
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 17 Feb 2013 15:49:41 +0000 (10:49 -0500)]
email: Decode headers when checking .as_string() flatten fallback
A naive dict-comparison fails if any header fields are encoded
following RFC 2047. For example,
'=?iso-8859-1?q?this=20is=20some=20text?='
will not compare equal to an email.header.Header instance. By
decoding everything to Unicode strings before comparing the header
fields, we can see if the underlying data matches.
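One way to do the decoding with the standard library is sketched below. `header_text()` is a hypothetical helper for this example; the real comparison code in rss2email may be structured differently.

```python
from email.header import Header, decode_header, make_header


def header_text(value):
    """Decode a raw or Header-wrapped header value to a Unicode string."""
    return str(make_header(decode_header(str(value))))


encoded = '=?iso-8859-1?q?this=20is=20some=20text?='
header = Header('this is some text', 'iso-8859-1')
print(encoded == header)  # naive comparison of str and Header
print(header_text(encoded) == header_text(header))
```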
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 16 Feb 2013 14:06:38 +0000 (09:06 -0500)]
email: Attempt .as_string() if BytesGenerator.flatten() fails
Before converting to BytesGenerator in 8a907f9 (email: Fix _flatten()
implementation for non-ASCII bodies, 2013-01-23), we used to flatten
emails with message.as_string(). BytesGenerator should be the more
robust approach, but it is, unfortunately, broken with respect to
Unicode payloads [1,2,3,4]. This makes the use-8bit setting pretty
useless.
Until we find a clean fix for BytesGenerator, fall back on the earlier
.as_string() approach where possible. We check the feasibility of the
fallback by performing a quasi-round-trip and comparing a message
recovered from the byte-encoded form with the original message. If
the recovered version does not match the original message, we reraise
the BytesGenerator.flatten() error. This fallback should work for
any charset whose mapping for ASCII characters is a no-op.
One benefit of this altered approach is that we no longer need to
encode the payload when we set it up in get_message(). This "Unicode
inside--encode on output" approach doesn't smell as much as the old
approach ;).
The new fallback will probably die screaming if you try and flatten a
multipart message, but we don't do that in rss2email. Hopefully, the
upstream issues with the email library will be sorted out in the near
future...
[1]: http://thread.gmane.org/gmane.comp.python.general/725425
[2]: http://bugs.python.org/issue16324
[3]: http://bugs.python.org/issue12553
[4]: http://bugs.python.org/issue12552#msg140294
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 16 Feb 2013 13:22:10 +0000 (08:22 -0500)]
email: Fix typo '\\n' -> '\n' in _flatten docstring.
I seem to have forgotten that the docstring is raw (`r"""`) when I
wrote the original tests.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 15 Feb 2013 14:16:39 +0000 (09:16 -0500)]
email: Make path to sendmail configurable
For example, Azer Koçulu has an rss2email fork that uses msmtp [1].
[1]: https://github.com/azer/rss2email
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 15 Feb 2013 13:13:34 +0000 (08:13 -0500)]
CHANGELOG: Update after the release of 3.0 and 3.1
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 14 Feb 2013 13:34:03 +0000 (08:34 -0500)]
Bump to version 3.1
Changes since 3.0:
* Import __url__, __author__, and __email__ in rss2email.error, which
fixes bugs formatting a number of errors.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 14 Feb 2013 13:28:35 +0000 (08:28 -0500)]
error: Import __*__ metadata (URL, author, email)
These are used to format some of the log messages.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 14 Feb 2013 13:24:08 +0000 (08:24 -0500)]
setup.py: Use __url__ and __author__ from rss2email/__init__.py
Instead of duplicating those values locally. This gives us one less
place to forget to update the next time we change the website or
maintainer.
Also, update __url__ to point to my GitHub repository and shift the
email address to rss2email.__email__.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 13 Feb 2013 14:32:40 +0000 (09:32 -0500)]
Bump to version 3.0
Changes since 2.71:
* State storage split into a static configuration file (usually
`~/.config/rss2email.cfg`) and a dynamic JSON data file (usually
`~/.local/share/rss2email.json`).
* The static configuration file is parsed with Python's ConfigParser
class, which allows for default settings that can be overridden on a
global or per-feed basis. You'll have to translate your old config
to the new format by hand when you upgrade.
* Emailed messages now have Message-IDs.
* Feeds can be indexed by name as well as index (e.g.
`r2e run my-feed`).
* Restructured as a package with submodules instead of a single
module. This makes dependencies between various portions of
rss2email more explicit.
* Converted to Python >=3.2, for more consistent Unicode handling,
exception chaining, and argparse (although argparse is also in 2.7).
* Packaged with setup.py and distutils, in case you want to install
rss2email instead of running it from a Git checkout or unpacked
tarball.
* Added a test suite (run with `./test/test.py`).
* Added a man page, based on the version in the Debian package.
* Require Signed-off-by lines in new commit messages, following the
Linux and Git projects.
* Assorted cleanups and bug fixes.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 13 Feb 2013 14:12:05 +0000 (09:12 -0500)]
command: In run(), save feeds even after errors
It's annoying to have a few feeds processed successfully and then have
one feed with a configuration error take down the process without
saving. With this commit, we always save the feeds, regardless of any
error. We also catch and log any RSS2EmailError, not just the
NoToEmailAddress and ProcessingErrors we caught earlier.
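The control flow amounts to a per-feed try/except inside an outer try/finally. The sketch below uses placeholder callables and a stand-in error class; it illustrates the pattern and is not the actual command.run() code.

```python
class RSS2EmailError(Exception):
    """Stand-in for rss2email's base error class."""


def run(feeds, save, log):
    """Process every feed, then save, even if some feeds fail.

    'feeds' is a list of callables, and 'save' and 'log' are callables
    supplied by the caller; all three are placeholders for this sketch.
    """
    try:
        for feed in feeds:
            try:
                feed()
            except RSS2EmailError as error:
                log(error)  # record the failure and keep going
    finally:
        save()  # runs whether or not an exception escaped the loop


def broken_feed():
    raise RSS2EmailError('bad configuration')


events = []
run(feeds=[lambda: events.append('ok'), broken_feed],
    save=lambda: events.append('saved'),
    log=lambda error: events.append('logged: {}'.format(error)))
print(events)
```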
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 24 Jan 2013 22:51:13 +0000 (17:51 -0500)]
config: Fix 'significan' -> 'significant' typo in a comment
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 24 Jan 2013 05:30:04 +0000 (00:30 -0500)]
email: Encode the body when we might use 8bit encoding
R. David Murray writes [1]:
> In 2.x that will work, and will give you the 8bit CTE at need, as
> long as you pass encoded text to MIMEText (as opposed to
> unicode...which it doesn't really handle correctly if I recall
> right).
See also, [2].
[1]: http://bugs.python.org/issue12552
email.MIMEText overide BASE64 for utf8 charset
[2]: http://bugs.python.org/issue12553
Add support for using a default CTE of '8bit' to MIMEText
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 24 Jan 2013 04:41:50 +0000 (23:41 -0500)]
email: Fix _flatten() implementation for non-ASCII bodies
The email header should be flattened to ASCII with funky encoded
headers [1], but the body may be encoded in a non-ASCII-compatible
charset (e.g. UTF-16-LE). The old _flatten() implementation used the
body charset to encode the entire message, which could garble the
header. This patch uses BytesGenerator, which takes advantage of
email.charset.Charset's separate fields for the header encoding and
body encoding.
[1]: http://docs.python.org/3/library/email.header.html
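The BytesGenerator call itself is short. This sketch uses a plain ASCII message to stay clear of the charset subtleties discussed above, and `flatten()` here is a simplified stand-in for the real _flatten().

```python
import io
from email.generator import BytesGenerator
from email.mime.text import MIMEText


def flatten(message):
    """Serialize a message to bytes with BytesGenerator."""
    buffer = io.BytesIO()
    BytesGenerator(buffer, mangle_from_=False).flatten(message)
    return buffer.getvalue()


message = MIMEText('body text', 'plain', 'us-ascii')
message['Subject'] = 'example'
print(flatten(message).decode('ascii'))
```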
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 24 Jan 2013 04:12:01 +0000 (23:12 -0500)]
email: Add a failing UTF-16 _flatten example
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 24 Jan 2013 03:59:55 +0000 (22:59 -0500)]
email: Factor message-to-bytes formatting out into _flatten()
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 24 Jan 2013 04:39:44 +0000 (23:39 -0500)]
email: Use Charsets to set the Content-Transfer-Encoding
This ensures that payload encoding/decoding happens appropriately, and
allows 7-bit-clean data to be sent with a 7bit CTE, even when the
use-8bit setting is on.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 24 Jan 2013 04:02:41 +0000 (23:02 -0500)]
email: Alphabetize imports (swap email.mime and email.header)
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 24 Jan 2013 01:17:00 +0000 (20:17 -0500)]
main: Show traceback when we're extra verbose
Adding --verbose (or -V) flags moves the logger from ERROR to WARNING
(-V), INFO (-VV), and DEBUG (-VVV). Additional increments were
ignored, but I don't like always masking tracebacks. This patch sets
an additional verbosity level (-VVVV) which logs at DEBUG and
additionally prints exception tracebacks instead of hiding them.
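The mapping from flag count to behaviour could be sketched like this. `parse_verbosity()` and `LEVELS` are names invented for the example; main.py may organize this differently.

```python
import argparse
import logging

LEVELS = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]


def parse_verbosity(argv):
    """Return (log level, show traceback) for a list of CLI arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-V', '--verbose', action='count', default=0)
    args = parser.parse_args(argv)
    level = LEVELS[min(args.verbose, len(LEVELS) - 1)]
    show_traceback = args.verbose >= len(LEVELS)  # -VVVV and beyond
    return level, show_traceback


print(parse_verbosity(['-VV']))
print(parse_verbosity(['-VVVV']))
```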
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 24 Jan 2013 01:03:51 +0000 (20:03 -0500)]
feed: Convert missing/extra key errors to InvalidFeedConfig
This way we get the message and not a full traceback, to avoid scaring
users who aren't familiar with Python tracebacks. There's not much
information to go on in the new message, but if you crank up the
verbosity, you get:
$ PYTHONPATH=. ./r2e -c conf -d data -VVV list
load feed configuration from ['conf']
loaded configuration from ['conf']
load feed data from data
extra configuration key: use_8bit
which seems good enough for me.
Reported-by: Dmitry Bogatov <KAction@gnu.org>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 23 Jan 2013 23:00:57 +0000 (18:00 -0500)]
email: When setting an 8bit CTE, remove the old header
From the docs [1]:
Note that this does not overwrite or delete any existing header with
the same name. If you want to ensure that the new header is the only
one present in the message with field name name, delete the field
first, e.g.:
del msg['subject']
msg['subject'] = 'Python roolz!'
[1]: http://docs.python.org/3/library/email.message.html#email.message.Message.__setitem__
Signed-off-by: W. Trevor King <wking@tremily.us>
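Applied to the Content-Transfer-Encoding header, the behaviour quoted from the docs can be demonstrated with the standard library alone. This is an illustration of `email.message.Message` semantics, not the rss2email code itself:

```python
from email.message import Message

msg = Message()
msg['Content-Transfer-Encoding'] = '7bit'
# Assigning again appends a second header instead of replacing the first.
msg['Content-Transfer-Encoding'] = '8bit'
dupes = msg.get_all('Content-Transfer-Encoding')

# Deleting first removes every occurrence, so the new value is the only one.
del msg['Content-Transfer-Encoding']
msg['Content-Transfer-Encoding'] = '8bit'
single = msg.get_all('Content-Transfer-Encoding')
```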
W. Trevor King [Wed, 23 Jan 2013 23:00:02 +0000 (18:00 -0500)]
test/bbc-chinese: Add tests for use-8bit
The BBC's Chinese feed should have a few non-ASCII characters in it
;).
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 23 Jan 2013 22:54:34 +0000 (17:54 -0500)]
config: Rename use_8bit to use-8bit for uniformity
We use hyphens in all the other config settings.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 23 Jan 2013 22:51:50 +0000 (17:51 -0500)]
feed: Pass config and section arguments to get_message()
Otherwise it will always use the default config.
Also add section fallback code to get_message in case the
feed-specific section is not in the config file. This is useful for
testing, although in production every feed should have a section to
hold its URL.
Signed-off-by: W. Trevor King <wking@tremily.us>
Dmitry Bogatov [Wed, 23 Jan 2013 20:36:02 +0000 (00:36 +0400)]
Add use of `sender' parameter in `sendmail_send'.
Signed-off-by: Dmitry Bogatov <KAction@gnu.org>
Dmitry Bogatov [Wed, 23 Jan 2013 20:35:04 +0000 (00:35 +0400)]
Add 8bit Content-Transfer-Encoding support.
Signed-off-by: Dmitry Bogatov <KAction@gnu.org>
W. Trevor King [Wed, 23 Jan 2013 14:13:43 +0000 (09:13 -0500)]
feeds: Raise an RSS2EmailError on invalid Feeds.index() arguments
Don't confuse non-Python folks by giving a traceback for this usage
error.
Reported-by: Dmitry Bogatov <KAction@gnu.org>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 19 Jan 2013 18:22:49 +0000 (13:22 -0500)]
email: Don't assume `extra_headers` has content in get_message()
The default is None, so we should at least handle that case
gracefully.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 19 Jan 2013 00:13:15 +0000 (19:13 -0500)]
feed: Remove the for loop variable `e` for a clean namespace
Otherwise:
>>> import rss2email.feed
>>> print([x for x in dir(rss2email.feed) if not x.startswith('_')])
['Feed', 'e']
which might confuse people ;).
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 18 Jan 2013 21:46:23 +0000 (16:46 -0500)]
main: Catch command-less case for Python 3.3
In Python 3.2, the argument parser raises an error if no subcommand is
listed on the command line. This does not seem to be the case with
Python 3.3.0, and the changed behavior seems to have been a side
effect of this:
http://hg.python.org/cpython/rev/cab204a79e09
changeset: 70741:cab204a79e09
user: R David Murray <rdmurray@bitdance.com>
date: Thu Jun 09 12:34:07 2011 -0400
summary: #10424: argument names are now included in the missing argument mes
Anyhow, it's easy enough to catch the new behaviour in rss2email and
print the appropriate error.
Reported-by: Dmitry Bogatov <KAction@gnu.org>
Signed-off-by: W. Trevor King <wking@tremily.us>
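One way to catch the command-less case is to give the subparsers a `dest` and check it after parsing. This is a sketch under assumptions: the `command` dest, the `list` subcommand wiring, and the error text are illustrative and may differ from what rss2email does:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog='r2e')
    # With a dest set, a missing subcommand leaves args.command as None
    # on Python 3.3+, where subcommands are no longer required by default.
    subparsers = parser.add_subparsers(dest='command')
    subparsers.add_parser('list')
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        # parser.error() prints usage plus the message and exits with 2.
        parser.error('missing command')
    return args
```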
W. Trevor King [Fri, 18 Jan 2013 03:07:45 +0000 (22:07 -0500)]
rss2email: Sort __contributors__ by first name and add missing folks
This brings __contributors__ in line with the auto-generated AUTHORS.
I'm not convinced that listing __contributors__ in a Python-parsable
manner is worth the trouble, but I'll leave it in for now.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 10 Jan 2013 16:05:48 +0000 (11:05 -0500)]
feed|feeds: Update datafile format to version 2
We may want to store additional data about previously seen entries
besides our possibly auto-generated ID. Convert the `seen` mapping
from:
entry_id -> our_id
to:
entry_id -> {'id': our_id}
Signed-off-by: W. Trevor King <wking@tremily.us>
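The `seen` conversion described above amounts to wrapping each stored ID in a dict. A minimal sketch, with `upgrade_seen` as a hypothetical helper name:

```python
def upgrade_seen(seen):
    """Convert entry_id -> our_id into entry_id -> {'id': our_id}."""
    return {entry_id: {'id': our_id} for entry_id, our_id in seen.items()}
```

Storing a dict per entry leaves room for additional per-entry fields later without another format change.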
W. Trevor King [Thu, 10 Jan 2013 13:57:17 +0000 (08:57 -0500)]
r2e.1: Update ConfigParser URL after PEP 430
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 10 Jan 2013 13:39:52 +0000 (08:39 -0500)]
r2e.1: Update __contributors__ location and direct bugs to the mailing list
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 10 Jan 2013 13:35:16 +0000 (08:35 -0500)]
r2e.1: Update maintainer from Lindsey to me.
This should have happened in:
commit 6460e8738b5e7c66df6c2143e8a29c048cf308bd
Author: W. Trevor King <wking@tremily.us>
Date: Fri Nov 9 07:46:17 2012 -0500
Change maintainer from Lindsey to Trevor.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 10 Jan 2013 15:09:26 +0000 (10:09 -0500)]
feeds: Follow the XDG Base Directory Specification
This splits config files and data files into different directories (by
default), so we no longer need an rss2email subdirectory (we only have
one config file and one data file). The default config file is now
~/.config/rss2email.cfg and the default data file is now
~/.local/share/rss2email.json.
Signed-off-by: W. Trevor King <wking@tremily.us>
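The default locations above follow from the XDG fallbacks for `XDG_CONFIG_HOME` and `XDG_DATA_HOME`. A sketch of that lookup, where `default_paths` and its `environ` parameter are assumptions for illustration:

```python
import os


def default_paths(environ=None):
    """Return (config file, data file) per the XDG base directory rules.

    An unset or empty XDG variable falls back to the spec's default
    under the home directory.
    """
    environ = os.environ if environ is None else environ
    home = environ.get('HOME') or os.path.expanduser('~')
    config_home = (environ.get('XDG_CONFIG_HOME')
                   or os.path.join(home, '.config'))
    data_home = (environ.get('XDG_DATA_HOME')
                 or os.path.join(home, '.local', 'share'))
    return (os.path.join(config_home, 'rss2email.cfg'),
            os.path.join(data_home, 'rss2email.json'))
```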
W. Trevor King [Thu, 10 Jan 2013 15:03:41 +0000 (10:03 -0500)]
feeds: Use JSON instead of Pickle for storing dynamic feed state
It's safer, more portable, and possibly faster. I've also added
version information to the data file for easier future upgrades.
Signed-off-by: W. Trevor King <wking@tremily.us>
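A versioned JSON data file could be written and read along these lines. The `version` and `feeds` key names, the helper names, and the version numbers are assumptions, not the confirmed on-disk layout:

```python
import json


def dump_state(feeds, version=1):
    """Serialize feed state with a version marker (assumed layout)."""
    return json.dumps({'version': version, 'feeds': feeds},
                      indent=2, sort_keys=True)


def load_state(text, supported=(1,)):
    """Parse feed state, refusing versions we do not know how to read."""
    data = json.loads(text)
    if data.get('version') not in supported:
        raise ValueError('unsupported data version: {!r}'.format(
            data.get('version')))
    return data['feeds']
```

Unlike unpickling, `json.loads` cannot execute code from the data file, which is the safety gain the commit refers to.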
W. Trevor King [Thu, 10 Jan 2013 11:43:05 +0000 (06:43 -0500)]
feed: Split entry link extraction out into Feed._get_entry_link
Now other methods can all access the same link, without having to
extract it in _process_entry and pass the extracted link around
explicitly. Currently, nothing special happens during link
extraction, but the new method helps future proof us.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 9 Jan 2013 16:25:35 +0000 (11:25 -0500)]
Ran update-copyright.py
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 9 Jan 2013 16:22:48 +0000 (11:22 -0500)]
main: Add missing `# Copyright` tag for update-copyright.py
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 9 Jan 2013 16:20:30 +0000 (11:20 -0500)]
version: Add get_versions and teach rss2email --full-version
This makes it easier for users to submit useful bug reports.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 9 Jan 2013 16:01:17 +0000 (11:01 -0500)]
main: Add an explicit --version argument.
Use an explicit `version` action instead of the undocumented (and
deprecated) `version` keyword to ArgumentParser [1,2].
[1]: http://docs.python.org/3.3/library/argparse.html#action
[2]: http://docs.python.org/3.3/library/argparse.html#upgrading-optparse-code
Signed-off-by: W. Trevor King <wking@tremily.us>
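The explicit `version` action from the argparse docs looks like this. The program name and the version string are placeholders, not values taken from rss2email:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog='r2e')
    # The 'version' action prints the string and exits with status 0.
    parser.add_argument(
        '--version', action='version',
        version='%(prog)s 0.0')  # placeholder version string
    return parser
```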
W. Trevor King [Wed, 9 Jan 2013 15:43:24 +0000 (10:43 -0500)]
error: Strip trailing whitespace in module docstring
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 9 Jan 2013 15:27:22 +0000 (10:27 -0500)]
feed: Raise the new InvalidFeedConfig on missing feed.url
You can't fetch a feed without a URL. This new error message makes
the cause explicit, compared to the somewhat ambiguous former error
messages:
fetch $NAME (None -> $TO)
process $NAME (None -> $TO)
HTTP status 200
could not get HTTP headers: $NAME (None -> $TO)
unrecognized version: $NAME (None -> $TO)
sax parsing error: <unknown>:2:0: no element found: $NAME (None -> $TO)
I was getting URL-less feeds when I clobbered my
~/.config/rss2email/config [1], removing some newer entries. However,
because I never deleted the feeds explicitly, they were repopulated
(without their URL) from ~/.config/rss2email/feeds.dat, and subsequent
runs generated the above error.
[1]: The clobbering was related to my dotfile management, and not due
to an rss2email issue.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 11 Dec 2012 01:11:38 +0000 (20:11 -0500)]
test:disqus: add a Disqus feed for testing
This raised a few issues with the handling of missing IDs, which I've
just fixed. The new test will make sure we keep exercising these code
paths.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 11 Dec 2012 01:07:35 +0000 (20:07 -0500)]
feed: fix id fallback in Feed._process_entry
The old `entry['id'] or _id` raised a KeyError when the entry lacked
an ID. The new `entry.get('id', _id)` falls back appropriately.
Signed-off-by: W. Trevor King <wking@tremily.us>
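The difference between the two expressions can be seen with a plain dict standing in for a feed entry. The function names are illustrative only:

```python
def old_lookup(entry, fallback):
    # Subscripting raises KeyError before `or` is ever evaluated.
    return entry['id'] or fallback


def new_lookup(entry, fallback):
    # .get() returns the fallback when the key is absent.
    return entry.get('id', fallback)
```

Note that `new_lookup` returns an empty string unchanged when the key is present but empty, whereas the `or` form would have substituted the fallback in that case.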
W. Trevor King [Tue, 11 Dec 2012 01:00:35 +0000 (20:00 -0500)]
feed: use hashlib.sha1 for fallback IDs in Feed._get_entry_id
This fixes what I broke in
commit 0fc2ff4465d741823b3dceebcfcf3a98a0081522
Author: W. Trevor King <wking@tremily.us>
Date: Thu Oct 4 08:33:32 2012 -0400
Cleanup global module configuration.
diff --git a/rss2email.py b/rss2email.py
index 216c13d..3919bd7 100755
--- a/rss2email.py
+++ b/rss2email.py
...
-hash = hashlib.md5
...
I've come back with sha1 instead of md5, mostly because that's what
Git uses ;). We don't migrate recorded IDs from the old configuration
method, so the change from md5 to sha1 shouldn't affect anyone.
Signed-off-by: W. Trevor King <wking@tremily.us>
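A fallback ID built with `hashlib.sha1` might be computed as below. Which entry text gets hashed and the UTF-8 encoding are assumptions; `fallback_id` is a hypothetical name:

```python
import hashlib


def fallback_id(text):
    """Return a stable hex digest to use when an entry has no ID."""
    # hashlib works on bytes, so encode the text first (UTF-8 assumed).
    return hashlib.sha1(text.encode('utf-8')).hexdigest()
```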
W. Trevor King [Mon, 10 Dec 2012 23:23:51 +0000 (18:23 -0500)]
feed: fix Feed._get_entry_content unpacking in Feed._get_entry_title
_get_entry_content returns a single dict, but _get_entry_title had been
unpacking it as if it was an object with a `content` attribute.
Also convert HTML to text before extracting the title, to avoid things
like `<p>In the beginning...` in the title.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 10 Dec 2012 23:21:34 +0000 (18:21 -0500)]
feed: fix Feed._get_entry_content unpacking in Feed._get_entry_id
_get_entry_content returns a single dict, but _get_entry_id had been
unpacking it as if it was a length-two tuple.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 10 Dec 2012 23:13:40 +0000 (18:13 -0500)]
feed: fix the `type` key returned by Feed._get_entry_content
The previous version used the Python object `type` where it should
have used the string 'type'. I hadn't caught the bug before because
none of my example feeds fell through that far.
Signed-off-by: W. Trevor King <wking@tremily.us>
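The bug class described above is easy to reproduce, because the builtin `type` is hashable and therefore a valid dict key. The dict contents here are made up for illustration:

```python
# Buggy: the key is the builtin `type` object, not the string 'type'.
wrong = {type: 'text/plain', 'value': 'hello'}

# Fixed: the key is the string that callers actually look up.
right = {'type': 'text/plain', 'value': 'hello'}

wrong_has_key = 'type' in wrong
right_has_key = 'type' in right
```

No error is raised when the dict is built, so the mistake only shows up when a caller looks up `'type'`.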
W. Trevor King [Mon, 10 Dec 2012 23:06:45 +0000 (18:06 -0500)]
test:gmane:README: fix "feed.atom" -> "feed.rss" typo
Probably a copy-paste error from seeding with
test/allthingsrss/README.
Signed-off-by: W. Trevor King <wking@tremily.us>