W. Trevor King [Tue, 16 Dec 2014 22:24:03 +0000 (14:24 -0800)]
Merge branch 'patch-1' of https://github.com/punchagan/rss2email
* 'patch-1' of https://github.com/punchagan/rss2email:
Minor import fix in redirect post process hook.
Signed-off-by: W. Trevor King <wking@tremily.us>
Puneeth Chaganti [Tue, 16 Dec 2014 19:46:02 +0000 (01:16 +0530)]
Minor import fix in redirect post process hook.
Signed-off-by: Puneeth Chaganti <punchagan@muse-amuse.in>
W. Trevor King [Sun, 28 Sep 2014 20:46:59 +0000 (13:46 -0700)]
Merge branch 'content-type-warning'
A few folks have stumbled over this one, thinking that the error
message meant the feed wasn't parsed [1,2,3]. All we're actually
doing is passing along a feedparser warning, so adjust our prefix
accordingly and drop the level from ERROR to WARNING. Hopefully this
makes it clearer that:
* We still process these feeds, despite the bozo exception. If
feedparser can parse the feed despite the exception, it will do so.
* We probably can't fix these issues in rss2email. You need to fix
them in the feed itself, or adjust feedparser to deal with the
busted feed.
This merge pulls in:
* content-type-warning:
CHANGELOG: Document this branch
Log a warning when Content-Type is not correct
[1]: http://article.gmane.org/gmane.mail.rss2email/168
[2]: https://github.com/wking/rss2email/issues/37
[3]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=760963
Reported-by: Arun Persaud <apersaud@lbl.gov>
Reported-by: Joey Hess <joeyh@debian.org>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 28 Sep 2014 20:45:39 +0000 (13:45 -0700)]
CHANGELOG: Document this branch
Signed-off-by: W. Trevor King <wking@tremily.us>
Etienne Millon [Sun, 28 Sep 2014 19:56:41 +0000 (21:56 +0200)]
Log a warning when Content-Type is not correct
When a feed is served with a wrong Content-Type, log a warning and not an error.
This prevents a "processing error" message from being displayed at the default
log level, which is confusing for an innocuous "error" like this.
See https://bugs.debian.org/760963.
Signed-off-by: Etienne Millon <me@emillon.org>
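Here is a minimal sketch of the behavior described above. It assumes a feedparser-style result that supports .get('bozo') and .get('bozo_exception'). The report_bozo name, the logger name, and the message prefix are hypothetical and are not rss2email's actual code:

```python
import logging

# Hypothetical logger name; rss2email uses its own logger.
LOG = logging.getLogger('rss2email-sketch')


def report_bozo(parsed, feed_name):
    """Pass a feedparser bozo_exception along as a warning, not an error.

    `parsed` is assumed to be dict-like (feedparser's result is a dict
    subclass). The caller keeps processing afterwards, because feedparser
    may still have parsed the feed despite the exception.
    """
    if not parsed.get('bozo'):
        return None
    message = 'feedparser warning for {}: {}'.format(
        feed_name, parsed.get('bozo_exception'))
    LOG.warning(message)
    return message
```

Returning the message only makes the sketch easy to check. The important part is that the message is logged at WARNING level, so it stays out of the way at the default log level.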
W. Trevor King [Sun, 28 Sep 2014 18:01:43 +0000 (11:01 -0700)]
Merge branch 'config-manpage'
This is not very DRY, but it gets the job done for now. If it turns
out to be a pain to maintain, I may try to automate the extraction
later.
* config-manpage:
r2e.1: Fix 'massage' -> 'message' typo
CHANGELOG: Document this branch
Document configuration items in manpage
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 28 Sep 2014 18:00:56 +0000 (11:00 -0700)]
r2e.1: Fix 'massage' -> 'message' typo
Port 74ea4b7 (config: Fix 'massage' -> 'message' typo, 2013-09-29)
into this branch.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 28 Sep 2014 17:57:34 +0000 (10:57 -0700)]
CHANGELOG: Document this branch
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 28 Sep 2014 17:52:45 +0000 (10:52 -0700)]
Merge branch 'RemoveURLRedirect'
* RemoveURLRedirect:
CHANGELOG: Document this branch
post_process.redirect: Log exceptions as warnings
post_process.redirect: Add support for enclosures
post_process.redirect: Specify user-agent
post_process.redirect: Add hook to remove redirections
config: Fix 'massage' -> 'message' typo
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 28 Sep 2014 17:51:22 +0000 (10:51 -0700)]
CHANGELOG: Document this branch
Which has clearly been floating around for too long, since it branched
off before 3.7 and is landing after 3.9 :/.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 28 Sep 2014 17:47:15 +0000 (10:47 -0700)]
post_process.redirect: Log exceptions as warnings
When we hit an exception trying to unwind any redirects, just print
the exception and leave the link with its original value.
This commit also alphabetizes the standard library imports, and splits
off the local import following PEP 8's suggested groupings [1].
[1]: http://legacy.python.org/dev/peps/pep-0008/#imports
Signed-off-by: W. Trevor King <wking@tremily.us>
Etienne Millon [Mon, 30 Jun 2014 17:43:08 +0000 (19:43 +0200)]
Document configuration items in manpage
This adds current items to the "CONFIGURATION" section in r2e.1, manually
extracted from rss2email/config.py.
Signed-off-by: Etienne Millon <me@emillon.org>
W. Trevor King [Mon, 1 Sep 2014 23:34:50 +0000 (16:34 -0700)]
CHANGELOG: Fix 'inhertence' -> 'inheritance' typo
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 1 Sep 2014 23:21:01 +0000 (16:21 -0700)]
Bump to version 3.9
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 1 Sep 2014 23:19:27 +0000 (16:19 -0700)]
Run update-copyright.py
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 1 Sep 2014 22:48:21 +0000 (15:48 -0700)]
feeds: Raise an exception when adding a feed with a duplicate name
Avoid:
$ r2e add example http://example.com/feed1
$ r2e add example http://example.com/feed2
$ r2e list
1: [*] example (http://example.com/feed2 -> lkmorlan)
2: [*] example (http://example.com/feed2 -> lkmorlan)
Instead, the second addition now prints:
duplicate feed name 'example'
to stderr and exits with status 1.
Reported-by: Liam K Morland <Liam@Morland.ca>
Signed-off-by: W. Trevor King <wking@tremily.us>
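Here is a minimal sketch of the new check. Feeds is modeled as a toy list of (name, url) pairs. The DuplicateFeedName, Feeds.new_feed, and add names are hypothetical stand-ins for rss2email's classes and command:

```python
import sys


class DuplicateFeedName(ValueError):
    """Hypothetical stand-in for rss2email's duplicate-name error."""

    def __init__(self, name):
        super().__init__('duplicate feed name {!r}'.format(name))
        self.name = name


class Feeds(list):
    """Toy feed list of (name, url) pairs; not rss2email's Feeds class."""

    def new_feed(self, name, url):
        # Refuse a second feed with the same name instead of letting the
        # later config section silently replace the earlier one.
        if any(existing == name for existing, _ in self):
            raise DuplicateFeedName(name)
        self.append((name, url))


def add(feeds, name, url):
    """Mimic `r2e add`: report the error on stderr and return status 1."""
    try:
        feeds.new_feed(name, url)
    except DuplicateFeedName as error:
        print(error, file=sys.stderr)
        return 1
    return 0
```

When the second `add` call uses the name 'example', it prints `duplicate feed name 'example'` to stderr and returns 1. The first feed is left untouched.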
W. Trevor King [Mon, 1 Sep 2014 21:34:37 +0000 (14:34 -0700)]
Merge branch 'sendmail-split-addr'
* sendmail-split-addr:
CHANGELOG: Document this branch
email: Split sender into both sendmail's -F and -f
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 1 Sep 2014 21:14:17 +0000 (14:14 -0700)]
CHANGELOG: Document this branch
Documents the following commits:
* d62baca1 email: Split sender into both sendmail's -F and -f
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Thu, 17 Jul 2014 20:04:10 +0000 (13:04 -0700)]
email: Split sender into both sendmail's -F and -f
Use -F for the name and -f for the address, instead of passing the
composite "name <address>" to -f. Only the address is used in the
SMTP envelope [1,2], so most mailers will probably ignore -F. For
example, Postfix only uses it when there is no 'From' header in the
message itself [3].
The old behavior broke some sendmail implementations that assumed the
whole -f argument was an address (reportedly OpenSMTPD). I haven't
run into one of these sendmail implementations myself, but they'll create
envelope senders like:
MAIL FROM:<"foobar <abc>" <foo@bar>>
when we only want:
MAIL FROM:<foo@bar>
[1]: http://tools.ietf.org/html/rfc2821#section-3.3
[2]: http://tools.ietf.org/html/rfc2821#section-4.1.1.2
[3]: http://www.postfix.org/sendmail.1.html
Signed-off-by: W. Trevor King <wking@tremily.us>
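Here is a minimal sketch of the split. It uses the standard library's email.utils.parseaddr to separate the display name from the address. The sendmail_args helper and the /usr/sbin/sendmail path are assumptions for illustration. Any other arguments rss2email passes to sendmail are left out:

```python
import email.utils


def sendmail_args(sender, sendmail='/usr/sbin/sendmail'):
    """Build a sendmail argv: display name via -F, bare address via -f."""
    name, address = email.utils.parseaddr(sender)
    args = [sendmail]
    if name:
        # -F only sets the sender's full name; mailers may ignore it.
        args.extend(['-F', name])
    # -f sets the envelope sender, so it must be a plain address.
    args.extend(['-f', address])
    return args
```

For `foobar <foo@bar>` this yields `-F foobar -f foo@bar`. The envelope sender is then just `foo@bar`, matching the `MAIL FROM:<foo@bar>` shown above.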
W. Trevor King [Tue, 26 Aug 2014 04:27:26 +0000 (21:27 -0700)]
error: Swap SMTPConnectionError superclasses
To avoid:
>>> import rss2email.error
>>> rss2email.error.SMTPAuthenticationError(server='host', username='bob')
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "./rss2email/error.py", line 85, in __init__
server=server, message=message)
File "./rss2email/error.py", line 67, in __init__
super(SMTPConnectionError, self).__init__(message=message)
TypeError: SMTPAuthenticationError does not take keyword arguments
Reported-by: Federico Churca-Torrusio <fchurca@fi.uba.ar>
Signed-off-by: W. Trevor King <wking@tremily.us>
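Here is a simplified reconstruction of the bug. It assumes the error classes looked roughly like this; the class bodies and message text are hypothetical. When ValueError is listed first, super().__init__(message=...) resolves to the builtin initializer, which rejects keyword arguments. Listing RSS2EmailError first routes the keyword call to the class that expects it:

```python
class RSS2EmailError(Exception):
    """Simplified stand-in for rss2email's base error class."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


class BrokenSMTPConnectionError(ValueError, RSS2EmailError):
    # MRO: BrokenSMTPConnectionError, ValueError, RSS2EmailError, ...
    # so super().__init__(message=...) lands on ValueError's builtin
    # initializer, which raises TypeError for any keyword argument.
    def __init__(self, server, message=None):
        if message is None:
            message = 'could not connect to mail server {}'.format(server)
        super().__init__(message=message)
        self.server = server


class SMTPConnectionError(RSS2EmailError, ValueError):
    # MRO: SMTPConnectionError, RSS2EmailError, ValueError, ...
    # so the keyword call reaches RSS2EmailError.__init__, which passes
    # the message on positionally.
    def __init__(self, server, message=None):
        if message is None:
            message = 'could not connect to mail server {}'.format(server)
        super().__init__(message=message)
        self.server = server
```

With either order, the error is an instance of both bases. Only the MRO changes, and with it the class that super() reaches.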
W. Trevor King [Mon, 30 Jun 2014 17:10:10 +0000 (10:10 -0700)]
Merge branch 'ssl-protocol'
* ssl-protocol:
CHANGELOG: Document this branch
Added the option "smtp-ssl-protocol" to make STARTTLS work on Python 3.3+
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 30 Jun 2014 16:58:48 +0000 (09:58 -0700)]
CHANGELOG: Document this branch
Documents the following commits:
* 2369d56 Added the option "smtp-ssl-protocol" to make STARTTLS work
  on Python 3.3+
Signed-off-by: W. Trevor King <wking@tremily.us>
Thiago Coutinho [Fri, 13 Jun 2014 21:37:12 +0000 (18:37 -0300)]
Added the option "smtp-ssl-protocol" to make STARTTLS work on Python 3.3+
Signed-off-by: Thiago Coutinho <root@thiagoc.net>
W. Trevor King [Thu, 20 Mar 2014 20:03:10 +0000 (13:03 -0700)]
feed: Change _USER_AGENT from '+{url}' to '({url})'
In Debian bug 742215 [1], Jakub points out that the old User-Agent is
out-of-spec. From RFC 2616 [2]:
User-Agent = "User-Agent" ":" 1*( product | comment )
product = token ["/" product-version]
product-version = token
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
comment = "(" *( ctext | quoted-pair | comment ) ")"
ctext = <any TEXT excluding "(" and ")">
[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=742215
[2]: User-Agent: https://tools.ietf.org/html/rfc2616#section-14.43
product: https://tools.ietf.org/html/rfc2616#section-3.8
token, separators, comment, ctext:
https://tools.ietf.org/html/rfc2616#section-2.2
Reported-by: Jakub Wilk <jwilk@debian.org>
Signed-off-by: W. Trevor King <wking@tremily.us>
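A simplified checker for the RFC 2616 grammar quoted above makes the difference concrete. A leading '+' is a legal token character, but the ':' in '+https://...' is a separator. The old value is therefore rejected, while the parenthesized comment form is accepted. The is_valid_user_agent name is hypothetical. The sketch does not check that quoted-pair escapes are 7-bit CHARs, and the version number in the examples is illustrative:

```python
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def _is_token_char(c):
    # token: any CHAR (0-127) except CTLs (0-31, 127) or separators.
    return 32 < ord(c) < 127 and c not in SEPARATORS


def _parse_token(s, i):
    start = i
    while i < len(s) and _is_token_char(s[i]):
        i += 1
    if i == start:
        raise ValueError('expected a token at position {}'.format(start))
    return i


def _parse_comment(s, i):
    # s[i] is '('; comments may nest and may contain quoted-pairs.
    depth = 0
    while i < len(s):
        c = s[i]
        if c == '\\':
            i += 2  # quoted-pair: skip the backslash and the escaped char
            continue
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
            if depth == 0:
                return i + 1
        elif (ord(c) < 32 and c != '\t') or ord(c) == 127:
            raise ValueError('control character in comment')
        i += 1
    raise ValueError('unterminated comment')


def is_valid_user_agent(value):
    """Check value against 1*( product | comment ) with optional LWS."""
    i = 0
    items = 0
    try:
        while True:
            while i < len(value) and value[i] in ' \t':
                i += 1
            if i == len(value):
                break
            if value[i] == '(':
                i = _parse_comment(value, i)
            else:
                # product = token ["/" product-version]
                i = _parse_token(value, i)
                if i < len(value) and value[i] == '/':
                    i = _parse_token(value, i + 1)
            items += 1
    except ValueError:
        return False
    return items > 0
```

For example, `rss2email/3.9 (https://github.com/wking/rss2email)` passes. The old-style `rss2email/3.9 +https://github.com/wking/rss2email` fails at the colon.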
W. Trevor King [Fri, 28 Feb 2014 22:25:57 +0000 (14:25 -0800)]
Merge branch 'no-to-email-error'
* no-to-email-error:
CHANGELOG: Document this branch
command: Set the 'feed' argument when raising NoToEmailAddress
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 28 Feb 2014 22:22:32 +0000 (14:22 -0800)]
CHANGELOG: Document this branch
Documents the following commits:
* 763d9a8 command: Set the 'feed' argument when raising NoToEmailAddress
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 28 Feb 2014 21:51:15 +0000 (13:51 -0800)]
command: Set the 'feed' argument when raising NoToEmailAddress
Before this commit, if no default target email was set up in the
config:
$ r2e add example http://example.net/
Traceback (most recent call last):
File "/usr/bin/r2e", line 5, in <module>
rss2email.main.run()
File "/usr/lib/python3/dist-packages/rss2email/main.py", line 163, in run
args.func(feeds=feeds, args=args)
File "/usr/lib/python3/dist-packages/rss2email/command.py", line 50, in add
raise _error.NoToEmailAddress(feeds=feeds)
TypeError: __init__() missing 1 required positional argument: 'feed'
After this commit:
$ r2e add example http://example.net/
no target email address has been defined
I added kwargs handling to FeedError so that extra keyword arguments
are passed through to FeedsError, because NoToEmailAddress has a
diamond inheritance that includes both RSS2EmailError subclasses.
Reported-by: Jakub Wilk <jwilk@debian.org>
Signed-off-by: W. Trevor King <wking@tremily.us>
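Here is a simplified reconstruction of the pattern described above. It assumes NoToEmailAddress lists FeedError before FeedsError, which is what passing kwargs "through" FeedError to FeedsError implies. The class bodies are cut down to the argument plumbing and are not rss2email's actual error module:

```python
class RSS2EmailError(Exception):
    def __init__(self, message):
        super().__init__(message)
        self.message = message


class FeedError(RSS2EmailError):
    def __init__(self, feed, message=None, **kwargs):
        if message is None:
            message = 'error with feed {!r}'.format(feed)
        # Forward unrecognized keyword arguments (e.g. feeds=...) to the
        # next class in the MRO, which is FeedsError for NoToEmailAddress.
        super().__init__(message=message, **kwargs)
        self.feed = feed


class FeedsError(RSS2EmailError):
    def __init__(self, feeds=None, message=None):
        if message is None:
            message = 'error with feeds'
        super().__init__(message=message)
        self.feeds = feeds


class NoToEmailAddress(FeedError, FeedsError):
    # MRO: NoToEmailAddress, FeedError, FeedsError, RSS2EmailError, ...
    def __init__(self, **kwargs):
        message = 'no target email address has been defined'
        super().__init__(message=message, **kwargs)
```

Calling `NoToEmailAddress(feeds=...)` without `feed=` still raises the "missing 1 required positional argument: 'feed'" TypeError from the traceback above. That is why the fix sets the 'feed' argument at the raise site.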
W. Trevor King [Sat, 25 Jan 2014 02:54:47 +0000 (18:54 -0800)]
Merge remote-tracking branch 's-o-b/contributing-github'
* s-o-b/contributing-github:
CONTRIBUTING.md: Update SubmittingPatches link
developer-certificate-of-origin: Add v1.1 from the Linux Foundation
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 24 Jan 2014 23:43:38 +0000 (15:43 -0800)]
CONTRIBUTING.md: Update SubmittingPatches link
And link to the newly-local DCO.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 25 Jan 2014 00:07:36 +0000 (16:07 -0800)]
Merge branch 'dco' into contributing-github
* dco:
developer-certificate-of-origin: Add v1.1 from the Linux Foundation
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 24 Jan 2014 23:46:30 +0000 (15:46 -0800)]
developer-certificate-of-origin: Add v1.1 from the Linux Foundation
Luis R. Rodriguez [1] has been trying to get information about the
licensing of the Linux kernel's DCO [2] for a while now [3,4], and it
looks like the Linux Foundation just made their view explicit [5,6,7].
This clarifies the copyright and licensing of Linus' two patches:
* 857a183 Update DCO ("signoff") rules to 1.1
* 991bd2e Start documenting the sign-off procedure in SubmittingPatches
From the whois information, developercertificate.org is pretty recent:
$ whois developercertificate.org
Domain Name: DEVELOPERCERTIFICATE.ORG
Domain ID: D170689185-LROR
Creation Date: 2014-01-15T02:54:55Z
Updated Date: 2014-01-17T22:11:12Z
...
Name Server: NS2.LINUX-FOUNDATION.ORG
Name Server: NS1.LINUX-FOUNDATION.ORG
Now that this has an upstream source and its own license (verbatim
copies only), I'm putting this file in its own branch. Downloaded
just now:
$ wget -S -O developer-certificate-of-origin http://developercertificate.org/
--2014-01-24 15:46:21-- http://developercertificate.org/
Resolving developercertificate.org... 140.211.169.4
Connecting to developercertificate.org|140.211.169.4|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx
Date: Fri, 24 Jan 2014 23:46:58 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Last-Modified: Fri, 17 Jan 2014 23:02:25 GMT
ETag: "5c188d-6c5-4f0328910e8f0"
Accept-Ranges: bytes
Content-Length: 1733
Length: 1733 (1.7K) [text/html]
Saving to: ‘developer-certificate-of-origin’
2014-01-24 15:46:21 (112 MB/s) - ‘developer-certificate-of-origin’ saved [1733/1733]
After which I stripped out the HTML, leaving just the DCO text.
[1]: http://www.do-not-panic.com/
[2]: https://www.kernel.org/doc/Documentation/SubmittingPatches
[3]: http://thread.gmane.org/gmane.linux.kernel/1397613
[4]: http://thread.gmane.org/gmane.linux.kernel/1492612
[5]: http://article.gmane.org/gmane.linux.kernel.wireless.general/118696
[6]: http://article.gmane.org/gmane.linux.kernel/1635433
[7]: http://developercertificate.org/
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 18 Jan 2014 19:58:49 +0000 (11:58 -0800)]
Merge branch 'trustlink'
* trustlink:
CHANGELOG: Document this branch
config and feed: Added trust-link preference
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 18 Jan 2014 19:46:48 +0000 (11:46 -0800)]
CHANGELOG: Document this branch
And add a warning about toggling the setting for active feeds.
For both George and me, the motivation for this change was working
around feed authors that change the id after minor changes in content:
On Sat, Jan 18, 2014 at 1:40 PM, W. Trevor King wrote:
> Some of the newspaper feeds I follow have duplicate entries in
> their feed if they tweaked the title or content, but I rarely care
> about the changes.
On Sat, Jan 18, 2014 at 02:16:19PM -0500, George Saunders wrote:
> That's exactly the situation I added it for.
The Atom spec explicitly says that revisions should keep the same id
[1]:
When an Atom Document is relocated, migrated, syndicated,
republished, exported, or imported, the content of its atom:id
element MUST NOT change. Put another way, an atom:id element
pertains to all instantiations of a particular Atom entry or feed;
revisions retain the same content in their atom:id elements.
But not all feed generators are fully compliant ;).
[1]: http://tools.ietf.org/search/rfc4287#section-4.2.6
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 18 Jan 2014 18:45:48 +0000 (10:45 -0800)]
Bump to version 3.8
Signed-off-by: W. Trevor King <wking@tremily.us>
François Boulogne [Wed, 2 Oct 2013 19:15:17 +0000 (21:15 +0200)]
post_process.redirect: Add support for enclosures
Signed-off-by: François Boulogne <fboulogne sciunto org>
François Boulogne [Mon, 30 Sep 2013 11:46:33 +0000 (13:46 +0200)]
post_process.redirect: Specify user-agent
Signed-off-by: François Boulogne <fboulogne sciunto org>
Signed-off-by: W. Trevor King <wking@tremily.us>
François Boulogne [Sun, 29 Sep 2013 20:34:10 +0000 (22:34 +0200)]
post_process.redirect: Add hook to remove redirections
Signed-off-by: François Boulogne <fboulogne sciunto org>
Signed-off-by: W. Trevor King <wking@tremily.us>
François Boulogne [Sun, 29 Sep 2013 20:20:35 +0000 (22:20 +0200)]
config: Fix 'massage' -> 'message' typo
Signed-off-by: François Boulogne <fboulogne sciunto org>
Signed-off-by: W. Trevor King <wking@tremily.us>
George Saunders [Fri, 22 Mar 2013 04:49:27 +0000 (04:49 +0000)]
config and feed: Added trust-link preference
The trust-link preference allows the user to ignore feed
entries that repeat a previously seen link URL.
Signed-off-by: George Saunders <georgesaunders@gmail.com>
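Here is a minimal sketch of the idea. It assumes entries are dicts with optional 'id' and 'link' keys, and that already-seen keys live in a set. The helper names are hypothetical, and rss2email's real bookkeeping of seen entries differs:

```python
def entry_key(entry, trust_link=False):
    """Pick the identifier used to decide whether an entry was already sent."""
    if trust_link and entry.get('link'):
        # Trust the link: a re-issued id with the same URL is a repeat.
        return entry['link']
    return entry.get('id') or entry.get('link')


def unseen_entries(entries, seen, trust_link=False):
    """Return entries whose key is not in `seen`, recording new keys."""
    fresh = []
    for entry in entries:
        key = entry_key(entry, trust_link=trust_link)
        if key in seen:
            continue
        seen.add(key)
        fresh.append(entry)
    return fresh
```

Consider a feed that bumps the id from `post-v1` to `post-v2` after a minor edit but keeps the same link. With trust_link=False the revision is mailed again. With trust_link=True it is skipped.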
W. Trevor King [Sun, 20 Oct 2013 21:54:33 +0000 (14:54 -0700)]
Merge branch 'opmlimport-feed-name-slugging'
* opmlimport-feed-name-slugging:
CHANGELOG: Document this branch
feed: Adjust Feed._name_regexp to allow non-ASCII characters
command: Sluggify feed names on opmlimport
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 20 Oct 2013 21:52:19 +0000 (14:52 -0700)]
CHANGELOG: Document this branch
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 13 Oct 2013 22:18:06 +0000 (15:18 -0700)]
feed: Adjust Feed._name_regexp to allow non-ASCII characters
There's no need to restrict folks to the Latin alphabet.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 13 Oct 2013 21:54:29 +0000 (14:54 -0700)]
command: Sluggify feed names on opmlimport
Gaëtan Harter writes [1]:
> Importing the following opml file fails with `invalid feed name
> 'Arch Linux: Recent news updates`
>
> <?xml version="1.0" encoding="UTF-8"?>
> <opml version="1.0">
> <head>
> <title>Google reader export</title>
> </head>
> <body>
> <outline text="Arch Linux: Recent news updates"
> title="Arch Linux: Recent news updates" type="rss"
> xmlUrl="http://www.archlinux.org/feeds/news/"
> htmlUrl="https://www.archlinux.org/news/" />
> </body>
> </opml>
>
> It fails because the `text` field is used directly as `name` for
> creating a Feed object.
ConfigParser can handle colons and accented characters in its
section names [2], but Feed._set_name checks names against
Feed._name_regexp which only allows ASCII letters, digits, periods,
underscores, and the hyphen-minus (U+002D). Add an inverse
name_slug_regexp to opmlimport that replaces any runs of illegal
characters with a single hyphen-minus, to avoid crashing if the text
attribute contains anything illegal.
[1]: https://github.com/wking/rss2email/issues/24#issuecomment-26224593
[2]: http://docs.python.org/3/library/configparser.html#supported-ini-file-structure
Reported-by: Gaëtan Harter <hartergaetan@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
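Here is a sketch of the two regexps under stated assumptions. The patterns and the slugify helper are hypothetical approximations of Feed._name_regexp and opmlimport's name_slug_regexp. They rely on `\w` matching Unicode letters, digits, and the underscore for str patterns in Python 3, which also covers the non-ASCII relaxation from the following commit:

```python
import re

# Hypothetical patterns: allowed names are runs of word characters,
# periods, and hyphen-minus; the slug regexp matches everything else.
NAME_REGEXP = re.compile(r'^[\w.-]+$')
NAME_SLUG_REGEXP = re.compile(r'[^\w.-]+')


def slugify(text):
    """Replace each run of disallowed characters with one hyphen-minus."""
    return NAME_SLUG_REGEXP.sub('-', text)


name = slugify('Arch Linux: Recent news updates')
print(name)  # Arch-Linux-Recent-news-updates
```

The ": " run collapses to a single hyphen. A title that starts or ends with illegal characters would keep a leading or trailing hyphen, which is still a valid name under this pattern.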
W. Trevor King [Fri, 11 Oct 2013 17:11:24 +0000 (10:11 -0700)]
MANIFEST.in: Add the AUTHORS file for distribution
Reported-by: Arun Persaud <apersaud@lbl.gov>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 11 Oct 2013 15:30:12 +0000 (08:30 -0700)]
Bump to version 3.7
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 11 Oct 2013 15:44:56 +0000 (08:44 -0700)]
setup.py: Claim support for Python 3.3
We've supported 3.3 for the whole rss2email 3.x branch, but I forgot
to mention it in the trove classifiers.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 11 Oct 2013 15:39:11 +0000 (08:39 -0700)]
README: Three-space indents for nested enumerated lists
Apparently two spaces doesn't cut it. This change fixes:
$ rst2html.py --strict README
README:5: (INFO/1) Enumerated list start value not ordinal-1: "3" (ordinal 3)
Exiting due to level-1 (INFO) system message.
I also use a bullet list for dependencies, because the order in which
you install them doesn't matter to me.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 11 Oct 2013 15:02:48 +0000 (08:02 -0700)]
Merge branch 'catch-html-parse-error'
* catch-html-parse-error:
CHANGELOG: Document this branch's HTML-title conversion fallback
feed: Add 'default' argument to Feed._html2text for HTMLParseError
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 11 Oct 2013 15:01:51 +0000 (08:01 -0700)]
CHANGELOG: Document this branch's HTML-title conversion fallback
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 11 Oct 2013 14:56:15 +0000 (07:56 -0700)]
Merge branch 'robust-file-saving'
* robust-file-saving:
CHANGELOG: Document this branch's atomic saves
feeds: Make Feeds.save fully atomic, assuming a working fsync
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 11 Oct 2013 14:54:21 +0000 (07:54 -0700)]
CHANGELOG: Document this branch's atomic saves
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 28 Sep 2013 16:51:03 +0000 (09:51 -0700)]
feed: Add 'default' argument to Feed._html2text for HTMLParseError
This allows us to easily fall back on an unconverted string in the
event that the input HTML is malformed. We already caught
HTMLParseError when converting HTML to plain text for non-HTML mail,
but we didn't catch it in Feed._get_entry_title. Now we gracefully
handle the situation by treating the malformed HTML as plain text.
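Here is a minimal sketch of the 'default' fallback. HTMLParseError no longer exists in current Python (it was removed from html.parser in 3.5). This sketch therefore swaps in xml.etree.ElementTree as a strict stand-in parser that raises ParseError on malformed markup. The html2text name and the sentinel are hypothetical:

```python
import xml.etree.ElementTree as ET

_NO_DEFAULT = object()  # sentinel: "re-raise instead of falling back"


def html2text(html, default=_NO_DEFAULT):
    """Strip markup from `html`, or return `default` if it will not parse."""
    try:
        # Wrap in a root element so fragments like 'a <b>b</b>' parse.
        root = ET.fromstring('<div>{}</div>'.format(html))
    except ET.ParseError:
        if default is _NO_DEFAULT:
            raise
        return default
    return ''.join(root.itertext())


title = 'Fish <b>& chips'
print(html2text(title, default=title))  # malformed, so the raw title
```

Passing the unconverted title as its own default mirrors the commit's approach: malformed HTML is treated as plain text instead of aborting the entry.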
W. Trevor King [Sat, 28 Sep 2013 16:01:52 +0000 (09:01 -0700)]
feeds: Make Feeds.save fully atomic, assuming a working fsync
If the disk is full (or there are other OS-level issues), a file may
not be completely written to the disk.
The write-flush-fsync-rename sequence is much safer. The fsync
invocation matches the recommendation in the docs [1]:
If you’re starting with a buffered Python file object f, first do
f.flush(), and then do os.fsync(f.fileno()), to ensure that all
internal buffers associated with f are written to disk.
The purpose of each step is:
* write: move the data into a library buffer
* flush: flush the library buffer into a kernel buffer
* fsync: flush the kernel buffer onto the disk at $tempfile
* rename: adjust the metadata so that the $filename points to the
$tempfile data, release the old data
This means that if the rename works we get the new data, and if the
rename fails we still have the old data.
However, POSIX's fsync is implementation defined unless
_POSIX_SYNCHRONIZED_IO is defined [3,4], and some OS X implementations
go the no-op route, as Stewart Smith points out in his excellent "Eat
My Data: How everybody gets file I/O wrong" [4]. If you want to run
rss2email on such a system, verifying your data integrity is up to you
;).
We used to write-rename the data file (but not the config) on *nix
[2]. Now we do the full write-flush-fsync-rename for both the config
and data files on both *nix and other systems.
[1]: http://docs.python.org/3/library/os.html#os.fsync
[2]: For rss2email, *nix is "has fcntl, but isn't SunOS".
[3]: http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
[4]: https://www.flamingspork.com/talks/2007/06/eat_my_data.odp
Reported-by: Etienne Millon <me@emillon.org>
Signed-off-by: W. Trevor King <wking@tremily.us>
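Here is a minimal sketch of the write-flush-fsync-rename sequence in Python. The atomic_write helper is hypothetical. It uses os.replace for the rename step (which also overwrites an existing target on Windows), so it may differ from the calls rss2email actually makes. As noted above, the guarantee is only as good as the platform's fsync:

```python
import contextlib
import os
import tempfile


def atomic_write(path, data, encoding='utf-8'):
    """Replace `path` with `data`, keeping the old contents on failure."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w', encoding=encoding) as f:
            f.write(data)          # write: data lands in Python's buffer
            f.flush()              # flush: Python buffer -> kernel buffer
            os.fsync(f.fileno())   # fsync: kernel buffer -> disk
        os.replace(tmp_path, path)  # rename: point path at the new data
    except BaseException:
        # The old file is untouched; just remove the partial temp file.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

A rename only replaces the target atomically within a single filesystem. This is why the temporary file is created next to the target rather than in the system temp directory.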
W. Trevor King [Fri, 20 Sep 2013 07:41:00 +0000 (00:41 -0700)]
r2e.1: Remove quotes around 'name-format' value
ConfigParser doesn't need quoting around string values, so if you use
quotes they will show up explicitly:
From: "'indexed: Jessica Hagy'" <user@rss2email.invalid>
That's probably not what you want ;).
Signed-off-by: W. Trevor King <wking@tremily.us>
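The quoting behavior is easy to confirm with the standard library's configparser. The name-format values below are examples, not necessarily rss2email's defaults:

```python
import configparser

QUOTED = """
[DEFAULT]
name-format = '{feed-title}: {author}'
"""

UNQUOTED = """
[DEFAULT]
name-format = {feed-title}: {author}
"""


def name_format(text):
    """Parse a config snippet and return its raw name-format value."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config['DEFAULT']['name-format']


print(name_format(QUOTED))    # '{feed-title}: {author}'  (quotes kept)
print(name_format(UNQUOTED))  # {feed-title}: {author}
```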
W. Trevor King [Sat, 14 Sep 2013 16:58:11 +0000 (09:58 -0700)]
Run update-copyright.py
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 14 Sep 2013 16:57:12 +0000 (09:57 -0700)]
Merge branch 'format-entry-name'
* format-entry-name:
feed: Give defaults for _get_entry_name formatting data
feed: Convert 'friendly-name' setting to 'name-format'
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 14 Sep 2013 16:49:56 +0000 (09:49 -0700)]
feed: Give defaults for _get_entry_name formatting data
We don't want to crash if the source feed is missing some data that
the user expects, or if the user just hasn't had the time to adjust
the name-format config.
Signed-off-by: W. Trevor King <wking@tremily.us>
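One way to provide such defaults is str.format_map with a dict subclass whose __missing__ returns an empty string. This is a sketch under that assumption; the entry_name helper is hypothetical, and rss2email may supply its defaults differently:

```python
class _BlankDefaults(dict):
    """Mapping that formats missing keys as an empty string."""

    def __missing__(self, key):
        return ''


def entry_name(name_format, data):
    """Format a From-name template, tolerating missing feed data."""
    return name_format.format_map(_BlankDefaults(data))


print(entry_name('{feed-title}: {author}', {'author': 'Jessica Hagy'}))
# : Jessica Hagy
print(entry_name('{author} ({feed-title})',
                 {'author': 'Jessica Hagy', 'feed-title': 'indexed'}))
# Jessica Hagy (indexed)
```

The first call shows the side effect described in the 'name-format' conversion commit. A missing feed title leaves a stray ': ' prefix, but the formatting no longer crashes.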
W. Trevor King [Sat, 14 Sep 2013 15:47:16 +0000 (08:47 -0700)]
Merge branch 'no-prefered-xml-parser'
* no-prefered-xml-parser:
feed: Disable feedparser's PREFERRED_XML_PARSERS
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 14 Sep 2013 15:44:32 +0000 (08:44 -0700)]
Merge branch 'accept-feedparser-encoding-override'
* accept-feedparser-encoding-override:
CHANGELOG: Demote guessed encodings logs from 'error' to 'warning'
feed: don't emit error if parser able to auto-determine encoding
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 14 Sep 2013 15:41:31 +0000 (08:41 -0700)]
CHANGELOG: Demote guessed encodings logs from 'error' to 'warning'
Documenting:
05f2628 feed: don't emit error if parser able to auto-determine encoding
Signed-off-by: W. Trevor King <wking@tremily.us>
J. Lewis Muir [Tue, 10 Sep 2013 16:48:06 +0000 (11:48 -0500)]
feed: don't emit error if parser able to auto-determine encoding
Some feeds have incorrectly declared encodings (e.g. the encoding
specified by the HTTP header does not match the encoding specified in
the XML declaration). For such a feed, "r2e run" would emit an error
message similar to the following:
processing error: document declared as us-ascii, but parsed as
iso-8859-1: undeadly (http://undeadly.org/cgi?action=rss ->
jlmuir@imca-cat.org)
In this particular case, the HTTP header indicated a content type of
"text/xml" with no "charset" parameter. According to the feedparser
5.1.3 documentation (section "Introduction to Character Encoding" [1]),
this results in an encoding of US-ASCII. But the served XML document
contains an encoding declaration of ISO-8859-1.
For this case and some others, feedparser is able to automatically
determine an encoding. When it does, we emit a warning rather than an
error, and accept the automatically determined encoding.
We check for a successfully overridden encoding by looking at the bozo
bit and the bozo_exception. If the bozo bit is set and the
bozo_exception is feedparser.CharacterEncodingOverride, the parser has
successfully overridden an incorrectly declared encoding. Quoting from
the feedparser 5.1.3 documentation, section "Handling
Incorrectly-Declared Encodings" [2]:
Universal Feed Parser initially uses the rules specified in RFC 3023
to determine the character encoding of the feed. If parsing succeeds,
then that's that. If parsing fails, Universal Feed Parser sets the
bozo bit to 1 and sets bozo_exception to
feedparser.CharacterEncodingOverride. Then it tries to reparse the
feed with the following character encodings:
1. the encoding specified in the XML declaration
2. the encoding sniffed from the first four bytes of the document (as
per Section F)
3. the encoding auto-detected by the Universal Encoding Detector, if
installed
4. utf-8
5. windows-1252
If the character encoding can not be determined, Universal Feed Parser
sets the bozo bit to 1 and sets bozo_exception to
feedparser.CharacterEncodingUnknown. In this case, parsed values will
be strings, not Unicode strings.
References:
1. http://pythonhosted.org/feedparser/character-encoding.html#introduction-to-character-encoding
2. http://pythonhosted.org/feedparser/character-encoding.html#handling-incorrectly-declared-encodings
Signed-off-by: J. Lewis Muir <jlmuir@imca-cat.org>
W. Trevor King [Tue, 10 Sep 2013 19:04:05 +0000 (12:04 -0700)]
feed: Convert 'friendly-name' setting to 'name-format'
In Debian bug 722009, Joey Hess wrote about the 2.x series [1]:
> The current From line generated by r2e is From: Blog: Author
> where
> Blog is the name of the blog, or some page like a wiki's RecentChanges
> Author is the author of a post
> iff the blog sets that info in the feed
>
> In mutt the Blog part often occupies the whole displayed Subject field,
> which is fixed width. So the Author cannot be seen. This is particularly
> a problem with planets, where the author of a post matters a lot.
> But also with some blogs that have multiple authors.
>
> For these sorts of blogs, I would generally prefer to use a From line
> like From: Author (Blog)
> This does mean that when sorting by author, all posts of a blog or
> planet feed may not appear together, but that would be an acceptable
> tradeoff to me.
>
> One way to implement this (other than just changing the format string)
> would be to make OVERRIDE_FROM able to contain a format string,
> so it could be configured on a per-feed basis.
The new setup makes the name-formatting configurable on a per feed
basis (what Joey wanted), but it's not without side effects. For
feeds where some information is missing (feed-title, author, or
publisher), we used to adjust the formatting on the fly. For example,
you'd get output like '{author}' if the feed-title was missing,
instead of getting ': {author}'. Now users bothered by this will have
to manually override the format template for feeds missing crucial
data.
[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=722009
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 10 Sep 2013 18:41:12 +0000 (11:41 -0700)]
feed: Disable feedparser's PREFERRED_XML_PARSERS
Feedparser's default parser (drv_libxml2) has trouble parsing byte
streams in Python 3:
$ python -c 'import rss2email.feed; import doctest; doctest.testmod(rss2email.feed)'
...
File "rss2email/feed.py", line 319, in rss2email.feed.Feed._fetch
Failed example:
parsed = feed._fetch()
Exception raised:
Traceback (most recent call last):
File "rss2email/util.py", line 61, in run
self.result = self._target(*self._args, **self._kwargs)
File "/.../feedparser/feedparser.py", line 3745, in parse
saxparser.parse(source)
File "/usr/lib64/python3.2/site-packages/drv_libxml2.py", line 270, in parse
_d(reader.Name()),_d(reader.Value()))
File "/usr/lib64/python3.2/site-packages/drv_libxml2.py", line 70, in _d
return _decoder(s)[0]
File "/usr/lib64/python3.2/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
TypeError: 'str' does not support the buffer interface
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib64/python3.2/doctest.py", line 1288, in __run
compileflags, 1), test.globs)
File "<doctest rss2email.feed.Feed._fetch[1]>", line 1, in <module>
parsed = feed._fetch()
File "rss2email/feed.py", line 336, in _fetch
return f(self.url, self.etag, modified=self.modified, **kwargs)
File "rss2email/util.py", line 76, in __call__
time_limited_function=self) from self.error[1]
rss2email.error.TimeoutError: error while running time limited function: 'str' does not support the buffer interface
...
You can reproduce the underlying exception with this minimal script:
import io
import xml.sax
import xml.sax.handler
data = b'<feed xmlns="http://www.w3.org/2005/Atom"><entry><author><name>Example author</name><email>me@example.com</email><url>http://example.com/</url></author></entry></feed>'
source = xml.sax.xmlreader.InputSource()
source.setByteStream(io.BytesIO(data))
saxparser = xml.sax.make_parser(["drv_libxml2"])
saxparser.setContentHandler(xml.sax.handler.ContentHandler())
saxparser.parse(source)
which raises:
Traceback (most recent call last):
File "<stdin>", line 13, in <module>
saxparser.parse(source)
File "/usr/lib64/python3.2/site-packages/drv_libxml2.py", line 222, in parse
eltName = _d(reader.Name())
File "/usr/lib64/python3.2/site-packages/drv_libxml2.py", line 70, in _d
return _decoder(s)[0]
File "/usr/lib64/python3.2/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
TypeError: 'str' does not support the buffer interface
at least for libxml2-2.9.1.
By using the stdlib's default parser (instead of drv_libxml2), we can
avoid the error and get successful parsing. If you don't have
drv_libxml2 installed, sax was already falling back on the stdlib's
default parser, so this commit will be a no-op.
Signed-off-by: W. Trevor King <wking@tremily.us>
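The commit stops feedparser from preferring drv_libxml2, so xml.sax falls back on its default parser list. In this sketch, the same minimal document parses fine from a byte stream when make_parser() is called without a preference list. ElementNames and element_names are hypothetical helpers, and with no PY_SAX_PARSER override the default is the stdlib's expat reader:

```python
import io
import xml.sax
import xml.sax.handler
import xml.sax.xmlreader


class ElementNames(xml.sax.handler.ContentHandler):
    """Record element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names(data):
    source = xml.sax.xmlreader.InputSource()
    source.setByteStream(io.BytesIO(data))  # bytes, as in the script above
    parser = xml.sax.make_parser()  # default list, not ["drv_libxml2"]
    handler = ElementNames()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.names


data = (b'<feed xmlns="http://www.w3.org/2005/Atom"><entry><author>'
        b'<name>Example author</name><email>me@example.com</email>'
        b'<url>http://example.com/</url></author></entry></feed>')
print(element_names(data))
```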
W. Trevor King [Mon, 9 Sep 2013 17:30:17 +0000 (10:30 -0700)]
Bump to version 3.6
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 9 Sep 2013 17:28:18 +0000 (10:28 -0700)]
CHANGELOG: Document missing port argument for IMAPAuthenticationError fix
Documenting:
9124d62 email: fixed missing port argument for IMAPAuthenticationError
Signed-off-by: W. Trevor King <wking@tremily.us>
Arun Persaud [Mon, 5 Aug 2013 19:13:07 +0000 (12:13 -0700)]
email: fixed missing port argument for IMAPAuthenticationError
Signed-off-by: Arun Persaud <apersaud@lbl.gov>
W. Trevor King [Mon, 10 Jun 2013 20:37:05 +0000 (16:37 -0400)]
test/gmane/3: Add tests for HTML generation
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 10 Jun 2013 20:30:35 +0000 (16:30 -0400)]
test/gmane/2.expected: Remove some whitespace
I'm not sure which version of html2text I used to generate the initial
expected results, but this version was generated with html2text
3.200.3.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Mon, 10 Jun 2013 20:25:27 +0000 (16:25 -0400)]
test/test.py: Update clean_result() to normalize user agents
Catch up with the USER_AGENT changes from 3f9adb5 (feed: Add the
digest setting for multi-entry email, 2013-04-13).
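The normalization can be as simple as a regular-expression substitution over the generated message text. The sketch below assumes a header of the form `User-Agent: rss2email/<version> +<url>`. The pattern and `normalize_user_agent()` are hypothetical and are not taken from test/test.py:

```python
import re

# Hypothetical pattern: keep the "User-Agent: rss2email/" prefix and
# replace the version token that follows it with a fixed placeholder.
_USER_AGENT_RE = re.compile(r'^(User-Agent: rss2email/)\S+', re.MULTILINE)


def normalize_user_agent(text):
    """Replace the version in rss2email User-Agent headers with 'X.Y'."""
    return _USER_AGENT_RE.sub(r'\g<1>X.Y', text)


sample = ('Subject: hi\n'
          'User-Agent: rss2email/3.5 +https://github.com/wking/rss2email\n')
print(normalize_user_agent(sample))
```

With this in place, bumping the version number no longer changes the expected output.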
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sun, 9 Jun 2013 21:26:31 +0000 (17:26 -0400)]
CHANGELOG: Document HTML syntax fix
Documenting:
4aa7f1d Fixed syntactical error when generating HTML mails
The commit by Dennis Keitzel fixed a typo in the HTML that dated back
to 00e2eecc (Spread cmd_run() logic out into Feed methods, 2012-10-04).
Signed-off-by: W. Trevor King <wking@tremily.us>
Dennis Keitzel [Sat, 8 Jun 2013 09:32:55 +0000 (11:32 +0200)]
Fixed syntactical error when generating HTML mails
Signed-off-by: Dennis Keitzel <github@pinshot.net>
W. Trevor King [Wed, 5 Jun 2013 22:13:14 +0000 (18:13 -0400)]
Bump to version 3.5
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 5 Jun 2013 22:09:52 +0000 (18:09 -0400)]
CHANGELOG: Mention the digest addition
Documenting:
92c0e76 Merge branch 'digest'
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Wed, 5 Jun 2013 22:08:47 +0000 (18:08 -0400)]
Merge branch 'digest'
* digest:
feed: Add the digest-post-process setting
feed: Add the digest setting for multi-entry email
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 11:57:50 +0000 (07:57 -0400)]
command: Add newlines to OPML export
No semantic change, but it makes the exported data easier for humans
to read.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 11:37:33 +0000 (07:37 -0400)]
command: Use feed names in OPML 'text' attributes
Instead of writing the URL as the 'text' attribute and ignoring it on
read, we now use the attribute to store the feed name. This avoids
auto-generated feed names on import. From the OPML 2.0 spec [1]:
Subscription lists
...
Required attributes: type, text, xmlUrl. For outline elements whose
type is rss, the text attribute should initially be the top-level
title element in the feed being pointed to, however since it is
user-editable, processors should not depend on it always containing
the title of the feed. xmlUrl is the http address of the feed.
We are not following the 'should' recommendation, but since we have
user-generated titles, I believe that the new usage is appropriate.
It's certainly closer to spec than storing a URL in 'text' :p.
[1]: http://dev.opml.org/spec2.html
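The round trip below is a minimal sketch of this change. It assumes feeds are plain (name, url) pairs. `export_opml()` and `import_opml()` are hypothetical helpers and do not reproduce rss2email's opmlexport/opmlimport code. The exporter writes the feed name to 'text' and the URL to 'xmlUrl', one outline per line. The importer then reads 'text' back as the feed name, so no name has to be auto-generated:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def export_opml(feeds):
    """Serialize (name, url) pairs as OPML 2.0, one <outline> per line."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<opml version="2.0">',
        '<head><title>rss2email subscriptions</title></head>',
        '<body>',
    ]
    for name, url in feeds:
        # quoteattr() escapes &, < and > and adds the surrounding quotes
        lines.append('<outline type="rss" text={} xmlUrl={} />'.format(
            quoteattr(name), quoteattr(url)))
    lines += ['</body>', '</opml>']
    return '\n'.join(lines) + '\n'


def import_opml(text):
    """Recover (name, url) pairs, using 'text' as the feed name."""
    root = ET.fromstring(text.encode('utf-8'))
    return [(outline.get('text'), outline.get('xmlUrl'))
            for outline in root.iter('outline')
            if outline.get('xmlUrl')]


feeds = [('tom-and-jerry', 'http://example.com/feed?a=1&b=2')]
print(export_opml(feeds))
```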
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 11:03:32 +0000 (07:03 -0400)]
command: Fix opmlexport crash due to orphaned feed data
When you remove a feed from your config file by hand, you might leave
the dynamic 'seen' data in the JSON data file by accident. If you
have such orphan data, the feed is loaded by Feeds._load_feeds() with
the default configuration (since you removed the config file entry).
This can lead to opmlexport errors like:
Traceback (most recent call last):
File "./r2e", line 5, in <module>
rss2email.main.run()
File "/.../rss2email/rss2email/main.py", line 163, in run
args.func(feeds=feeds, args=args)
File "/.../rss2email/rss2email/command.py", line 157, in opmlexport
url = _saxutils.escape(feed.url)
File "/usr/lib64/python3.2/xml/sax/saxutils.py", line 34, in escape
data = data.replace("&", "&amp;")
AttributeError: 'NoneType' object has no attribute 'replace'
because the feeds lack the per-feed 'url' setting that had been
defined in the config file. With this commit, opmlexport drops these
URL-less feeds, instead of choking to death trying to format them ;).
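The filtering step can be sketched as follows. `Feed` is a hypothetical stand-in for a loaded feed: an orphaned feed, loaded only from the JSON data file, has `url=None`. `exportable()` is an illustrative helper, not the actual command.py code:

```python
import logging
from xml.sax.saxutils import escape

_LOG = logging.getLogger(__name__)


class Feed:
    """Hypothetical stand-in: orphaned feeds have no 'url' setting."""

    def __init__(self, name, url=None):
        self.name = name
        self.url = url


def exportable(feeds):
    """Yield (name, escaped_url) only for feeds that actually have a URL."""
    for feed in feeds:
        if feed.url is None:
            # 'seen' data without a config entry: skip it instead of
            # passing None to escape() and crashing
            _LOG.warning('skipping URL-less feed %s', feed.name)
            continue
        yield feed.name, escape(feed.url)


feeds = [Feed('work', 'http://example.com/?a=1&b=2'), Feed('orphan')]
print(list(exportable(feeds)))
```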
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 10:18:22 +0000 (06:18 -0400)]
README: Link to Fedora package
Also remove 'Linux' from distribution names. The goal is to point
folks using $DISTRO to their package, not to give details about the
technical underpinnings of a given distribution.
I've sorted the distributions by packaging format: deb, rpm, ebuild,
and Makefile.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 08:38:07 +0000 (04:38 -0400)]
config: Replace Config._setup() with Config.setup_html2text()
Since 7bbbf62 (Setup html2text in Config._setup(), 2012-10-04), the
html2text configuration options have only been referenced from
Config._setup(), and that method was never called. With this commit,
we rename the method to setup_html2text(), and add a new
Feed._html2text() which invokes the setup before calling
html2text.html2text().
This caused a fair amount of churn in the expected test results, as
previously ignored default values for html2text kicked in. I also
added a test exercising the non-default values (allthingsrss/3), and
it looks like everything works as expected.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 14 May 2013 07:47:22 +0000 (03:47 -0400)]
Bump to version 3.4
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 11 May 2013 10:52:12 +0000 (06:52 -0400)]
email: Pass 'config' and 'section' through from send() to *_send()
Otherwise only the default configuration will be used, which is almost
certainly not what the user wants.
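The sketch below illustrates the pattern under stated assumptions. The backends `smtp_send()` and `sendmail_send()` and the settings `use-smtp`, `smtp-server`, and `sendmail` are illustrative names and do not reproduce rss2email's email.py. The point is that the dispatcher forwards both `config` and `section`. Without that, every backend would read `[DEFAULT]`, and per-feed overrides would be ignored:

```python
import configparser


def smtp_send(message, config, section):
    # Hypothetical backend: report which server this section resolves to
    return ('smtp', config.get(section, 'smtp-server'))


def sendmail_send(message, config, section):
    # Hypothetical backend: report which sendmail binary would be used
    return ('sendmail', config.get(section, 'sendmail'))


def send(message, config, section='DEFAULT'):
    """Dispatch to a backend, forwarding config/section for per-feed settings."""
    if config.getboolean(section, 'use-smtp'):
        return smtp_send(message, config=config, section=section)
    return sendmail_send(message, config=config, section=section)


config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
use-smtp = False
sendmail = /usr/sbin/sendmail
smtp-server = smtp.example.net

[feed.work]
use-smtp = True
smtp-server = smtp.work.example.com
""")
print(send('msg', config, 'feed.work'))
```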
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 12:38:01 +0000 (08:38 -0400)]
CHANGELOG: Update with summary of IMAP delivery addition
Documenting:
a7f2222 Merge branch 'imap'
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 12:33:09 +0000 (08:33 -0400)]
Merge branch 'imap'
* imap:
email: small fixes for using imap as a backend
email: Stub out send_imap()
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 12:26:01 +0000 (08:26 -0400)]
email: Remove debugging logging from _decode_header()
This was accidentally committed in e08e198 (email: Decode headers when
checking .as_string() flatten fallback, 2013-02-17).
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 09:36:25 +0000 (05:36 -0400)]
feed: Add the digest-post-process setting
This is for users who want to manipulate the multi-entry message. For
example, if the stock `digest for <feed.name>` title doesn't cut it
for you, you can now use a post-processing hook to set the title
however you like.
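A retitling hook might look like the sketch below. The hook's calling convention is an assumption. `retitle_digest(digest, feed_name)` is a hypothetical signature, so check the post_process documentation for the arguments rss2email actually passes. What the sketch does show is the stdlib idiom for replacing a header: delete it first, then set it, so the message does not end up with two Subject lines:

```python
from email.mime.multipart import MIMEMultipart


def retitle_digest(digest, feed_name):
    """Hypothetical digest post-process hook: replace the stock subject."""
    del digest['Subject']  # removing a missing header is a no-op
    digest['Subject'] = 'Daily roundup: {}'.format(feed_name)
    return digest


digest = MIMEMultipart('digest')
digest['Subject'] = 'digest for example-feed'
retitle_digest(digest, 'example-feed')
print(digest['Subject'])  # Daily roundup: example-feed
```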
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 13 Apr 2013 22:05:03 +0000 (18:05 -0400)]
feed: Add the digest setting for multi-entry email
For high-volume feeds, some users want to receive a single email per
Feed.run() instead of a separate email for each new entry in the feed.
If you enable the new digest setting, the per-entry messages are
packed into a single multipart/digest message instead of being mailed
individually. The MIME details for digests are spelled out in RFC
2046 [1].
Peripheral changes:
* Added rss2email.feed._USER_AGENT, to get version information into
the User-Agent message headers and to avoid repeating myself.
* Normalize multipart MIME boundaries for easier testing of
multipart/digest messages.
[1]: http://tools.ietf.org/html/rfc2046#section-5.1.5
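This is a stdlib sketch of packing entries into a digest. `build_digest()` is a hypothetical helper, not rss2email's Feed code. Each per-entry message is wrapped as a message/rfc822 part, which is the default part type inside multipart/digest per RFC 2046 section 5.1.5. An optional fixed boundary makes the flattened output deterministic, in the spirit of the boundary normalization mentioned above:

```python
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_digest(entries, boundary=None):
    """Pack per-entry messages into a single multipart/digest message."""
    digest = MIMEMultipart('digest')
    for entry in entries:
        # Wrap each complete entry message as a message/rfc822 part
        digest.attach(MIMEMessage(entry))
    if boundary is not None:
        # A fixed boundary keeps as_string() output stable for tests
        digest.set_boundary(boundary)
    return digest


entries = []
for title in ('first entry', 'second entry'):
    entry = MIMEText('body of ' + title)
    entry['Subject'] = title
    entries.append(entry)

digest = build_digest(entries, boundary='===TEST===')
print(digest.as_string())
```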
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 09:16:02 +0000 (05:16 -0400)]
Run update-copyright.py
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 09:14:21 +0000 (05:14 -0400)]
.update-copyright.conf: Use aliases to remove Aaron Swartz's email
Not much use in sending email to the deceased :(.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 08:58:28 +0000 (04:58 -0400)]
CHANGELOG: Update with summary of post-process addition
Documenting:
3a331d6 Merge branch 'post-process'
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 08:58:06 +0000 (04:58 -0400)]
Merge branch 'post-process'
* post-process:
rss2email/post_process/downcase.py: Move my test hook into Arun's directory
post_process: add documentation and a prettify example
Add configurable post-process hooks
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 10 May 2013 08:39:33 +0000 (04:39 -0400)]
rss2email/post_process/downcase.py: Move my test hook into Arun's directory
All the built-in hooks should live in the same sub-package. The
`post_process` name Arun used is more descriptive than my `hook`, so
move my downcase code there.
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 19 Apr 2013 11:38:28 +0000 (07:38 -0400)]
README: Update the link to the NetBSD package
They're still packaging rss2email-2.71nb2, though. I'll see if I can
dig up a maintainer to ping about that.
Signed-off-by: W. Trevor King <wking@tremily.us>
Arun Persaud [Mon, 15 Apr 2013 21:49:52 +0000 (14:49 -0700)]
post_process: add documentation and a prettify example
* also mention it in the README file.
* package the filter via setup.py
Signed-off-by: Arun Persaud <apersaud@lbl.gov>
W. Trevor King [Mon, 15 Apr 2013 23:20:51 +0000 (19:20 -0400)]
r2e.1: Properly escape an ellipsis in the sample configuration
Following the example from nroff.1, which uses:
.RI [ file\~ .\|.\|.]
For reasons that I haven't bothered to track down, the ellipsis isn't
rendered correctly when it occurs at the beginning of a line (even
with the `\|` separators). After adding some leading whitespace,
everything seems to be working fine.
Reported-by: Matěj Cepl <mcepl@redhat.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 13 Apr 2013 23:30:28 +0000 (19:30 -0400)]
Bump to version 3.3
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Sat, 13 Apr 2013 23:27:30 +0000 (19:27 -0400)]
CHANGELOG: Update with summary of <table> removal
Documenting:
d293ab8 feed: Remove <table> elements from HTML mail
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 22 Jan 2013 16:59:36 +0000 (11:59 -0500)]
Add configurable post-process hooks
On Mon, Jan 21, 2013 at 11:16:58PM -0800, Arun Persaud wrote:
> but I was wondering if there is any chance to add some hooks, so
> that the user can modify the feed before it gets send, something
> that takes the url, uid, and other interesting information and
> returns the body of the feed that should get emailed.
This is not quite what he asked for (e.g., I don't pass the URL
explicitly, and the hook should return the full message instead of
just the payload, ...), but I think it gets the job done.
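Here is a sketch of how such a hook could be resolved and what it might look like. Two things are assumptions, not confirmed by this log: the "module function" string format that `load_hook()` parses, and the `(feed, parsed, entry, guid, message)` hook signature. Both helpers are hypothetical. The hook returns the full message, and returning None would signal that the entry should be dropped:

```python
import importlib
from email.mime.text import MIMEText


def load_hook(spec):
    """Resolve a 'module function' string (assumed format) to a callable."""
    module_name, function_name = spec.split()
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


def downcase_subject(feed, parsed, entry, guid, message):
    """Example hook: return the full, modified message (None would drop it)."""
    subject = message['Subject']
    if subject is not None:
        del message['Subject']
        message['Subject'] = subject.lower()
    return message


message = MIMEText('body')
message['Subject'] = 'Hello World'
result = downcase_subject(feed=None, parsed=None, entry=None,
                          guid='guid-1', message=message)
print(result['Subject'])  # hello world
print(load_hook('os.path basename')('/tmp/feed.xml'))  # feed.xml
```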
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Tue, 9 Apr 2013 19:30:11 +0000 (15:30 -0400)]
feed: Remove <table> elements from HTML mail
These were not semantically correct ;). Based on a patch by Rui Carmo
[1].
[1]: https://github.com/rcarmo/rss2email/commit/2a015bce9d701035b9af874bd56c46f92382e668
Based-on-patch-by: Rui Carmo <rui.carmo@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King [Fri, 5 Apr 2013 23:12:00 +0000 (19:12 -0400)]
README: Bump example snapshot versions from 2.71 to 3.2
Now that we have releases in the 3.x line, we should be pointing users
in that direction. These version numbers should probably be bumped
with each release :(.
Signed-off-by: W. Trevor King <wking@tremily.us>
Arun Persaud [Fri, 5 Apr 2013 19:04:35 +0000 (12:04 -0700)]
email: small fixes for using imap as a backend
* fixed two typos in "def send"
* removed some unnecessary calls to imap.connect and
  imap.close (which seem to be needed only if you open
  a mailbox, which we don't)
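In imaplib terms, `close()` closes the currently selected mailbox, while APPEND only needs an authenticated session. So delivering without SELECT needs no CLOSE, and LOGOUT alone ends the session. The sketch below is a hypothetical `imap_deliver()`, not rss2email's send_imap(). It assumes an already logged-in `imaplib.IMAP4`/`IMAP4_SSL` object. `FakeIMAP` is a stand-in so the example runs offline:

```python
import imaplib
import time
from email.mime.text import MIMEText


def imap_deliver(imap, mailbox, message):
    """Append message to mailbox on an authenticated IMAP4 connection.

    APPEND works without SELECT, so no CLOSE is issued; LOGOUT ends it.
    """
    try:
        typ, data = imap.append(
            mailbox, '', imaplib.Time2Internaldate(time.time()),
            message.as_bytes())
        if typ != 'OK':
            raise RuntimeError('IMAP APPEND failed: {!r}'.format(data))
    finally:
        imap.logout()


class FakeIMAP:
    """Offline stand-in for imaplib.IMAP4_SSL that records the calls made."""

    def __init__(self):
        self.calls = []

    def append(self, mailbox, flags, date_time, message):
        self.calls.append(('append', mailbox))
        return 'OK', [b'APPEND completed']

    def logout(self):
        self.calls.append(('logout',))
        return 'BYE', [b'']


fake = FakeIMAP()
imap_deliver(fake, 'INBOX', MIMEText('hello'))
print(fake.calls)  # [('append', 'INBOX'), ('logout',)]
```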
Signed-off-by: Arun Persaud <apersaud@lbl.gov>