rss2email.git
10 years agoWIP: feed: Add Content-Description to digest parts digest-description
W. Trevor King [Sun, 2 Feb 2014 15:49:07 +0000 (07:49 -0800)]
WIP: feed: Add Content-Description to digest parts

Copy the message Subject to the MIMEMessage's Content-Description
[1,2].  We already set the MIME types for the attachments (with
MIMEMessage).  Mutt [3] seems to look inside attached message parts
and extract their subject as a description, so the current xkcd feed
looked like:

  A     1 Questions for God               [message/rfc822, 7bit, 0.6K]
  A     2 Inexplicable                    [message/rfc822, 7bit, 0.6K]
  A     3 Theft                           [message/rfc822, 7bit, 0.7K]
  A     4 Actually                        [message/rfc822, 7bit, 0.7K]

However, in the notmuch-show Emacs mode [4] it looked like:

  xkcd.com: <author> <user@rss2email.invalid> (20 mins. ago) (inbox)
  Subject: digest for xkcd
  To: wking@tremily.us
  Date: Sat, 18 Jan 2014 17:18:20 +0000

  [ multipart/digest ]
  [ message/rfc822 (hidden) ]
  [ message/rfc822 (hidden) ]
  [ message/rfc822 (hidden) ]
  [ message/rfc822 (hidden) ]

which is not very informative.  With this commit, the Mutt rendering
is unchanged, and the notmuch-show-mode rendering is (WIP:
unchanged?):

  xkcd.com: <author> <user@rss2email.invalid> (Today 08:01) (inbox)
  Subject: digest for xkcd
  To: wking@tremily.us
  Date: Sun, 02 Feb 2014 16:01:32 +0000

  [ multipart/digest ]
  [ message/rfc822 (hidden) ]
  [ message/rfc822 (hidden) ]
  [ message/rfc822 (hidden) ]
  [ message/rfc822 (hidden) ]

I'll ping the Notmuch list about this...
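
A minimal sketch of the idea, using only the standard library's email
package.  The helper name build_digest and the sample entries are
assumptions for illustration, not the code from this commit:

```python
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_digest(entries, subject):
    """Pack per-entry messages into one multipart/digest message.

    Hypothetical helper: each entry is wrapped in a message/rfc822
    part, and the entry's Subject is copied to Content-Description.
    """
    digest = MIMEMultipart('digest')
    digest['Subject'] = subject
    for entry in entries:
        part = MIMEMessage(entry)  # sets Content-Type: message/rfc822
        part['Content-Description'] = entry['Subject']
        digest.attach(part)
    return digest


# Sample entries, with titles borrowed from the listing above.
entries = []
for title in ['Questions for God', 'Theft']:
    entry = MIMEText('entry body for {}'.format(title))
    entry['Subject'] = title
    entries.append(entry)

digest = build_digest(entries, 'digest for xkcd')
for part in digest.get_payload():
    print(part.get_content_type(), part['Content-Description'])
```

Whether a given mail reader displays Content-Description for
message/rfc822 parts is up to that reader.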

[1]: http://tools.ietf.org/html/rfc2045#section-8
[2]: http://tools.ietf.org/html/rfc2183#section-3
[3]: http://www.mutt.org/
[4]: http://notmuchmail.org/emacstips/

Reported-by: Victor J. Orlikowski <vjo@duke.edu>
Signed-off-by: W. Trevor King <wking@tremily.us>
10 years agoMerge remote-tracking branch 's-o-b/contributing-github'
W. Trevor King [Sat, 25 Jan 2014 02:54:47 +0000 (18:54 -0800)]
Merge remote-tracking branch 's-o-b/contributing-github'

* s-o-b/contributing-github:
  CONTRIBUTING.md: Update SubmittingPatches link
  developer-certificate-of-origin: Add v1.1 from the Linux Foundation

Signed-off-by: W. Trevor King <wking@tremily.us>
10 years agoCONTRIBUTING.md: Update SubmittingPatches link
W. Trevor King [Fri, 24 Jan 2014 23:43:38 +0000 (15:43 -0800)]
CONTRIBUTING.md: Update SubmittingPatches link

And link to the newly-local DCO.

Signed-off-by: W. Trevor King <wking@tremily.us>
10 years agoMerge branch 'dco' into contributing-github
W. Trevor King [Sat, 25 Jan 2014 00:07:36 +0000 (16:07 -0800)]
Merge branch 'dco' into contributing-github

* dco:
  developer-certificate-of-origin: Add v1.1 from the Linux Foundation

Signed-off-by: W. Trevor King <wking@tremily.us>
10 years agodeveloper-certificate-of-origin: Add v1.1 from the Linux Foundation
W. Trevor King [Fri, 24 Jan 2014 23:46:30 +0000 (15:46 -0800)]
developer-certificate-of-origin: Add v1.1 from the Linux Foundation

Luis R. Rodriguez [1] has been trying to get information about the
licensing of the Linux kernel's DCO [2] for a while now [3,4], and it
looks like the Linux Foundation just made their view explicit [5,6,7].
This clarifies the copyright and licensing of Linus' two patches:

857a183 Update DCO ("signoff") rules to 1.1
991bd2e Start documenting the sign-off procedure in SubmittingPatches

From the whois information, developercertificate.org is pretty recent:

  $ whois developercertificate.org
  Domain Name: DEVELOPERCERTIFICATE.ORG
  Domain ID: D170689185-LROR
  Creation Date: 2014-01-15T02:54:55Z
  Updated Date: 2014-01-17T22:11:12Z
  ...
  Name Server: NS2.LINUX-FOUNDATION.ORG
  Name Server: NS1.LINUX-FOUNDATION.ORG

Now that this has an upstream source and its own license (verbatim
copies only), I'm putting this file in its own branch.  Downloaded
just now:

  $ wget -S -O developer-certificate-of-origin http://developercertificate.org/
  --2014-01-24 15:46:21--  http://developercertificate.org/
  Resolving developercertificate.org... 140.211.169.4
  Connecting to developercertificate.org|140.211.169.4|:80... connected.
  HTTP request sent, awaiting response...
    HTTP/1.1 200 OK
    Server: nginx
    Date: Fri, 24 Jan 2014 23:46:58 GMT
    Content-Type: text/html; charset=UTF-8
    Connection: keep-alive
    Last-Modified: Fri, 17 Jan 2014 23:02:25 GMT
    ETag: "5c188d-6c5-4f0328910e8f0"
    Accept-Ranges: bytes
    Content-Length: 1733
  Length: 1733 (1.7K) [text/html]
  Saving to: ‘developer-certificate-of-origin’

  2014-01-24 15:46:21 (112 MB/s) - ‘developer-certificate-of-origin’ saved [1733/1733]

After which I stripped out the HTML, leaving just the DCO text.

[1]: http://www.do-not-panic.com/
[2]: https://www.kernel.org/doc/Documentation/SubmittingPatches
[3]: http://thread.gmane.org/gmane.linux.kernel/1397613
[4]: http://thread.gmane.org/gmane.linux.kernel/1492612
[5]: http://article.gmane.org/gmane.linux.kernel.wireless.general/118696
[6]: http://article.gmane.org/gmane.linux.kernel/1635433
[7]: http://developercertificate.org/

Signed-off-by: W. Trevor King <wking@tremily.us>
10 years agoMerge branch 'trustlink'
W. Trevor King [Sat, 18 Jan 2014 19:58:49 +0000 (11:58 -0800)]
Merge branch 'trustlink'

* trustlink:
  CHANGELOG: Document this branch
  config and feed: Added trust-link preference

Signed-off-by: W. Trevor King <wking@tremily.us>
10 years agoCHANGELOG: Document this branch
W. Trevor King [Sat, 18 Jan 2014 19:46:48 +0000 (11:46 -0800)]
CHANGELOG: Document this branch

And add a warning about toggling the setting for active feeds.

For both George and me, the motivation for this change was working
around feed authors that change the id after minor changes in content:

  On Sat, Jan 18, 2014 at 1:40 PM, W. Trevor King wrote:
  > Some of the newspaper feeds I follow have duplicate entries in
  > their feed if they tweaked the title or content, but I rarely care
  > about the changes.

  On Sat, Jan 18, 2014 at 02:16:19PM -0500, George Saunders wrote:
  > That's exactly the situation I added it for.

The Atom spec explicitly says that revisions should keep the same id
[1]:

  When an Atom Document is relocated, migrated, syndicated,
  republished, exported, or imported, the content of its atom:id
  element MUST NOT change.  Put another way, an atom:id element
  pertains to all instantiations of a particular Atom entry or feed;
  revisions retain the same content in their atom:id elements.

But not all feed generators are fully compliant ;).

[1]: http://tools.ietf.org/search/rfc4287#section-4.2.6

Signed-off-by: W. Trevor King <wking@tremily.us>
10 years agoBump to version 3.8 v3.8
W. Trevor King [Sat, 18 Jan 2014 18:45:48 +0000 (10:45 -0800)]
Bump to version 3.8

Signed-off-by: W. Trevor King <wking@tremily.us>
10 years agoconfig and feed: Added trust-link preference
George Saunders [Fri, 22 Mar 2013 04:49:27 +0000 (04:49 +0000)]
config and feed: Added trust-link preference

The trust-link preference allows the user to ignore feed
entries that repeat a previously seen link URL.
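
A rough sketch of the behaviour, assuming entries are plain dicts with
optional 'id' and 'link' keys.  The function names are hypothetical and
this is not the code added by the commit:

```python
def entry_key(entry, trust_link=False):
    """Pick the value used to decide whether an entry was seen before."""
    if trust_link and entry.get('link'):
        # Trust the link URL, even if the feed changed the entry id.
        return entry['link']
    return entry.get('id') or entry.get('link')


def new_entries(entries, seen, trust_link=False):
    """Return entries whose key is not in `seen`, updating `seen`."""
    fresh = []
    for entry in entries:
        key = entry_key(entry, trust_link=trust_link)
        if key in seen:
            continue
        seen.add(key)
        fresh.append(entry)
    return fresh


# Two revisions of one article: same link, different ids.
revisions = [
    {'id': 'tag:example.com,2014:1', 'link': 'http://example.com/story'},
    {'id': 'tag:example.com,2014:1b', 'link': 'http://example.com/story'},
]
print(len(new_entries(revisions, set())))                   # both kept
print(len(new_entries(revisions, set(), trust_link=True)))  # one kept
```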

Signed-off-by: George Saunders <georgesaunders@gmail.com>
11 years agoMerge branch 'opmlimport-feed-name-slugging'
W. Trevor King [Sun, 20 Oct 2013 21:54:33 +0000 (14:54 -0700)]
Merge branch 'opmlimport-feed-name-slugging'

* opmlimport-feed-name-slugging:
  CHANGELOG: Document this branch
  feed: Adjust Feed._name_regexp to allow non-ASCII characters
  command: Sluggify feed names on opmlimport

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Document this branch
W. Trevor King [Sun, 20 Oct 2013 21:52:19 +0000 (14:52 -0700)]
CHANGELOG: Document this branch

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Adjust Feed._name_regexp to allow non-ASCII characters
W. Trevor King [Sun, 13 Oct 2013 22:18:06 +0000 (15:18 -0700)]
feed: Adjust Feed._name_regexp to allow non-ASCII characters

There's no need to restrict folks to the Latin alphabet.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agocommand: Sluggify feed names on opmlimport
W. Trevor King [Sun, 13 Oct 2013 21:54:29 +0000 (14:54 -0700)]
command: Sluggify feed names on opmlimport

Gaëtan Harter writes [1]:
> Importing the following opml file fails with `invalid feed name
> 'Arch Linux: Recent news updates`
>
>   <?xml version="1.0" encoding="UTF-8"?>
>   <opml version="1.0">
>     <head>
>       <title>Google reader export</title>
>     </head>
>     <body>
>       <outline text="Arch Linux: Recent news updates"
>                title="Arch Linux: Recent news updates" type="rss"
>                xmlUrl="http://www.archlinux.org/feeds/news/"
>                htmlUrl="https://www.archlinux.org/news/" />
>     </body>
>   </opml>
>
> It fails because the `text` field is used directly as `name` for
> creating a Feed object.

ConfigParser can handle colons and accented characters in its
section names [2], but Feed._set_name checks names against
Feed._name_regexp which only allows ASCII letters, digits, periods,
underscores, and the hyphen-minus (U+002D).  Add an inverse
name_slug_regexp to opmlimport that replaces any runs of illegal
characters with a single hyphen-minus, to avoid crashing if the text
attribute contains anything illegal.
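
As an illustration, the slugging can be sketched like this.  The
character class follows the ASCII-only rule described above; the exact
pattern and the helper name slugify are assumptions:

```python
import re

# Inverse of the allowed set: anything that is not an ASCII letter,
# digit, period, underscore, or hyphen-minus.
name_slug_regexp = re.compile(r'[^a-zA-Z0-9._-]+')


def slugify(text):
    """Replace each run of disallowed characters with a single '-'."""
    return name_slug_regexp.sub('-', text)


print(slugify('Arch Linux: Recent news updates'))
```

With this pattern, the ': ' in the reported title collapses to one
hyphen-minus rather than two.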

[1]: https://github.com/wking/rss2email/issues/24#issuecomment-26224593
[2]: http://docs.python.org/3/library/configparser.html#supported-ini-file-structure

Reported-by: Gaëtan Harter <hartergaetan@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMANIFEST.in: Add the AUTHORS file for distribution
W. Trevor King [Fri, 11 Oct 2013 17:11:24 +0000 (10:11 -0700)]
MANIFEST.in: Add the AUTHORS file for distribution

Reported-by: Arun Persaud <apersaud@lbl.gov>
Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoBump to version 3.7 v3.7
W. Trevor King [Fri, 11 Oct 2013 15:30:12 +0000 (08:30 -0700)]
Bump to version 3.7

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agosetup.py: Claim support for Python 3.3
W. Trevor King [Fri, 11 Oct 2013 15:44:56 +0000 (08:44 -0700)]
setup.py: Claim support for Python 3.3

We've supported 3.3 for the whole rss2email 3.x branch, but I forgot
to mention it in the trove classifiers.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoREADME: Three-space indents for nested enumerated lists
W. Trevor King [Fri, 11 Oct 2013 15:39:11 +0000 (08:39 -0700)]
README: Three-space indents for nested enumerated lists

Apparently two spaces doesn't cut it.  This change fixes:

  $ rst2html.py --strict README
  README:5: (INFO/1) Enumerated list start value not ordinal-1: "3" (ordinal 3)
  Exiting due to level-1 (INFO) system message.

I also use a bullet list for dependencies, because the order in which
you install them doesn't matter to me.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMerge branch 'catch-html-parse-error'
W. Trevor King [Fri, 11 Oct 2013 15:02:48 +0000 (08:02 -0700)]
Merge branch 'catch-html-parse-error'

* catch-html-parse-error:
  CHANGELOG: Document this branch's HTML-title conversion fallback
  feed: Add 'default' argument to Feed._html2text for HTMLParseError

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Document this branch's HTML-title conversion fallback
W. Trevor King [Fri, 11 Oct 2013 15:01:51 +0000 (08:01 -0700)]
CHANGELOG: Document this branch's HTML-title conversion fallback

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMerge branch 'robust-file-saving'
W. Trevor King [Fri, 11 Oct 2013 14:56:15 +0000 (07:56 -0700)]
Merge branch 'robust-file-saving'

* robust-file-saving:
  CHANGELOG: Document this branch's atomic saves
  feeds: Make Feeds.save fully atomic, assuming a working fsync

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Document this branch's atomic saves
W. Trevor King [Fri, 11 Oct 2013 14:54:21 +0000 (07:54 -0700)]
CHANGELOG: Document this branch's atomic saves

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Add 'default' argument to Feed._html2text for HTMLParseError
W. Trevor King [Sat, 28 Sep 2013 16:51:03 +0000 (09:51 -0700)]
feed: Add 'default' argument to Feed._html2text for HTMLParseError

This allows us to easily fall back on an unconverted string in the
event that the input HTML is malformed.  We already caught
HTMLParseError when converting HTML to plain text for non-html mail,
but we didn't catch it in Feed._get_entry_title.  Now we gracefully
handle the situation by treating the malformed HTML as plain text.
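
The fallback pattern looks roughly like the sketch below.  To keep it
self-contained, a toy converter and a stand-in exception class replace
html2text and HTMLParseError; all names here are hypothetical:

```python
import re


class MalformedHTML(Exception):
    """Stand-in for the parser error (HTMLParseError in the commit)."""


def strict_convert(html):
    """Toy converter: rejects unbalanced angle brackets, strips tags."""
    if html.count('<') != html.count('>'):
        raise MalformedHTML(html)
    return re.sub(r'<[^>]*>', '', html)


def html2text(html, default=None):
    """Convert HTML to text, returning `default` on malformed input."""
    try:
        return strict_convert(html)
    except MalformedHTML:
        if default is None:
            raise
        return default


title = '<b Theft'
# Fall back on the unconverted string, treating it as plain text.
print(html2text(title, default=title))
```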

11 years agofeeds: Make Feeds.save fully atomic, assuming a working fsync
W. Trevor King [Sat, 28 Sep 2013 16:01:52 +0000 (09:01 -0700)]
feeds: Make Feeds.save fully atomic, assuming a working fsync

If the disk is full (or there are other OS-level issues), a file may
not be completely written to the disk.

The write-flush-fsync-rename sequence is much safer.  The fsync
invocation matches the recommendation in the docs [1]:

  If you’re starting with a buffered Python file object f, first do
  f.flush(), and then do os.fsync(f.fileno()), to ensure that all
  internal buffers associated with f are written to disk.

The purpose of each step is:

* write: move the data into a library buffer
* flush: flush the library buffer into a kernel buffer
* fsync: flush the kernel buffer onto the disk at $tempfile
* rename: adjust the metadata so that the $filename points to the
    $tempfile data, release the old data

This means that if the rename works we get the new data, and if the
rename fails we still have the old data.
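
The four steps can be sketched as follows.  This is a minimal
illustration with a hypothetical helper name, not the Feeds.save code
itself, and it uses os.replace for the rename step so that the
overwrite also works on non-POSIX systems:

```python
import os
import tempfile


def atomic_write(filename, data):
    """Write `data` to `filename` via write-flush-fsync-rename."""
    dirname = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file in the target directory so the final
    # rename stays on one filesystem.
    fd, tmpname = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            f.write(data)         # write: data into the library buffer
            f.flush()             # flush: library buffer -> kernel buffer
            os.fsync(f.fileno())  # fsync: kernel buffer -> disk
        os.replace(tmpname, filename)  # rename: point filename at new data
    except BaseException:
        # On any failure, drop the temporary file; the old data at
        # `filename` is left untouched.
        if os.path.exists(tmpname):
            os.unlink(tmpname)
        raise
```

The sketch does not fsync the containing directory, so the durability
of the rename itself still depends on the filesystem.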

However, POSIX's fsync is implementation defined unless
_POSIX_SYNCHRONIZED_IO is defined [3,4], and some OS X implementations
go the no-op route, as Stewart Smith points out in his excellent "Eat
My Data: How everybody gets file I/O wrong" [4].  If you want to run
rss2email on such a system, verifying your data integrity is up to you
;).

We used to write-rename the data file (but not the config) on *nix
[2].  Now we do the full write-flush-fsync-rename for both the config
and data files on both *nix and other systems.

[1]: http://docs.python.org/3/library/os.html#os.fsync
[2]: For rss2email, *nix is "has fcntl, but isn't SunOS".
[3]: http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
[4]: https://www.flamingspork.com/talks/2007/06/eat_my_data.odp

Reported-by: Etienne Millon <me@emillon.org>
Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agor2e.1: Remove quotes around 'name-format' value
W. Trevor King [Fri, 20 Sep 2013 07:41:00 +0000 (00:41 -0700)]
r2e.1: Remove quotes around 'name-format' value

ConfigParser doesn't need quoting around string values, so if you use
quotes they will show up explicitly:

  From: "'indexed: Jessica Hagy'" <user@rss2email.invalid>

That's probably not what you want ;).
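
A quick way to see the effect with the standard library.  The section
and option names below are made up for the demonstration:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[feed.example]
quoted = 'indexed: Jessica Hagy'
bare = indexed: Jessica Hagy
""")

# The quotes are part of the value; ConfigParser does not strip them.
print(repr(config['feed.example']['quoted']))
print(repr(config['feed.example']['bare']))
```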

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoRun update-copyright.py
W. Trevor King [Sat, 14 Sep 2013 16:58:11 +0000 (09:58 -0700)]
Run update-copyright.py

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMerge branch 'format-entry-name'
W. Trevor King [Sat, 14 Sep 2013 16:57:12 +0000 (09:57 -0700)]
Merge branch 'format-entry-name'

* format-entry-name:
  feed: Give defaults for _get_entry_name formatting data
  feed: Convert 'friendly-name' setting to 'name-format'

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Give defaults for _get_entry_name formatting data
W. Trevor King [Sat, 14 Sep 2013 16:49:56 +0000 (09:49 -0700)]
feed: Give defaults for _get_entry_name formatting data

We don't want to crash if the source feed is missing some data that
the user expects, or if the user just hasn't had the time to adjust
the name-format config.
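
One way to provide such defaults is to start from a dict of empty
strings and overlay whatever the feed supplies.  The names in this
sketch are assumptions, not the _get_entry_name implementation:

```python
# Hypothetical defaults for every field a template may reference.
DEFAULTS = {'feed-title': '', 'author': '', 'publisher': ''}


def entry_name(name_format, available):
    """Format a sender name, substituting '' for missing fields."""
    data = dict(DEFAULTS)
    data.update(available)
    return name_format.format(**data)


# No feed-title in the source feed: no KeyError, just an empty slot.
print(repr(entry_name('{feed-title}: {author}', {'author': 'Jessica Hagy'})))
```

Without the defaults, the same call would raise KeyError for
'feed-title'.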

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMerge branch 'no-prefered-xml-parser'
W. Trevor King [Sat, 14 Sep 2013 15:47:16 +0000 (08:47 -0700)]
Merge branch 'no-prefered-xml-parser'

* no-prefered-xml-parser:
  feed: Disable feedparser's PREFERRED_XML_PARSERS

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMerge branch 'accept-feedparser-encoding-override'
W. Trevor King [Sat, 14 Sep 2013 15:44:32 +0000 (08:44 -0700)]
Merge branch 'accept-feedparser-encoding-override'

* accept-feedparser-encoding-override:
  CHANGELOG: Demote guessed encodings logs from 'error' to 'warning'
  feed: don't emit error if parser able to auto-determine encoding

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Demote guessed encodings logs from 'error' to 'warning'
W. Trevor King [Sat, 14 Sep 2013 15:41:31 +0000 (08:41 -0700)]
CHANGELOG: Demote guessed encodings logs from 'error' to 'warning'

Documenting:
05f2628 feed: don't emit error if parser able to auto-determine encoding

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: don't emit error if parser able to auto-determine encoding
J. Lewis Muir [Tue, 10 Sep 2013 16:48:06 +0000 (11:48 -0500)]
feed: don't emit error if parser able to auto-determine encoding

Some feeds have incorrectly declared encodings (e.g. the encoding
specified by the HTTP header does not match the encoding specified in
the XML declaration).  For such a feed, "r2e run" would emit an error
message similar to the following:

  processing error: document declared as us-ascii, but parsed as
  iso-8859-1: undeadly (http://undeadly.org/cgi?action=rss ->
  jlmuir@imca-cat.org)

In this particular case, the HTTP header indicated a content type of
"text/xml" with no "charset" parameter.  According to the feedparser
5.1.3 documentation (section "Introduction to Character Encoding" [1]),
this results in an encoding of US-ASCII.  But the served XML document
contains an encoding declaration of ISO-8859-1.

For this case and some others, feedparser is able to automatically
determine an encoding.  When it does, we emit a warning rather than an
error, and accept the automatically determined encoding.

We check for a successfully overridden encoding by looking at the bozo
bit and the bozo_exception.  If the bozo bit is set and the
bozo_exception is feedparser.CharacterEncodingOverride, the parser has
successfully overridden an incorrectly declared encoding.  Quoting from
the feedparser 5.1.3 documentation, section "Handling
Incorrectly-Declared Encodings" [2]:

  Universal Feed Parser initially uses the rules specified in RFC 3023
  to determine the character encoding of the feed. If parsing succeeds,
  then that's that. If parsing fails, Universal Feed Parser sets the
  bozo bit to 1 and sets bozo_exception to
  feedparser.CharacterEncodingOverride. Then it tries to reparse the
  feed with the following character encodings:

  1. the encoding specified in the XML declaration
  2. the encoding sniffed from the first four bytes of the document (as
     per Section F)
  3. the encoding auto-detected by the Universal Encoding Detector, if
     installed
  4. utf-8
  5. windows-1252

  If the character encoding can not be determined, Universal Feed Parser
  sets the bozo bit to 1 and sets bozo_exception to
  feedparser.CharacterEncodingUnknown. In this case, parsed values will
  be strings, not Unicode strings.
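
The check described above can be sketched without importing feedparser
by passing the exception class in.  The function name and the return
labels are hypothetical:

```python
def classify_encoding(parsed, override_type):
    """Return 'ok', 'warning', or 'error' for a parsed-feed mapping.

    `parsed` is anything with a dict-style .get() exposing 'bozo' and
    'bozo_exception'; `override_type` is the exception class that marks
    a successfully overridden encoding.
    """
    if not parsed.get('bozo'):
        return 'ok'
    if isinstance(parsed.get('bozo_exception'), override_type):
        return 'warning'  # parser recovered; accept its encoding
    return 'error'


class FakeOverride(Exception):
    """Stand-in for feedparser.CharacterEncodingOverride."""


print(classify_encoding({'bozo': 1, 'bozo_exception': FakeOverride()},
                        FakeOverride))
```

With feedparser installed, the second argument would be
feedparser.CharacterEncodingOverride and `parsed` the result of
feedparser.parse().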

References:

1. http://pythonhosted.org/feedparser/character-encoding.html#introduction-to-character-encoding
2. http://pythonhosted.org/feedparser/character-encoding.html#handling-incorrectly-declared-encodings

Signed-off-by: J. Lewis Muir <jlmuir@imca-cat.org>
11 years agofeed: Convert 'friendly-name' setting to 'name-format'
W. Trevor King [Tue, 10 Sep 2013 19:04:05 +0000 (12:04 -0700)]
feed: Convert 'friendly-name' setting to 'name-format'

In Debian bug 722009, Joey Hess wrote about the 2.x series [1]:
> The current From line generated by r2e is From: Blog: Author
>   where
>     Blog is the name of the blog, or some page like a wiki's RecentChanges
>     Author is the author of a post
>       iff the blog sets that info in the feed
>
> In mutt the Blog part often occupies the whole displayed Subject field,
> which is fixed width. So the Author cannot be seen. This is particularly
> a problem with planets, where the author of a post matters a lot.
> But also with some blogs that have multiple authors.
>
> For these sorts of blogs, I would generally prefer to use a From line
> like From: Author (Blog)
> This does mean that when sorting by author, all posts of a blog or
> planet feed may not appear together, but that would be an acceptable
> tradeoff to me.
>
> One way to implement this (other than just changing the format string)
> would be to make OVERRIDE_FROM able to contain a format string,
> so it could be configured on a per-feed basis.

The new setup makes the name-formatting configurable on a per feed
basis (what Joey wanted), but it's not without side effects.  For
feeds where some information is missing (feed-title, author, or
publisher), we used to adjust the formatting on the fly.  For example,
you'd get output like '{author}' if the feed-title was missing,
instead of getting ': {author}'.  Now users bothered by this will have
to manually override the format template for feeds missing crucial
data.
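
For illustration, a per-feed override might look like the sketch below.
The section names and the render helper are assumptions; the templates
follow the 'Blog: Author' and 'Author (Blog)' forms discussed above:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
name-format = {feed-title}: {author}

[feed.blog]

[feed.planet]
name-format = {author} ({feed-title})
""")


def render(section, data):
    """Format the sender name with the section's name-format template."""
    return config[section]['name-format'].format(**data)


data = {'feed-title': 'Planet Debian', 'author': 'Joey Hess'}
print(render('feed.blog', data))    # inherits the default template
print(render('feed.planet', data))  # per-feed override
```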

[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=722009

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Disable feedparser's PREFERRED_XML_PARSERS
W. Trevor King [Tue, 10 Sep 2013 18:41:12 +0000 (11:41 -0700)]
feed: Disable feedparser's PREFERRED_XML_PARSERS

Feedparser's default parser (drv_libxml2) has trouble parsing byte
streams in Python 3:

  $ python -c 'import rss2email.feed; import doctest; doctest.testmod(rss2email.feed)'
  ...
  File "rss2email/feed.py", line 319, in rss2email.feed.Feed._fetch
  Failed example:
      parsed = feed._fetch()
  Exception raised:
      Traceback (most recent call last):
        File "rss2email/util.py", line 61, in run
          self.result = self._target(*self._args, **self._kwargs)
        File "/.../feedparser/feedparser.py", line 3745, in parse
          saxparser.parse(source)
        File "/usr/lib64/python3.2/site-packages/drv_libxml2.py", line 270, in parse
          _d(reader.Name()),_d(reader.Value()))
        File "/usr/lib64/python3.2/site-packages/drv_libxml2.py", line 70, in _d
          return _decoder(s)[0]
        File "/usr/lib64/python3.2/encodings/utf_8.py", line 16, in decode
          return codecs.utf_8_decode(input, errors, True)
      TypeError: 'str' does not support the buffer interface

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "/usr/lib64/python3.2/doctest.py", line 1288, in __run
          compileflags, 1), test.globs)
        File "<doctest rss2email.feed.Feed._fetch[1]>", line 1, in <module>
          parsed = feed._fetch()
        File "rss2email/feed.py", line 336, in _fetch
          return f(self.url, self.etag, modified=self.modified, **kwargs)
        File "rss2email/util.py", line 76, in __call__
          time_limited_function=self) from self.error[1]
      rss2email.error.TimeoutError: error while running time limited function: 'str' does not support the buffer interface
  ...

You can reproduce the underlying exception with this minimal script:

  import io
  import xml.sax
  import xml.sax.handler

  data = b'<feed xmlns="http://www.w3.org/2005/Atom"><entry><author><name>Example author</name><email>me@example.com</email><url>http://example.com/</url></author></entry></feed>'

  source = xml.sax.xmlreader.InputSource()
  source.setByteStream(io.BytesIO(data))
  saxparser = xml.sax.make_parser(["drv_libxml2"])
  saxparser.setContentHandler(xml.sax.handler.ContentHandler())
  saxparser.parse(source)

which raises:

  Traceback (most recent call last):
    File "<stdin>", line 13, in <module>
      saxparser.parse(source)
    File "/usr/lib64/python3.2/site-packages/drv_libxml2.py", line 222, in parse
      eltName = _d(reader.Name())
    File "/usr/lib64/python3.2/site-packages/drv_libxml2.py", line 70, in _d
      return _decoder(s)[0]
    File "/usr/lib64/python3.2/encodings/utf_8.py", line 16, in decode
      return codecs.utf_8_decode(input, errors, True)
  TypeError: 'str' does not support the buffer interface

at least for libxml2-2.9.1.

By using the stdlib's default parser (instead of drv_libxml2), we can
avoid the error and get successful parsing.  If you don't have
drv_libxml2 installed, sax was already falling back on the stdlib's
default parser, so this commit will be a no-op.
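
For comparison, here is the same byte stream parsed with the stdlib's
default parser, which is what xml.sax.make_parser() returns when no
driver list is given.  The NameCollector handler is added only to have
something to observe:

```python
import io
import xml.sax
import xml.sax.handler
import xml.sax.xmlreader


class NameCollector(xml.sax.handler.ContentHandler):
    """Record the name of each element as it starts."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


data = b'<feed xmlns="http://www.w3.org/2005/Atom"><entry><author><name>Example author</name><email>me@example.com</email><url>http://example.com/</url></author></entry></feed>'

source = xml.sax.xmlreader.InputSource()
source.setByteStream(io.BytesIO(data))
handler = NameCollector()
saxparser = xml.sax.make_parser()  # no driver list: stdlib default
saxparser.setContentHandler(handler)
saxparser.parse(source)
print(handler.names)
```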

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoBump to version 3.6 v3.6
W. Trevor King [Mon, 9 Sep 2013 17:30:17 +0000 (10:30 -0700)]
Bump to version 3.6

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Document missing port argument for IMAPAuthenticationError fix
W. Trevor King [Mon, 9 Sep 2013 17:28:18 +0000 (10:28 -0700)]
CHANGELOG: Document missing port argument for IMAPAuthenticationError fix

Documenting:
9124d62 email: fixed missing port argument for IMAPAuthenticationError

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: fixed missing port argument for IMAPAuthenticationError
Arun Persaud [Mon, 5 Aug 2013 19:13:07 +0000 (12:13 -0700)]
email: fixed missing port argument for IMAPAuthenticationError

Signed-off-by: Arun Persaud <apersaud@lbl.gov>
11 years agotest/gmane/3: Add tests for HTML generation
W. Trevor King [Mon, 10 Jun 2013 20:37:05 +0000 (16:37 -0400)]
test/gmane/3: Add tests for HTML generation

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agotest/gmane/2.expected: Remove some whitespace
W. Trevor King [Mon, 10 Jun 2013 20:30:35 +0000 (16:30 -0400)]
test/gmane/2.expected: Remove some whitespace

I'm not sure which version of html2text I used to generate the initial
expected results, but this version was generated with html2text
3.200.3.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agotest/test.py: Update clean_result() to normalize user agents
W. Trevor King [Mon, 10 Jun 2013 20:25:27 +0000 (16:25 -0400)]
test/test.py: Update clean_result() to normalize user agents

Catch up with the USER_AGENT changes from 3f9adb5 (feed: Add the
digest setting for multi-entry email, 2013-04-13).

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Document HTML syntax fix
W. Trevor King [Sun, 9 Jun 2013 21:26:31 +0000 (17:26 -0400)]
CHANGELOG: Document HTML syntax fix

Documenting:
4aa7f1d Fixed syntactical error when generating HTML mails

The commit by Dennis Keitzel fixed a typo in the HTML which dated back
to 00e2eecc (Spread cmd_run() logic out into Feed methods,
2012-10-04).

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoFixed syntactical error when generating HTML mails
Dennis Keitzel [Sat, 8 Jun 2013 09:32:55 +0000 (11:32 +0200)]
Fixed syntactical error when generating HTML mails

Signed-off-by: Dennis Keitzel <github@pinshot.net>
11 years agoBump to version 3.5 v3.5
W. Trevor King [Wed, 5 Jun 2013 22:13:14 +0000 (18:13 -0400)]
Bump to version 3.5

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Mention the digest addition
W. Trevor King [Wed, 5 Jun 2013 22:09:52 +0000 (18:09 -0400)]
CHANGELOG: Mention the digest addition

Documenting:
92c0e76 Merge branch 'digest'

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMerge branch 'digest'
W. Trevor King [Wed, 5 Jun 2013 22:08:47 +0000 (18:08 -0400)]
Merge branch 'digest'

* digest:
  feed: Add the digest-post-process setting
  feed: Add the digest setting for multi-entry email

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agocommand: Add newlines to OPML export
W. Trevor King [Tue, 14 May 2013 11:57:50 +0000 (07:57 -0400)]
command: Add newlines to OPML export

No semantic change, but it makes the exported data easier for humans
to read.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agocommand: Use feed names in OPML 'text' attributes
W. Trevor King [Tue, 14 May 2013 11:37:33 +0000 (07:37 -0400)]
command: Use feed names in OPML 'text' attributes

Instead of writing the URL as the 'text' attribute and ignoring it on
read, we now use the attribute to store the feed name.  This avoids
auto-generated feed names on import.  From the OPML 2.0 spec [1]:

  Subscription lists
  ...
  Required attributes: type, text, xmlUrl. For outline elements whose
  type is rss, the text attribute should initially be the top-level
  title element in the feed being pointed to, however since it is
  user-editable, processors should not depend on it always containing
  the title of the feed. xmlUrl is the http address of the feed.

We are not following the 'should' recommendation, but since we have
user-generated titles, I believe that the new usage is appropriate.
It's certainly closer to spec than storing a URL in 'text' :p.

[1]: http://dev.opml.org/spec2.html

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agocommand: Fix opmlexport crash due to orphaned feed data
W. Trevor King [Tue, 14 May 2013 11:03:32 +0000 (07:03 -0400)]
command: Fix opmlexport crash due to orphaned feed data

When you remove a feed from your config file by hand, you might leave
the dynamic 'seen' data in the JSON data file by accident.  If you
have such orphan data, the feed is loaded by Feeds._load_feeds() with
the default configuration (since you removed the config file entry).
This can lead to opmlexport errors like:

  Traceback (most recent call last):
    File "./r2e", line 5, in <module>
      rss2email.main.run()
    File "/.../rss2email/rss2email/main.py", line 163, in run
      args.func(feeds=feeds, args=args)
    File "/.../rss2email/rss2email/command.py", line 157, in opmlexport
      url = _saxutils.escape(feed.url)
    File "/usr/lib64/python3.2/xml/sax/saxutils.py", line 34, in escape
      data = data.replace("&", "&amp;")
  AttributeError: 'NoneType' object has no attribute 'replace'

because the feeds lack the per-feed 'url' setting that had been
defined in the config file.  With this commit, opmlexport drops these
URL-less feeds, instead of choking to death trying to format them ;).
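
A sketch of the guard, assuming feeds are given as (name, url) pairs.
The helper name and the exact outline layout are illustrative, not the
opmlexport code:

```python
from xml.sax import saxutils


def opml_outlines(feeds):
    """Return <outline> lines for feeds, skipping any without a URL."""
    lines = []
    for name, url in feeds:
        if not url:
            # Orphaned data: no config entry, so no URL to export.
            continue
        lines.append('<outline type="rss" text={} xmlUrl={} />'.format(
            saxutils.quoteattr(name), saxutils.quoteattr(url)))
    return lines


feeds = [('xkcd', 'http://xkcd.com/rss.xml'), ('orphan', None)]
print('\n'.join(opml_outlines(feeds)))
```

saxutils.quoteattr both escapes the value and adds the surrounding
quotes, so the feed name can go into the 'text' attribute safely.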

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoREADME: Link to Fedora package
W. Trevor King [Tue, 14 May 2013 10:18:22 +0000 (06:18 -0400)]
README: Link to Fedora package

Also remove 'Linux' from distribution names.  The goal is to point
folks using $DISTRO to their package, not to give details about the
technical underpinnings of a given distribution.

I've sorted the distributions by packaging format: deb, rpm, ebuild,
and Makefile.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoconfig: Replace Config._setup() with Config.setup_html2text()
W. Trevor King [Tue, 14 May 2013 08:38:07 +0000 (04:38 -0400)]
config: Replace Config._setup() with Config.setup_html2text()

Since 7bbbf62 (Setup html2text in Config._setup(), 2012-10-04), the
html2text configuration options have only been referenced from
Config._setup(), and that method was never called.  With this commit,
we rename the method to setup_html2text(), and add a new
Feed._html2text() which invokes the setup before calling
html2text.html2text().

This caused a fair amount of churn in the expected test results, as
previously ignored default values for html2text kicked in.  I also
added a test exercising the non-default values (allthingsrss/3), and
it looks like everything works as expected.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoBump to version 3.4 v3.4
W. Trevor King [Tue, 14 May 2013 07:47:22 +0000 (03:47 -0400)]
Bump to version 3.4

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: Pass 'config' and 'section' through from send() to *_send()
W. Trevor King [Sat, 11 May 2013 10:52:12 +0000 (06:52 -0400)]
email: Pass 'config' and 'section' through from send() to *_send()

Otherwise only the default configuration will be used, which is almost
certainly not what the user wants.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Update with summary of IMAP delivery addition
W. Trevor King [Fri, 10 May 2013 12:38:01 +0000 (08:38 -0400)]
CHANGELOG: Update with summary of IMAP delivery addition

Documenting:
a7f2222 Merge branch 'imap'

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMerge branch 'imap'
W. Trevor King [Fri, 10 May 2013 12:33:09 +0000 (08:33 -0400)]
Merge branch 'imap'

* imap:
  email: small fixes for using imap as a backend
  email: Stub out send_imap()

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: Remove debugging logging from _decode_header()
W. Trevor King [Fri, 10 May 2013 12:26:01 +0000 (08:26 -0400)]
email: Remove debugging logging from _decode_header()

This was accidentally committed in e08e198 (email: Decode headers when
checking .as_string() flatten fallback, 2013-02-17).

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Add the digest-post-process setting
W. Trevor King [Fri, 10 May 2013 09:36:25 +0000 (05:36 -0400)]
feed: Add the digest-post-process setting

For users that want to manipulate the multi-entry message.  For
example, if the stock `digest for <feed.name>` title doesn't cut it
for you, you can now use a post-processing hook to set the title
however you like.
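
As a rough sketch of such a hook: the call signature below
(`message` plus keyword context, returning the message) and the
`feed_name` keyword are assumptions for illustration, since this log
does not spell out the hook interface:

```python
from email.mime.multipart import MIMEMultipart


def retitle_digest(message, **kwargs):
    """Hypothetical hook: replace the stock digest Subject."""
    feed_name = kwargs.get('feed_name', 'unknown')
    # Delete first, because assigning a header appends rather than replaces.
    del message['Subject']
    message['Subject'] = '[{}] new entries'.format(feed_name)
    return message


digest = MIMEMultipart('digest')
digest['Subject'] = 'digest for xkcd'
digest = retitle_digest(digest, feed_name='xkcd')
```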

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Add the digest setting for multi-entry email
W. Trevor King [Sat, 13 Apr 2013 22:05:03 +0000 (18:05 -0400)]
feed: Add the digest setting for multi-entry email

For high-volume feeds, some users want to receive a single email per
Feed.run() instead of a separate email for each new entry in the feed.
If you enable the new digest setting, the per-entry messages are
packed into a single multipart/digest message instead of being mailed
individually.  The MIME details for digests are spelled out in RFC
2046 [1].
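
The packing can be sketched with the standard library's email package.
`build_digest` and the sample entries are hypothetical names for
illustration; the real Feed.run() does more work than this:

```python
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_digest(feed_name, entries):
    """Wrap per-entry messages in a single multipart/digest message."""
    digest = MIMEMultipart('digest')
    digest['Subject'] = 'digest for {}'.format(feed_name)
    for subject, body in entries:
        entry = MIMEText(body, 'plain', 'utf-8')
        entry['Subject'] = subject
        # Each entry travels as its own message/rfc822 part.
        digest.attach(MIMEMessage(entry))
    return digest


digest = build_digest('xkcd', [('Theft', 'first body'),
                               ('Actually', 'second body')])
part_types = [part.get_content_type() for part in digest.get_payload()]
```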

Peripheral changes:
* Added rss2email.feed._USER_AGENT, to get version information into
  the User-Agent message headers and to avoid repeating myself.
* Normalize multipart MIME boundaries for easier testing of
  multipart/digest messages.

[1]: http://tools.ietf.org/html/rfc2046#section-5.1.5

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoRun update-copyright.py
W. Trevor King [Fri, 10 May 2013 09:16:02 +0000 (05:16 -0400)]
Run update-copyright.py

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years ago.update-copyright.conf: Use aliases to remove Aaron Swartz's email
W. Trevor King [Fri, 10 May 2013 09:14:21 +0000 (05:14 -0400)]
.update-copyright.conf: Use aliases to remove Aaron Swartz's email

Not much use in sending email to the deceased :(.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Update with summary of post-process addition
W. Trevor King [Fri, 10 May 2013 08:58:28 +0000 (04:58 -0400)]
CHANGELOG: Update with summary of post-process addition

Documenting:
3a331d6 Merge branch 'post-process'

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMerge branch 'post-process'
W. Trevor King [Fri, 10 May 2013 08:58:06 +0000 (04:58 -0400)]
Merge branch 'post-process'

* post-process:
  rss2email/post_process/downcase.py: Move my test hook into Arun's directory
  post_process: add documentation and a prettify example
  Add configurable post-process hooks

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agorss2email/post_process/downcase.py: Move my test hook into Arun's directory
W. Trevor King [Fri, 10 May 2013 08:39:33 +0000 (04:39 -0400)]
rss2email/post_process/downcase.py: Move my test hook into Arun's directory

All the built-in hooks should live in the same sub-package.  The
`post_process` name Arun used is more descriptive than my `hook`, so
move my downcase code there.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoREADME: Update the link to the NetBSD package
W. Trevor King [Fri, 19 Apr 2013 11:38:28 +0000 (07:38 -0400)]
README: Update the link to the NetBSD package

Although they're still packaging rss2email-2.71nb2.  I'll see if I can
dig up a maintainer to ping about that.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agopost_process: add documentation and a prettify example
Arun Persaud [Mon, 15 Apr 2013 21:49:52 +0000 (14:49 -0700)]
post_process: add documentation and a prettify example

* also mention it in the README file.
* package the filter via setup.py

Signed-off-by: Arun Persaud <apersaud@lbl.gov>
11 years agor2e.1: Properly escape an ellipsis in the sample configuration
W. Trevor King [Mon, 15 Apr 2013 23:20:51 +0000 (19:20 -0400)]
r2e.1: Properly escape an ellipsis in the sample configuration

Following the example from nroff.1, which uses:

  .RI [ file\~ .\|.\|.]

For reasons that I haven't bothered to track down, the ellipsis isn't
rendered correctly when it occurs at the beginning of a line (even
with the `\|` separators).  After adding some leading whitespace,
everything seems to be working fine.

Reported-by: Matěj Cepl <mcepl@redhat.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoBump to version 3.3 v3.3
W. Trevor King [Sat, 13 Apr 2013 23:30:28 +0000 (19:30 -0400)]
Bump to version 3.3

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Update with summary of <table> removal
W. Trevor King [Sat, 13 Apr 2013 23:27:30 +0000 (19:27 -0400)]
CHANGELOG: Update with summary of <table> removal

Documenting:
d293ab8 feed: Remove <table> elements from HTML mail

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoAdd configurable post-process hooks
W. Trevor King [Tue, 22 Jan 2013 16:59:36 +0000 (11:59 -0500)]
Add configurable post-process hooks

On Mon, Jan 21, 2013 at 11:16:58PM -0800, Arun Persaud wrote:
> but I was wondering if there is any chance to add some hooks, so
> that the user can modify the feed before it gets send, something
> that takes the url, uid, and other interesting information and
> returns the body of the feed that should get emailed.

This is not quite what he asked for (e.g., I don't pass the URL
explicitly, the hook should return the full message instead of just
payload, ...), but I think it gets the job done.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Remove <table> elements from HTML mail
W. Trevor King [Tue, 9 Apr 2013 19:30:11 +0000 (15:30 -0400)]
feed: Remove <table> elements from HTML mail

These were not semantically correct ;).  Based on a patch by Rui Carmo
[1].

[1]: https://github.com/rcarmo/rss2email/commit/2a015bce9d701035b9af874bd56c46f92382e668

Based-on-patch-by: Rui Carmo <rui.carmo@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoREADME: Bump example snapshot versions from 2.71 to 3.2
W. Trevor King [Fri, 5 Apr 2013 23:12:00 +0000 (19:12 -0400)]
README: Bump example snapshot versions from 2.71 to 3.2

Now that we have releases in the 3.x line, we should be pointing users
in that direction.  These version numbers should probably be bumped
with each release :(.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: small fixes for using imap as a backend
Arun Persaud [Fri, 5 Apr 2013 19:04:35 +0000 (12:04 -0700)]
email: small fixes for using imap as a backend

* fixed two typos in "def send"
* removed some unnecessary calls to imap.connect and
  imap.close (which seems to be only needed in case you open
  a mailbox, which we don't)

Signed-off-by: Arun Persaud <apersaud@lbl.gov>
11 years agoemail: Stub out send_imap()
W. Trevor King [Thu, 28 Mar 2013 10:51:31 +0000 (06:51 -0400)]
email: Stub out send_imap()

Arun Persaud suggested IMAP as an additional email delivery mechanism.
The benefit of using IMAP over SMTP is that you can set the target
mailbox directly (instead of filtering the incoming mail with procmail
or a similar external tool).  This commit restructures the 'send'
configuration to support IMAP output with a configurable mailbox.
That means you can do something like:

  [DEFAULT]
  email-protocol: imap
  imap-auth: True
  imap-username: myname
  imap-password: mypass
  imap-server: imap.yourisp.net
  imap-port: 993
  imap-ssl: True

  [feed.rss2email]
  url = http://www.allthingsrss.com/rss2email/feed/
  imap-mailbox = rss2email

  [feed.xkcd]
  url = http://xkcd.com/atom.xml
  imap-mailbox = xkcd

For non-IMAP users, note that the boolean `use-smtp` configuration
variable is gone, replaced by the more flexible `email-protocol`.
You'll want to replace:

  use-smtp = False

with:

  email-protocol = sendmail

and replace:

  use-smtp = True

with:

  email-protocol = smtp
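
To see how the per-feed mailbox overrides interact with the defaults,
here is a small sketch using the standard configparser module on a
trimmed copy of the sample above.  `migrate_use_smtp` is a hypothetical
helper for illustration, not something rss2email ships:

```python
import configparser

SAMPLE = """
[DEFAULT]
email-protocol: imap
imap-server: imap.yourisp.net
imap-port: 993
imap-ssl: True

[feed.rss2email]
url = http://www.allthingsrss.com/rss2email/feed/
imap-mailbox = rss2email

[feed.xkcd]
url = http://xkcd.com/atom.xml
imap-mailbox = xkcd
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# Each feed names its own mailbox but inherits the rest from [DEFAULT].
mailboxes = {name: config[name]['imap-mailbox'] for name in config.sections()}
protocol = config['feed.xkcd']['email-protocol']
port = config['feed.xkcd'].getint('imap-port')
use_ssl = config['feed.xkcd'].getboolean('imap-ssl')


def migrate_use_smtp(use_smtp):
    """Map the old boolean use-smtp value onto email-protocol."""
    return 'smtp' if use_smtp else 'sendmail'
```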

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Update with summaries of recent changes
W. Trevor King [Thu, 28 Mar 2013 10:58:40 +0000 (06:58 -0400)]
CHANGELOG: Update with summaries of recent changes

This documents the following changes:
5fd97a2 Add George Saunders to __contributors__
bd7b7ca error: Don't explicitly store server in SMTPAuthenticationError
3f5df62 feed: Add herror to _SOCKET_ERRORS and remove reason handing
c39625e error: Fix ProcessingError message and logging
a3719f8 feed: Catch parsing errors during html2text
a88738f error: Fix super calls for SMTPAuthenticationError, etc.
80a8edf email: Change stray SMTP_SERVER to server
a66dd58 email: Remove explicit ehlo() call
c9f5681 feed: Streamline rel-via title extraction in
  _process_entry_content()
aa8675d feed: Drop Google Reader rel-via manipulation
a226ef6 error: Fix inheritance typos for HTTPError (ProcessingError ->
  FeedError)

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoAdd George Saunders to __contributors__
W. Trevor King [Thu, 28 Mar 2013 10:53:22 +0000 (06:53 -0400)]
Add George Saunders to __contributors__

For his 80a8edf (email: Change stray SMTP_SERVER to server,
2013-03-18).

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoerror: Don't explicitly store server in SMTPAuthenticationError
W. Trevor King [Thu, 28 Mar 2013 01:17:03 +0000 (21:17 -0400)]
error: Don't explicitly store server in SMTPAuthenticationError

It's already being stored by SMTPConnectionError, which is called via
super().

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Add herror to _SOCKET_ERRORS and remove reason handing
W. Trevor King [Thu, 21 Mar 2013 15:07:57 +0000 (11:07 -0400)]
feed: Add herror to _SOCKET_ERRORS and remove reason handing

We don't log the reason, so trying to extract it just gives room for
errors to creep in.  Luckily, we'll be able to drop the whole
_SOCKET_ERRORS thing when we move to Python >= 3.3, because following
PEP 3151 the socket errors became subclasses of OSError.
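
A short sketch of what that simplification could look like, assuming
Python >= 3.3.  `describe` and `failing_fetch` are hypothetical names
used only for this illustration:

```python
import socket

# With PEP 3151, all of these are OSError or subclasses of it.
socket_errors = [socket.error, socket.herror, socket.gaierror, socket.timeout]
all_oserror = all(issubclass(error, OSError) for error in socket_errors)


def describe(fetch):
    """Run fetch(), catching any socket-level failure with one clause."""
    try:
        return fetch()
    except OSError as error:
        return 'failed: {}'.format(type(error).__name__)


def failing_fetch():
    # Stand-in for a fetch that hits a host-lookup error.
    raise socket.herror(1, 'Unknown host')


outcome = describe(failing_fetch)
```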

Reported-by: Matt Bordignon <matthew@bordignons.net>
Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoerror: Fix ProcessingError message and logging
W. Trevor King [Wed, 20 Mar 2013 09:40:32 +0000 (05:40 -0400)]
error: Fix ProcessingError message and logging

We can't check if message is None if message wasn't an argument to
__init__().  Also:

* import sys for sys.version
* explicitly format strings passed to _LOG.warning(), otherwise you'll
  get the following:

    >>> LOG.warning('abc', 'def')
    Traceback (most recent call last):
      ...
    TypeError: not all arguments converted during string formatting
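
For comparison, either of the following forms avoids the problem: give
the logger a format string with placeholders, or format the string
before passing it in.  The logger name and the list-collecting handler
are illustrative choices, not part of rss2email:

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted messages so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


log = logging.getLogger('rss2email-example')
log.propagate = False
log.setLevel(logging.WARNING)
handler = ListHandler()
log.addHandler(handler)

# Placeholders in the first argument consume the remaining arguments.
log.warning('%s: %s', 'abc', 'def')
# Alternatively, format explicitly and pass a single string.
log.warning('{}: {}'.format('abc', 'def'))
```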

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Catch parsing errors during html2text
W. Trevor King [Wed, 20 Mar 2013 09:27:03 +0000 (05:27 -0400)]
feed: Catch parsing errors during html2text

This avoids crashing with:

  Traceback (most recent call last):
    ...
    File ".../rss2email/feed.py", line 732, in _process_entry_content
      lines = [_html2text.html2text(content['value'])]
    ...
    File "/usr/lib/python3.2/html/parser.py", line 149, in error
      raise HTMLParseError(message, self.getpos())
  html.parser.HTMLParseError: EOF in middle of construct, at line 1, column 262

The troublesome feed was:

  $ wget -S http://www.cell.com/rssFeed/biophysj/rss.NewIssueAndArticles.xml
  --2013-03-20 05:22:08--  http://www.cell.com/rssFeed/biophysj/rss.NewIssueAndArticles.xml
  Resolving www.cell.com... 145.36.42.28
  Connecting to www.cell.com|145.36.42.28|:80... connected.
  HTTP request sent, awaiting response...
    HTTP/1.1 200 OK
    Date: Wed, 20 Mar 2013 09:23:19 GMT
    Server: IBM_HTTP_Server
    Last-Modified: Tue, 19 Mar 2013 22:00:04 GMT
    Accept-Ranges: bytes
    Content-Length: 15362
    Vary: Accept-Encoding
    Keep-Alive: timeout=10, max=100
    Connection: Keep-Alive
    Content-Type: text/xml
  Length: 15362 (15K) [text/xml]
  Saving to: ‘rss.NewIssueAndArticles.xml’

  100%[======================================>] 15,362      94.1KB/s   in 0.2s

  2013-03-20 05:22:08 (94.1 KB/s) - ‘rss.NewIssueAndArticles.xml’ saved [15362/15362]

which contained the poorly split summary:

  <item>
    <title>Synergistic Insertion of Antimicrobial Magainin-Family Peptides in Membranes Depends on the Lipid Spontaneous Curvature</title>
    <link>http://www.cell.com/biophysj/abstract/S0006-3495(13)00153-7</link>
    <description>Erik Strandberg, Jonathan Zerweck, Parvesh Wadhwani, Anne S. Ulrich. PGLa and magainin 2 (MAG2) are amphiphilic antimicrobial peptides from frog skin with known synergistic activity. The orientation of the two helices in membranes was studied using solid-state &lt;sup....</description>
    <pubDate>Tue, 19 Mar 2013 00:00:00 GMT</pubDate>
    <guid>http://www.cell.com/biophysj/abstract/S0006-3495(13)00153-7</guid>
    <dc:date>2013-03-19T00:00:00Z</dc:date>
  </item>

The '<sup....' in the description broke the parser.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoerror: Fix super calls for SMTPAuthenticationError, etc.
W. Trevor King [Tue, 19 Mar 2013 23:58:09 +0000 (19:58 -0400)]
error: Fix super calls for SMTPAuthenticationError, etc.

Fix copy/paste errors where super() calls used the wrong class name
(it should match the class in which the method is defined) for:

* SMTPAuthenticationError.__init__
* ProcessingError.__init__
* OPMLReadError.__init__

Reported-by: Matt Bordignon <matthew@bordignons.net>
Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoMerge remote-tracking branch 'alienacorn/master'
W. Trevor King [Tue, 19 Mar 2013 00:31:46 +0000 (20:31 -0400)]
Merge remote-tracking branch 'alienacorn/master'

* alienacorn/master:
  email: Change stray SMTP_SERVER to server

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: Change stray SMTP_SERVER to server
George Saunders [Mon, 18 Mar 2013 23:20:09 +0000 (23:20 +0000)]
email: Change stray SMTP_SERVER to server

This fixes the error below that occurred upon sending a message by SMTP.

NameError: global name 'SMTP_SERVER' is not defined.

Signed-off-by: George Saunders <georgesaunders@gmail.com>
11 years agoemail: Remove explicit ehlo() call
W. Trevor King [Mon, 18 Mar 2013 10:47:46 +0000 (06:47 -0400)]
email: Remove explicit ehlo() call

It won't work before we've connected to the server:

  Traceback (most recent call last):
    ...
    File ".\rss2email\email.py", line 145, in smtp_send
      smtp.ehlo()
    ...
  smtplib.SMTPServerDisconnected: please run connect() first

That makes sense ;).  If we want to call ehlo(), we should certainly
do it after the .connect() call succeeds.  Looking at the docs [1], I
don't think we need to call it at all:

  Unless you wish to use has_extn() before sending mail, it should not
  be necessary to call this method explicitly. It will be implicitly
  called by sendmail() when necessary.

We don't use has_extn(), and the EHLO should happen implicitly in
starttls [2]:

  If there has been no previous EHLO or HELO command this session,
  this method tries ESMTP EHLO first.

and send_message [3]:

  This is a convenience method for calling sendmail()...

via sendmail [4]:

  If there has been no previous EHLO or HELO command this session,
  this method tries ESMTP EHLO first.

[1]: http://docs.python.org/3/library/smtplib.html#smtplib.SMTP.ehlo
[2]: http://docs.python.org/3/library/smtplib.html#smtplib.SMTP.starttls
[3]: http://docs.python.org/3/library/smtplib.html#smtplib.SMTP.send_message
[4]: http://docs.python.org/3/library/smtplib.html#smtplib.SMTP.sendmail

Reported-by: Matt Bordignon <matthew@bordignons.net>
Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Streamline rel-via title extraction in _process_entry_content()
W. Trevor King [Fri, 15 Mar 2013 11:44:02 +0000 (07:44 -0400)]
feed: Streamline rel-via title extraction in _process_entry_content()

No functional change here, this just tightens up the Python code for
clarity.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agofeed: Drop Google Reader rel-via manipulation
W. Trevor King [Fri, 15 Mar 2013 11:37:25 +0000 (07:37 -0400)]
feed: Drop Google Reader rel-via manipulation

Google Reader will be retired on 2013-07-01 [1], so we should be able
to drop the special rel-via handling it got starting with eb04d97
(Bump to version 2.67, 2010-09-21).

[1]: http://googlereader.blogspot.com/2013/03/powering-down-google-reader.html

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoerror: Fix inheritance typos for HTTPError (ProcessingError -> FeedError)
W. Trevor King [Fri, 15 Mar 2013 10:43:29 +0000 (06:43 -0400)]
error: Fix inheritance typos for HTTPError (ProcessingError -> FeedError)

Avoid:

     Traceback (most recent call last):
       ...
       File ".../rss2email/feed.py", line 338, in _check_for_errors
         raise _error.HTTPError(status=status, feed=self)
       File ".../rss2email/error.py", line 166, in __init__
         super(FeedError, self).__init__(feed=feed, message=message)
     TypeError: __init__() got an unexpected keyword argument 'feed'

HTTPErrors occur when we're fetching a feed, so we don't have the
parsed feed instance required by ProcessingErrors.  We should use the
more general FeedError instead, and use the HTTPError class name in
the super() call.

Both of these changes come from sloppy copy-paste errors while stubbing out
lots of similar error classes during the huge 066602ef (rss2email:
split massive package into modules, 2012-11-13).
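
The super() half of the bug can be reproduced with simplified stand-in
classes.  These are not the real rss2email.error definitions (the
constructor signatures and messages are assumptions), but they show why
naming the parent class in super() skips the parent's __init__():

```python
class RSS2EmailError(Exception):
    def __init__(self, message):
        super().__init__(message)


class FeedError(RSS2EmailError):
    def __init__(self, feed, message=None):
        if message is None:
            message = 'error with feed {}'.format(feed)
        super(FeedError, self).__init__(message=message)
        self.feed = feed


class BrokenHTTPError(FeedError):
    def __init__(self, status, feed):
        message = 'HTTP status {} fetching feed {}'.format(status, feed)
        # Bug: super(FeedError, ...) starts the lookup *after* FeedError,
        # so RSS2EmailError.__init__() receives the unexpected 'feed'.
        super(FeedError, self).__init__(feed=feed, message=message)


class HTTPError(FeedError):
    def __init__(self, status, feed):
        message = 'HTTP status {} fetching feed {}'.format(status, feed)
        # Fix: name the class in which this method is defined.
        super(HTTPError, self).__init__(feed=feed, message=message)
        self.status = status


try:
    BrokenHTTPError(status=404, feed='xkcd')
    broken = None
except TypeError as error:
    broken = str(error)

fixed = HTTPError(status=404, feed='xkcd')
```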

Reported-by: Matěj Cepl <mcepl@redhat.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoBump to version 3.2 v3.2
W. Trevor King [Wed, 13 Mar 2013 13:50:37 +0000 (09:50 -0400)]
Bump to version 3.2

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Update with summaries of recent changes
W. Trevor King [Wed, 13 Mar 2013 13:46:38 +0000 (09:46 -0400)]
CHANGELOG: Update with summaries of recent changes

This documents the following changes:
f01eac2 (email: Make path to sendmail configurable, 2013-02-15)
1c97270 (email: Attempt .as_string() if BytesGenerator.flatten()
  fails, 2013-02-16)
e08e198 (email: Decode headers when checking .as_string() flatten
  fallback, 2013-02-17)
3adef87 (config: Use extended interpolation in Config, 2013-02-21)

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoREADME: Link to the new openSUSE package
W. Trevor King [Sat, 2 Mar 2013 19:40:25 +0000 (14:40 -0500)]
README: Link to the new openSUSE package

On Sat, Mar 02, 2013 at 08:54:11AM -0800, Arun Persaud wrote:
> I managed to get rss2email 3.1 packaged for opensuse. It's available at
>
> http://download.opensuse.org/repositories/server:/mail/
>
> for 12.2, 12.3 and Factory in case you want to mention it on the project
> page. It doesn't build on older distros, mostly due to missing
> dependencies, but on 12.1 the build fails with
>
> ...
>
> The package is developed at:
>
> https://build.opensuse.org/package/show?package=rss2email&project=server%3Amail

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoconfig: Use extended interpolation in Config
W. Trevor King [Thu, 21 Feb 2013 11:11:19 +0000 (06:11 -0500)]
config: Use extended interpolation in Config

This avoids triggering accidental interpolation errors when your URL
contains percent signs (e.g. %2F).  Curly braces, on the other hand,
will never appear in an encoded URL.  From RFC 1738:

  Unsafe:
  ... Other characters are unsafe because gateways and other transport
  agents are known to sometimes modify such characters. These
  characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

  All unsafe characters must always be encoded within a URL.
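
A quick sketch of the difference, using a made-up feed section and URL
(the `base` option and example.invalid host are assumptions for
illustration):

```python
import configparser

SAMPLE = """
[DEFAULT]
base = http://example.invalid

[feed.example]
url = ${base}/search?q=a%2Fb
"""

# Extended interpolation only treats '$' specially, so '%2F' is left alone.
extended = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation())
extended.read_string(SAMPLE)
url = extended['feed.example']['url']

# The default (basic) interpolation chokes on the bare percent sign.
basic = configparser.ConfigParser()
basic.read_string(SAMPLE)
try:
    basic['feed.example']['url']
    basic_error = None
except configparser.InterpolationSyntaxError as error:
    basic_error = error
```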

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: Decode headers when checking .as_string() flatten fallback
W. Trevor King [Sun, 17 Feb 2013 15:49:41 +0000 (10:49 -0500)]
email: Decode headers when checking .as_string() flatten fallback

A naive dict-comparison fails if any header fields are encoded
following RFC 2047.  For example,

  '=?iso-8859-1?q?this=20is=20some=20text?='

will not compare equal to an email.header.Header instance.  By
decoding everything to Unicode strings before comparing the header
fields, we can see if the underlying data matches.
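
The idea can be sketched with email.header from the standard library.
The `decode` helper here is a simplified illustration, not the actual
_decode_header() implementation:

```python
from email.header import Header, decode_header, make_header


def decode(value):
    """Return a plain Unicode string for a raw or Header-wrapped field."""
    if isinstance(value, Header):
        return str(value)
    return str(make_header(decode_header(value)))


encoded = '=?iso-8859-1?q?this=20is=20some=20text?='
header = Header('this is some text', 'iso-8859-1')

# Comparing the RFC 2047 string with the Header instance directly fails,
# while comparing the decoded forms succeeds.
naive_equal = (encoded == header)
decoded_equal = (decode(encoded) == decode(header))
```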

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: Attempt .as_string() if BytesGenerator.flatten() fails
W. Trevor King [Sat, 16 Feb 2013 14:06:38 +0000 (09:06 -0500)]
email: Attempt .as_string() if BytesGenerator.flatten() fails

Before converting to BytesGenerator in 8a907f9 (email: Fix _flatten()
implementation for non-ASCII bodies, 2013-01-23), we used to flatten
emails with message.as_string().  BytesGenerator should be the more
robust approach, but it is, unfortunately, broken with respect to
Unicode payloads [1,2,3,4].  This makes the use-8bit setting pretty
useless.

Until we find a clean fix for BytesGenerator, fall back on the earlier
.as_string() approach where possible.  We check the feasibility of the
fallback by performing a quasi-round-trip and comparing a message
recovered from the byte-encoded form with the original message.  If
the recovered version does not match the original message, we reraise
the BytesGenerator.flatten() error.  This fallback should work for
any charset whose mapping for ASCII characters is a no-op.

One benefit of this altered approach is that we no longer need to
encode the payload when we set it up in get_message().  This "Unicode
inside--encode on output" approach doesn't smell as much as the old
approach ;).

The new fallback will probably die screaming if you try and flatten a
multipart message, but we don't do that in rss2email.  Hopefully, the
upstream issues with the email library will be sorted out in the near
future...

[1]: http://thread.gmane.org/gmane.comp.python.general/725425
[2]: http://bugs.python.org/issue16324
[3]: http://bugs.python.org/issue12553
[4]: http://bugs.python.org/issue12552#msg140294

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: Fix typo '\\n' -> '\n' in _flatten docstring.
W. Trevor King [Sat, 16 Feb 2013 13:22:10 +0000 (08:22 -0500)]
email: Fix typo '\\n' -> '\n' in _flatten docstring.

I seem to have forgotten that the docstring is raw (`r"""`) when I
wrote the original tests.
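
To illustrate the pitfall with two made-up functions (not the real
_flatten docstring): inside a raw docstring the doctest source is taken
literally, so `\\n` reaches Python as an escaped backslash followed by
`n` rather than as a newline.

```python
import doctest


def doubled():
    r"""
    >>> len('a\\nb')
    3
    """


def single():
    r"""
    >>> len('a\nb')
    3
    """


def count_failures(func):
    """Run the doctest examples in func's docstring; return the failures."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test, out=lambda text: None)  # silence the failure report
    return runner.failures


doubled_failures = count_failures(doubled)
single_failures = count_failures(single)
```

The doubled form measures a four-character string, so its example
fails, while the single-backslash form passes.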

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: Make path to sendmail configurable
W. Trevor King [Fri, 15 Feb 2013 14:16:39 +0000 (09:16 -0500)]
email: Make path to sendmail configurable

For example, Azer Koçulu has an rss2email fork that uses msmtp [1].

[1]: https://github.com/azer/rss2email

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoCHANGELOG: Update after the release of 3.0 and 3.1
W. Trevor King [Fri, 15 Feb 2013 13:13:34 +0000 (08:13 -0500)]
CHANGELOG: Update after the release of 3.0 and 3.1

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoBump to version 3.1 v3.1
W. Trevor King [Thu, 14 Feb 2013 13:34:03 +0000 (08:34 -0500)]
Bump to version 3.1

Changes since 3.0:
* Import __url__, __author__, and __email__ in rss2email.error, which
  fixes bugs formatting a number of errors.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoerror: Import __*__ metadata (URL, author, email)
W. Trevor King [Thu, 14 Feb 2013 13:28:35 +0000 (08:28 -0500)]
error: Import __*__ metadata (URL, author, email)

These are used to format some of the log messages.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agosetup.py: Use __url__ and __author__ from rss2email/__init__.py
W. Trevor King [Thu, 14 Feb 2013 13:24:08 +0000 (08:24 -0500)]
setup.py: Use __url__ and __author__ from rss2email/__init__.py

Instead of duplicating those values locally.  This gives us one less
place to forget to update the next time we change the website or
maintainer.

Also, update __url__ to point to my GitHub repository and shift the
email address to rss2email.__email__.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoBump to version 3.0 v3.0
W. Trevor King [Wed, 13 Feb 2013 14:32:40 +0000 (09:32 -0500)]
Bump to version 3.0

Changes since 2.71:
* State storage split into a static configuration file (usually
  `~/.config/rss2email.cfg`) and a dynamic JSON data file (usually
  `~/.local/share/rss2email.json`).
* The static configuration file is parsed with Python's ConfigParser
  class, which allows for default settings that can be overridden on a
  global or per-feed basis.  You'll have to translate your old config
  to the new format by hand when you upgrade.
* Emailed messages now have Message-IDs.
* Feeds can be referenced by name as well as by index (e.g.
  `r2e run my-feed`).
* Restructured as a package with submodules instead of a single
  module.  This makes dependencies between various portions of
  rss2email more explicit.
* Converted to Python >=3.2, for more consistent Unicode handling,
  exception chaining, and argparse (although argparse is also in 2.7).
* Packaged with setup.py and distutils, in case you want to install
  rss2email instead of running it from a Git checkout or unpacked
  tarball.
* Added a test suite (run with `./test/test.py`).
* Added a man page, based on the version in the Debian package.
* Require Signed-off-by lines in new commit messages, following the
  Linux and Git projects.
* Assorted cleanups and bug fixes.

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agocommand: In run(), save feeds even after errors
W. Trevor King [Wed, 13 Feb 2013 14:12:05 +0000 (09:12 -0500)]
command: In run(), save feeds even after errors

It's annoying to have a few feeds processed successfully and then have
one feed with a configuration error take down the process without
saving.  With this commit, we always save the feeds, regardless of any
error.  We also catch and log any RSS2EmailError, not just the
NoToEmailAddress and ProcessingErrors we caught earlier.
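
The control flow amounts to a try/finally around the loop.  The sketch
below uses hypothetical stand-in classes (this `Feed`, `Feeds`, and the
list-based log are assumptions, not the real rss2email objects):

```python
class RSS2EmailError(Exception):
    pass


class Feed:
    def __init__(self, name, error=None):
        self.name = name
        self.error = error
        self.processed = False

    def run(self):
        if self.error is not None:
            raise self.error
        self.processed = True


class Feeds(list):
    saved = False

    def save(self):
        self.saved = True


def run(feeds, log):
    try:
        for feed in feeds:
            try:
                feed.run()
            except RSS2EmailError as error:
                # Log and move on to the next feed.
                log.append('{}: {}'.format(feed.name, error))
    finally:
        # Save whatever was processed, even if something unexpected escaped.
        feeds.save()


log = []
feeds = Feeds([Feed('a'), Feed('b', RSS2EmailError('no address')), Feed('c')])
run(feeds, log)

crashing = Feeds([Feed('d', ValueError('unexpected'))])
try:
    run(crashing, [])
    escaped = False
except ValueError:
    escaped = True
```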

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoconfig: Fix 'significan' -> 'significant' typo in a comment
W. Trevor King [Thu, 24 Jan 2013 22:51:13 +0000 (17:51 -0500)]
config: Fix 'significan' -> 'significant' typo in a comment

Signed-off-by: W. Trevor King <wking@tremily.us>
11 years agoemail: Encode the body when we might use 8bit encoding
W. Trevor King [Thu, 24 Jan 2013 05:30:04 +0000 (00:30 -0500)]
email: Encode the body when we might use 8bit encoding

R. David Murray writes [1]:
> In 2.x that will work, and will give you the 8bit CTE at need, as
> long as you pass encoded text to MIMEText (as opposed to
> unicode...which it doesn't really handle correctly if I recall
> right).

See also, [2].

[1]: http://bugs.python.org/issue12552
     email.MIMEText overide BASE64 for utf8 charset
[2]: http://bugs.python.org/issue12553
     Add support for using a default CTE of '8bit' to MIMEText

Signed-off-by: W. Trevor King <wking@tremily.us>