command: Sluggify feed names on opmlimport
author W. Trevor King <wking@tremily.us>
Sun, 13 Oct 2013 21:54:29 +0000 (14:54 -0700)
committer W. Trevor King <wking@tremily.us>
Mon, 14 Oct 2013 22:22:45 +0000 (15:22 -0700)
commit 88cc8df510a55b6616dae274851110a7b162ee23
tree 590e62c4ccd12287105d1b465fa70c53fed6f3af
parent dd7f516e09174d57bf986892af2c41dae8e232d6

Gaëtan Harter writes [1]:
> Importing the following opml file fails with `invalid feed name
> 'Arch Linux: Recent news updates'`
>
>   <?xml version="1.0" encoding="UTF-8"?>
>   <opml version="1.0">
>     <head>
>       <title>Google reader export</title>
>     </head>
>     <body>
>       <outline text="Arch Linux: Recent news updates"
>                title="Arch Linux: Recent news updates" type="rss"
>                xmlUrl="http://www.archlinux.org/feeds/news/"
>                htmlUrl="https://www.archlinux.org/news/" />
>     </body>
>   </opml>
>
> It fails because the `text` field is used directly as `name` for
> creating a Feed object.

ConfigParser can handle colons and accented characters in its
section names [2], but Feed._set_name checks names against
Feed._name_regexp, which only allows ASCII letters, digits, periods,
underscores, and the hyphen-minus (U+002D).  Add an inverse
name_slug_regexp to opmlimport that replaces any runs of illegal
characters with a single hyphen-minus, to avoid crashing if the text
attribute contains anything illegal.
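
The substitution described above can be sketched roughly as follows.
This is an illustration, not the code in rss2email/command.py: both
patterns are reconstructed from the character set listed above, and
slugify_feed_name is a hypothetical helper name.

```python
import re

# Assumed shape of Feed._name_regexp, rebuilt from the description:
# ASCII letters, digits, periods, underscores, and hyphen-minus.
NAME_REGEXP = re.compile(r'^[a-zA-Z0-9._-]+$')

# The inverse pattern: any run of characters outside that set.
NAME_SLUG_REGEXP = re.compile(r'[^a-zA-Z0-9._-]+')


def slugify_feed_name(text):
    """Replace each run of illegal characters with one hyphen-minus."""
    return NAME_SLUG_REGEXP.sub('-', text)


slug = slugify_feed_name('Arch Linux: Recent news updates')
print(slug)  # Arch-Linux-Recent-news-updates
```

Because the pattern matches runs, the ": " in the reported title
collapses to one hyphen-minus rather than two.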

[1]: https://github.com/wking/rss2email/issues/24#issuecomment-26224593
[2]: http://docs.python.org/3/library/configparser.html#supported-ini-file-structure

Reported-by: Gaëtan Harter <hartergaetan@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
rss2email/command.py