Gaëtan Harter writes [1]:
> Importing the following opml file fails with `invalid feed name
> 'Arch Linux: Recent news updates`
>
> <?xml version="1.0" encoding="UTF-8"?>
> <opml version="1.0">
> <head>
> <title>Google reader export</title>
> </head>
> <body>
> <outline text="Arch Linux: Recent news updates"
> title="Arch Linux: Recent news updates" type="rss"
> xmlUrl="http://www.archlinux.org/feeds/news/"
> htmlUrl="https://www.archlinux.org/news/" />
> </body>
> </opml>
>
> It fails because the `text` field is used directly as `name` for
> creating a Feed object.
ConfigParser can handle colons and accented characters in its
section names [2], but Feed._set_name checks names against
Feed._name_regexp, which only allows ASCII letters, digits, periods,
underscores, and the hyphen-minus (U+002D). Add an inverse
name_slug_regexp to opmlimport and use it to replace each run of
illegal characters with a single hyphen-minus, so the import no longer
crashes when the text attribute contains anything illegal.
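
For illustration, here is a minimal standalone sketch of the slugging
step, run against the feed title from the report. NAME_REGEXP is only
a stand-in written from the description of Feed._name_regexp above
(the real pattern may differ in detail), and slug_name is a
hypothetical helper name that is not part of rss2email:

```python
import re

# Stand-in for Feed._name_regexp, reconstructed from the description in
# this message (assumption: the real pattern may differ in detail).
NAME_REGEXP = re.compile(r'^[a-zA-Z0-9._-]+$')

# Same character class as the name_slug_regexp added by this patch:
# one or more characters outside the allowed set.
NAME_SLUG_REGEXP = re.compile(r'[^a-zA-Z0-9._-]+')


def slug_name(text):
    """Replace each run of disallowed characters with one hyphen-minus.

    Hypothetical helper for illustration; the patch calls
    name_slug_regexp.sub('-', text) inline instead.
    """
    return NAME_SLUG_REGEXP.sub('-', text)


text = 'Arch Linux: Recent news updates'
name = slug_name(text)
print(name)  # Arch-Linux-Recent-news-updates
```

The run ': ' collapses to a single hyphen-minus rather than two. The
mapping is lossy, so two different text attributes can end up with the
same slugged name.
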
[1]: https://github.com/wking/rss2email/issues/24#issuecomment-26224593
[2]: http://docs.python.org/3/library/configparser.html#supported-ini-file-structure
Reported-by: Gaëtan Harter <hartergaetan@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
"""rss2email commands
"""
+import re as _re
import sys as _sys
import xml.dom.minidom as _minidom
import xml.sax.saxutils as _saxutils
         raise _error.OPMLReadError() from e
     if args.file:
         f.close()
+    name_slug_regexp = _re.compile('[^a-zA-Z0-9._-]+')
     for feed in new_feeds:
         if feed.hasAttribute('xmlUrl'):
             url = _saxutils.unescape(feed.getAttribute('xmlUrl'))
             if feed.hasAttribute('text'):
                 text = _saxutils.unescape(feed.getAttribute('text'))
                 if text != url:
-                    name = text
+                    name = name_slug_regexp.sub('-', text)
             feed = feeds.new_feed(name=name, url=url)
             _LOG.info('add new feed {}'.format(feed))
     feeds.save()