From f543303bf0042158ca5bc119681019ead4140662 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Thu, 29 May 2008 02:51:40 -0400
Subject: [PATCH] web commit by http://liw.fi/: uuml html entity in feeds confuses ikiwiki when aggregating

---
 ...kes_ikiwiki_not_un-escape_HTML_at_all.mdwn | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn

diff --git a/doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn b/doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn
new file mode 100644
index 000000000..7e9bf84e2
--- /dev/null
+++ b/doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn
@@ -0,0 +1,35 @@
+I'm experimenting with using Ikiwiki as a feed aggregator.
+
+The Planet Ubuntu RSS 2.0 feed () as of today
+has someone whose name contains the character u-with-umlaut. In HTML 4.0, this is
+specified as the character entity uuml. Ikiwiki 2.47 running on Debian etch does
+not seem to understand that entity, and decides not to un-escape any markup in
+the feed. This makes the feed hard to read.
+
+The following is the test input:
+
+    testfeed
+    http://example.com/
+    en
+    example
+    &uuml;
+    http://example.com
+    http://example.com
+    foo
+    Tue, 27 May 2008 22:42:42 +0000
+
+When I feed this to ikiwiki, it complains:
+"processed ok at 2008-05-29 09:44:14 (invalid UTF-8 stripped from feed) (feed entities escaped)"
+
+Note also that the test input contains only pure ASCII, no UTF-8 at all.
+
+If I remove the ampersand in the title, ikiwiki has no problem. However, the entity is
+valid HTML, so it would be good for ikiwiki to understand it. At the minimum, stripping
+the offending entity but un-escaping the rest seems like a reasonable thing to do,
+unless that has security implications.
-- 
2.26.2
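The root cause is that XML predefines only five named entities (`amp`, `lt`, `gt`, `quot`, `apos`), so a strict XML parser rejects `&uuml;` in a feed that has no DTD. The behaviour the report asks for (understand known HTML entities, strip unknown ones, keep everything else) can be sketched as a pre-pass over the raw feed text. This is a minimal Python sketch under stated assumptions, not ikiwiki's actual code: the function name `html_entities_to_numeric` and the sample `FEED` are hypothetical, and it relies on Python's `html.entities.name2codepoint` table of HTML 4 entity names.

```python
import html.entities
import re
import xml.etree.ElementTree as ET

# The only named entities XML defines on its own; anything else (such as
# &uuml;) is undefined in a feed without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as "&uuml;" (not "&#252;").
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text: str) -> str:
    """Hypothetical pre-pass: rewrite HTML named entities for an XML parser.

    Known HTML 4 names become numeric references (&#NNN;), which any XML
    parser accepts; unknown names are dropped so that one bad entity does
    not spoil the rest of the feed.
    """
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)  # already valid XML, keep as-is
        codepoint = html.entities.name2codepoint.get(name)
        if codepoint is None:
            return ""  # unknown entity: strip it, keep the rest
        return f"&#{codepoint};"

    return NAMED_ENTITY.sub(replace, text)


# Hypothetical sample feed modelled on the test input in the report.
FEED = """<rss version="2.0"><channel><title>testfeed</title>
<item><title>&uuml;</title><description>foo &bogus; bar</description></item>
</channel></rss>"""

if __name__ == "__main__":
    try:
        ET.fromstring(FEED)
    except ET.ParseError as err:
        # Expected: an "undefined entity" error for &uuml;
        print("raw feed rejected:", err)

    root = ET.fromstring(html_entities_to_numeric(FEED))
    print(root.findtext("channel/item/title"))        # ü
    print(root.findtext("channel/item/description"))  # foo  bar
```

Because the rewrite works on raw text, it would also touch entity-like strings inside CDATA sections or XML comments; a fix inside the feed parser itself would avoid that, and whether stripping unknown entities is safe is exactly the security question the report raises.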