From: http://smcv.pseudorandom.co.uk/ Date: Tue, 6 Apr 2010 00:49:38 +0000 (+0000) Subject: as seen on IRC X-Git-Tag: 3.20100427~229 X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=26bf69d17aff4c74dd6c368712091d8c1fc977a6;p=ikiwiki.git as seen on IRC --- diff --git a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn new file mode 100644 index 000000000..d79318dd8 --- /dev/null +++ b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn @@ -0,0 +1,32 @@ +[[!template id=gitbranch branch=smcv/unescaped-meta author="[[Simon_McVittie|smcv]]"]] +(Warning: this branch has not been tested thoroughly.) + +While discussing the [[plugins/meta]] plugin on IRC, Joey pointed out that +it stores most meta fields unescaped, but 'title', 'guid' and 'description' +are special-cased and stored escaped (with numeric XML/HTML entities). This +is to avoid emitting markup in the of a HTML page, or in an RSS/Atom +feed, neither of which are subject to the [[plugins/htmlscrubber]]. + +However, having the meta fields "partially escaped" like this is somewhat +error-prone. Joey suggested that perhaps everything should be stored +unescaped, and the escaping should be done on output; this branch +implements that. + +Points of extra subtlety: + +* The title given to the [[plugins/search]] plugin was previously HTML; + now it's plain text, potentially containing markup characters. I suspect + that that's what Xapian wants anyway (which is why I didn't change it), + but I could be wrong... + +* Page descriptions in the HTML `<head>` were previously double-escaped: + the description was stored escaped with numeric entities, then that was + output with a second layer of escaping! In this branch, I just emit + the page description escaped once, as was presumably the intention. + +* It's safe to apply this change to a wiki and neglect to rebuild it + (assuming I implemented it correctly!), but until the wiki is rebuilt, + titles, descriptions and GUIDs for unchanged pages will appear + double-escaped on any page that inlines them in `quick=yes` mode, and + is rebuilt for some other reason. The failure mode is too much escaping + rather than too little, so it shouldn't be a security problem.