From 26bf69d17aff4c74dd6c368712091d8c1fc977a6 Mon Sep 17 00:00:00 2001 From: "http://smcv.pseudorandom.co.uk/" Date: Tue, 6 Apr 2010 00:49:38 +0000 Subject: [PATCH] as seen on IRC --- ...ot_all_meta_fields_are_stored_escaped.mdwn | 32 +++++++++++++++++++ 1 file changed, 32 insertions(+) create mode 100644 doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn diff --git a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn new file mode 100644 index 000000000..d79318dd8 --- /dev/null +++ b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn @@ -0,0 +1,32 @@ +[[!template id=gitbranch branch=smcv/unescaped-meta author="[[Simon_McVittie|smcv]]"]] +(Warning: this branch has not been tested thoroughly.) + +While discussing the [[plugins/meta]] plugin on IRC, Joey pointed out that +it stores most meta fields unescaped, but 'title', 'guid' and 'description' +are special-cased and stored escaped (with numeric XML/HTML entities). This +is to avoid emitting markup in the of a HTML page, or in an RSS/Atom +feed, neither of which are subject to the [[plugins/htmlscrubber]]. + +However, having the meta fields "partially escaped" like this is somewhat +error-prone. Joey suggested that perhaps everything should be stored +unescaped, and the escaping should be done on output; this branch +implements that. + +Points of extra subtlety: + +* The title given to the [[plugins/search]] plugin was previously HTML; + now it's plain text, potentially containing markup characters. I suspect + that that's what Xapian wants anyway (which is why I didn't change it), + but I could be wrong... + +* Page descriptions in the HTML `<head>` were previously double-escaped: + the description was stored escaped with numeric entities, then that was + output with a second layer of escaping! In this branch, I just emit + the page description escaped once, as was presumably the intention. + +* It's safe to apply this change to a wiki and neglect to rebuild it + (assuming I implemented it correctly!), but until the wiki is rebuilt, + titles, descriptions and GUIDs for unchanged pages will appear + double-escaped on any page that inlines them in `quick=yes` mode, and + is rebuilt for some other reason. The failure mode is too much escaping + rather than too little, so it shouldn't be a security problem. -- 2.26.2