this should break urls that attempt to include a protocol, or port (these are absolute URLs and should have a whitelisted protocol for use)
but URLs that are relative, or relative from the site root should be preserved (though characters non essential for the URL structure may be urlencoded)
this approach has significant advantages over attempting to locate something like `javascript:alert(1)` or `javascript:alert(1)` (which are both valid) because browsers have been known to ignore ridiculous characters when encountered (meaning something like `jav\ta\0\0script:alert(1)` would be xss :( ). Instead of trying to chase down a way to interpret a URL to decide whether there is a protocol, this approach ensures that two essential characters needed to achieve a colon are encoded `:` (obviously) and `;` (from `:`). If these characters appear in a relative URL then they are equivalent to their URL encoded form and so this change will be non breaking for that case.
* add gif and jpg as allowed data images
* ensure that user controlled content fall only in the "data section" of the data URI (and does not intersect content-type definition in any way (best to be safe than sorry ;-)))
"data section" as defined in: https://tools.ietf.org/html/rfc2397#section-3
protect all attributes and content from xss via element method
filter special attributes (a href, img src)
expand url whitelist slightly to permit data images and mailto links
Okay, so maybe I should have looked 20 lines or so above where I made the edit in the element function – looks like it already supports adding attributes ;p
Have amended the change to blocklist to use the already existing functionality, and have reverted the change that I made to the element function.
Looks like I might need to return the pattern which was used previously
Reverting last change as build still failed
This build will still fail, but I'm hoping it will only fair where the list start value has been inserted