
Remove invalid elements from <head> when serializing

Open · wwilsman opened this issue 5 years ago · 2 comments

It's fairly common for advertising or tracking scripts to inject content such as iframes or images into the head. Iframes are allowed in head elements if they themselves contain only metadata content elements. Otherwise, anything that is not a metadata content element is not allowed in a head element.
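The rule above can be written as a small predicate. This is a minimal sketch, assuming a simplified element tree (the hypothetical `Node` class below, not a real DOM) and taking the metadata content category from the HTML standard (`base`, `link`, `meta`, `noscript`, `script`, `style`, `template`, `title`). The iframe branch encodes the rule exactly as stated in this issue.

```python
from dataclasses import dataclass, field

# Metadata content elements, per the HTML standard's "metadata content" category.
METADATA_TAGS = frozenset(
    {"base", "link", "meta", "noscript", "script", "style", "template", "title"}
)


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag name plus child elements."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def allowed_in_head(node: Node) -> bool:
    """Return True if `node` may stay inside <head> under the rule described above."""
    tag = node.tag.lower()
    if tag in METADATA_TAGS:
        return True
    if tag == "iframe":
        # An iframe is tolerated only when everything inside it is metadata content.
        return all(child.tag.lower() in METADATA_TAGS for child in node.children)
    # Anything else (img, div, ...) is not allowed in <head>.
    return False


print(allowed_in_head(Node("meta")))                   # True
print(allowed_in_head(Node("img")))                    # False
print(allowed_in_head(Node("iframe", [Node("div")])))  # False
```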

When the serialized page is re-rendered, the browser's parser inserts an implicit </head> before the first invalid element, which pushes every following element, metadata content or not, into the body. This usually results in a broken page.
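To make the failure mode concrete, here is a simplified model of what happens to the head's children when the serialized page is parsed again. Everything before the first non-metadata element stays in `<head>`, and that element plus everything after it lands in `<body>`. This is a sketch, not a real HTML parser: the function name `reparse_head` and the flat list-of-tag-names input are hypothetical simplifications, and the real parser's handling of text and whitespace is ignored.

```python
# Metadata content elements, per the HTML standard's "metadata content" category.
METADATA_TAGS = frozenset(
    {"base", "link", "meta", "noscript", "script", "style", "template", "title"}
)


def reparse_head(head_tags: list[str]) -> tuple[list[str], list[str]]:
    """Split serialized <head> children into (kept in head, pushed into body).

    Simplified model: the first non-metadata tag triggers an implicit </head>,
    so it and every tag after it, metadata or not, end up in <body>.
    """
    for index, tag in enumerate(head_tags):
        if tag.lower() not in METADATA_TAGS:
            return head_tags[:index], head_tags[index:]
    # No invalid element: the whole head survives intact.
    return list(head_tags), []


head, body = reparse_head(["meta", "title", "img", "link", "style"])
print(head)  # ['meta', 'title']
print(body)  # ['img', 'link', 'style']
```

Note that `link` and `style` are perfectly valid head content, yet they still end up in the body because they follow the injected `img`.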

We currently remove all iframes from the head, since they do not influence the page visually. Only a very short list of elements is allowed in the head: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/head#See_also

Should we start by removing images and eventually expand that to include other common elements? Or should we iterate through head elements and remove any that are not explicitly allowed?
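The two options differ in which side of the list is explicit. Below is a minimal sketch of both, assuming the head's children are available as a flat list of tag names. `prune_blocklist`, `prune_allowlist`, and the contents of `BLOCKED_TAGS` are hypothetical illustrations; the allowlist is the HTML standard's metadata content category, and the iframe exception is deliberately left out to keep the comparison simple.

```python
# Option 1: remove known offenders and grow the set as new cases show up.
BLOCKED_TAGS = frozenset({"iframe", "img"})

# Option 2: keep only elements that are explicitly allowed in <head>
# (the HTML standard's metadata content category).
ALLOWED_TAGS = frozenset(
    {"base", "link", "meta", "noscript", "script", "style", "template", "title"}
)


def prune_blocklist(head_tags: list[str]) -> list[str]:
    """Drop tags that appear in BLOCKED_TAGS; everything else is kept."""
    return [tag for tag in head_tags if tag.lower() not in BLOCKED_TAGS]


def prune_allowlist(head_tags: list[str]) -> list[str]:
    """Keep only tags that appear in ALLOWED_TAGS; everything else is dropped."""
    return [tag for tag in head_tags if tag.lower() in ALLOWED_TAGS]


injected = ["meta", "title", "img", "div", "link", "iframe"]
print(prune_blocklist(injected))  # ['meta', 'title', 'div', 'link']
print(prune_allowlist(injected))  # ['meta', 'title', 'link']
```

The blocklist version leaves the unanticipated `div` in place, so the head would still be cut short on re-render. The allowlist version cannot miss an element it has never seen, at the cost of also dropping anything non-standard that a page happens to keep in its head.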

wwilsman, Jul 22 '20 16:07

@wwilsman I think we took care of this in CLI, right?

Robdel12, Jul 15 '21 21:07

Not entirely. Iframes are removed from head elements, but we don't yet prune other invalid content.

wwilsman, Jul 16 '21 15:07