java-html-sanitizer

Question: how to replace tag with another tag with inner text?

Open IvanPizhenko opened this issue 4 years ago • 9 comments

I want to completely replace a tag when I encounter certain conditions. I have found an example in the HtmlPolicyBuilder Javadoc (https://www.javadoc.io/doc/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20180219.1/org/owasp/html/HtmlPolicyBuilder.html)

new HtmlPolicyBuilder()
   .allowElement(
     new ElementPolicy() {
       public String apply(String elementName, List<String> attributes){
         attributes.add("class");
         attributes.add("header-" + elementName);
         return "div";
       }
     },
     "h1", "h2", "h3", "h4", "h5", "h6")
   .build(outputChannel)

which shows how to use a custom ElementPolicy. In theory, I could change the list of attributes and return a new tag name, as the example does. But in my case that's not enough: I also need to place text inside the element. So for example, if I had:

<img src="https://example.com/some_image.jpg"></img>

I want to replace this with

<a href="https://example.com/some_image.jpg">some_image</a>

I have been reading the Javadoc, trying to figure out how to do it, and searching for examples on the Internet, but so far I have not found a way to set the inner text. Is there a way to do it?

Also posted on Stack Overflow.

IvanPizhenko, Jun 14 '21 19:06

Hmm. Good question. I don't think there's a way right now. Perhaps there needs to be a way for an element policy to receive an output token stream for content.

mikesamuel, Jun 16 '21 16:06

You could replace it with a preprocessor; that is how I am currently doing it.

            .withPreprocessor(r -> new HtmlStreamEventReceiverWrapper(r) {
                @Override
                public void openTag(@NotNull String elementName, @NotNull List<String> attrs) {
                    if ("a".equals(elementName)) {
                        super.openTag("span", attrs);
                    } else {
                        super.openTag(elementName, attrs);
                    }
                }
            })
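
For the `<img>` to `<a>` case from the question, the link text can be derived from the `src` attribute, which is already available when the open tag arrives, so the same kind of wrapper could emit the text itself. Below is a minimal sketch of that idea, not tested against the library: `TagEvents` is a local stand-in for the `openTag`/`closeTag`/`text` callbacks of `HtmlStreamEventReceiver`, and `HtmlPrinter`, `ImgToLink`, and `baseName` are hypothetical names introduced only for illustration. It assumes attributes arrive as a flat list of alternating names and values, as in the Javadoc example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Local stand-in (assumption) for the content callbacks of HtmlStreamEventReceiver. */
interface TagEvents {
    void openTag(String elementName, List<String> attrs);
    void closeTag(String elementName);
    void text(String text);
}

/** Serializes events to a string for inspection. Demo only: performs no escaping. */
class HtmlPrinter implements TagEvents {
    final StringBuilder out = new StringBuilder();

    @Override public void openTag(String elementName, List<String> attrs) {
        out.append('<').append(elementName);
        // attrs is a flat list: name, value, name, value, ...
        for (int i = 0; i + 1 < attrs.size(); i += 2) {
            out.append(' ').append(attrs.get(i)).append("=\"").append(attrs.get(i + 1)).append('"');
        }
        out.append('>');
    }
    @Override public void closeTag(String elementName) {
        out.append("</").append(elementName).append('>');
    }
    @Override public void text(String text) { out.append(text); }
}

/** Rewrites <img src=URL> into <a href=URL>basename</a>; forwards everything else unchanged. */
class ImgToLink implements TagEvents {
    private final TagEvents next;
    ImgToLink(TagEvents next) { this.next = next; }

    @Override public void openTag(String elementName, List<String> attrs) {
        if (!"img".equals(elementName)) {
            next.openTag(elementName, attrs);
            return;
        }
        String src = null;
        for (int i = 0; i + 1 < attrs.size(); i += 2) {
            if ("src".equals(attrs.get(i))) src = attrs.get(i + 1);
        }
        if (src == null) return; // an image without src is dropped entirely
        // Emit the complete replacement element, including its inner text.
        next.openTag("a", new ArrayList<>(Arrays.asList("href", src)));
        next.text(baseName(src));
        next.closeTag("a");
    }
    @Override public void closeTag(String elementName) {
        // The replacement <a> was already closed in openTag, so swallow </img>.
        if (!"img".equals(elementName)) next.closeTag(elementName);
    }
    @Override public void text(String text) { next.text(text); }

    /** Last path segment of the URL, without its file extension. */
    static String baseName(String url) {
        String file = url.substring(url.lastIndexOf('/') + 1);
        int dot = file.lastIndexOf('.');
        return dot > 0 ? file.substring(0, dot) : file;
    }
}

class ImgToLinkDemo {
    static String run() {
        HtmlPrinter printer = new HtmlPrinter();
        TagEvents filter = new ImgToLink(printer);
        filter.openTag("p", new ArrayList<>());
        filter.openTag("img", new ArrayList<>(Arrays.asList("src", "https://example.com/some_image.jpg")));
        filter.closeTag("img");
        filter.closeTag("p");
        return printer.out.toString();
    }
    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With the library itself, the same three overrides would presumably go into an `HtmlStreamEventReceiverWrapper` passed to `withPreprocessor`, and the policy would still need to allow `a` elements with an `href` attribute for the replacement to survive sanitization.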

myin142, Jun 23 '21 11:06

@myin142 Thank you for your answer, but the problem is not just to replace the tag, but also to place inner text inside the replacement tag. In the meantime, I have ended up using Jsoup for that.

IvanPizhenko, Jun 23 '21 12:06

@IvanPizhenko I was actually proposing a way to affect the innerHTML of the tag, by providing access to a scoped HtmlStreamEventReceiver.

That would allow specifying innerText/textContent by just calling .text which takes a string of text/plain.

But one could construct more complicated internals by using the open and close tag methods.

mikesamuel, Jun 23 '21 16:06

@mikesamuel Is that implemented? If yes, can you please show a short working code example (like @myin142 provided above)?

IvanPizhenko, Jun 23 '21 22:06

@IvanPizhenko No. I was thinking that I could add something that would let you do that.

The limitation that this library has, compared to JSoup, is that it operates as a streaming filter left to right.

That means it has a better memory footprint, and is less prone to denial of service, but it does mean that you can't look at a node's content when deciding what to do with it.

So I can offer some simple options to prepend/append/replace the content with something specified by a policy, but I cannot allow arbitrary rearrangement.
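
To illustrate why replacing content fits a left-to-right streaming filter, here is a rough sketch of one possible shape for it. This is not the library's API: `StreamEvents` is a local stand-in for the receiver callbacks, and `EventLog` and `ReplaceContent` are hypothetical names. The replacement text is emitted right after the open tag, and a depth counter suppresses the original content until the matching close tag, so nothing is buffered. The sketch assumes open and close events are balanced, which raw input does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;

/** Local stand-in (assumption) for the receiver's content callbacks. */
interface StreamEvents {
    void openTag(String name, List<String> attrs);
    void closeTag(String name);
    void text(String text);
}

/** Records events in arrival order, e.g. "open:p", "text:hi", "close:p". */
class EventLog implements StreamEvents {
    final List<String> events = new ArrayList<>();
    @Override public void openTag(String name, List<String> attrs) { events.add("open:" + name); }
    @Override public void closeTag(String name) { events.add("close:" + name); }
    @Override public void text(String text) { events.add("text:" + text); }
}

/** Replaces the content of each `target` element with fixed text, without buffering. */
class ReplaceContent implements StreamEvents {
    private final StreamEvents next;
    private final String target;
    private final String replacement;
    private int depth = 0; // > 0 while inside a target element whose content is suppressed

    ReplaceContent(StreamEvents next, String target, String replacement) {
        this.next = next;
        this.target = target;
        this.replacement = replacement;
    }

    @Override public void openTag(String name, List<String> attrs) {
        if (depth > 0) {          // nested inside suppressed content: count it, emit nothing
            depth++;
            return;
        }
        next.openTag(name, attrs);
        if (target.equals(name)) {
            next.text(replacement); // the decision is made before any content is seen
            depth = 1;
        }
    }

    @Override public void closeTag(String name) {
        if (depth == 0) {
            next.closeTag(name);
            return;
        }
        depth--;
        if (depth == 0) next.closeTag(name); // matching close of the target element
    }

    @Override public void text(String text) {
        if (depth == 0) next.text(text);
    }
}

class ReplaceContentDemo {
    static List<String> run() {
        EventLog log = new EventLog();
        StreamEvents filter = new ReplaceContent(log, "span", "[hidden]");
        List<String> none = new ArrayList<>();
        filter.openTag("div", none);
        filter.openTag("span", none);
        filter.text("secret");
        filter.openTag("b", none);
        filter.text("x");
        filter.closeTag("b");
        filter.closeTag("span");
        filter.text("after");
        filter.closeTag("div");
        return log.events;
    }
    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Prepending works the same way without the suppression step. Anything that depends on the original content, such as moving it elsewhere, would require buffering, which is the trade-off described above.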

mikesamuel, Jun 24 '21 14:06

@mikesamuel Well, let's try your idea. Please make the changes you are talking about and provide a code example showing how to use them.

IvanPizhenko, Jun 25 '21 19:06

If you implement a preprocessor as below, the inner text can be replaced.


    private static final PolicyFactory VALID_TAGS_POLICY = new HtmlPolicyBuilder()
            .withPreprocessor((HtmlStreamEventReceiver r) -> new HtmlStreamEventReceiverWrapper(r) {
                String newText = "";

                @Override
                public void closeTag(String elementName) {
                    // If this element is disallowed, clear its content
                    if (!VALID_TAGS_SET.contains(elementName)) {
                        newText = "";
                    }
                    r.text(newText);
                    r.closeTag(elementName);
                }

                @Override
                public void text(String text) {
                    newText = text;
                }
            })
            .allowElements(VALID_TAGS)
            .toFactory();
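
One caveat with holding text in a single field is that only the last chunk survives, and text that precedes a nested element is emitted after that element's tags. A possible variant, sketched below under assumptions, accumulates every chunk and tracks the open elements so that text is flushed in order and attributed to its enclosing element. `HtmlEvents` is a local stand-in for the receiver callbacks, and `OrderedLog` and `PerElementText` are hypothetical names. The sketch assumes balanced open and close events.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.function.BinaryOperator;

/** Local stand-in (assumption) for the receiver's content callbacks. */
interface HtmlEvents {
    void openTag(String name, List<String> attrs);
    void closeTag(String name);
    void text(String text);
}

/** Records events in arrival order so the output can be inspected. */
class OrderedLog implements HtmlEvents {
    final List<String> events = new ArrayList<>();
    @Override public void openTag(String name, List<String> attrs) { events.add("open:" + name); }
    @Override public void closeTag(String name) { events.add("close:" + name); }
    @Override public void text(String text) { events.add("text:" + text); }
}

/** Buffers adjacent text chunks and rewrites them based on the enclosing element. */
class PerElementText implements HtmlEvents {
    private final HtmlEvents next;
    private final BinaryOperator<String> rewrite; // (enclosing element name, text) -> new text
    private final Deque<String> open = new ArrayDeque<>();
    private final StringBuilder pending = new StringBuilder();

    PerElementText(HtmlEvents next, BinaryOperator<String> rewrite) {
        this.next = next;
        this.rewrite = rewrite;
    }

    /** Emits buffered text; call once more at end of input for trailing text. */
    void flush() {
        if (pending.length() == 0) return;
        String owner = open.isEmpty() ? "" : open.peek();
        next.text(rewrite.apply(owner, pending.toString()));
        pending.setLength(0);
    }

    @Override public void openTag(String name, List<String> attrs) {
        flush();               // text seen so far belongs to the parent element
        open.push(name);
        next.openTag(name, attrs);
    }

    @Override public void closeTag(String name) {
        flush();               // text seen so far belongs to the element being closed
        if (!open.isEmpty()) open.pop();
        next.closeTag(name);
    }

    @Override public void text(String text) {
        pending.append(text);  // keep every chunk, not only the last one
    }
}

class PerElementTextDemo {
    static List<String> run() {
        OrderedLog log = new OrderedLog();
        PerElementText filter = new PerElementText(
            log, (element, t) -> "b".equals(element) ? t.toUpperCase(Locale.ROOT) : t);
        List<String> none = new ArrayList<>();
        filter.openTag("p", none);
        filter.text("Hello ");
        filter.text("world");
        filter.openTag("b", none);
        filter.text("bold");
        filter.closeTag("b");
        filter.text("!");
        filter.closeTag("p");
        filter.flush();
        return log.events;
    }
    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Returning an empty string from the rewrite function for elements outside an allow-list would give behaviour similar in spirit to the snippet above, although an empty `text` event is still forwarded in that case.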

WuXian-Allison, Nov 25 '22 08:11

@WuXian-Allison Thank you! This is an interesting idea!

IvanPizhenko, Nov 26 '22 17:11