Question: how to replace a tag with another tag that has inner text?
I want to completely replace a tag when I encounter certain conditions. I have found an example in the HtmlPolicyBuilder Javadoc (https://www.javadoc.io/doc/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20180219.1/org/owasp/html/HtmlPolicyBuilder.html)
new HtmlPolicyBuilder()
    .allowElement(
        new ElementPolicy() {
          public String apply(String elementName, List<String> attributes) {
            attributes.add("class");
            attributes.add("header-" + elementName);
            return "div";
          }
        },
        "h1", "h2", "h3", "h4", "h5", "h6")
    .build(outputChannel)
which shows how to use a custom ElementPolicy. In theory, I could change the list of attributes and return a new tag name, similar to what is shown in the example. But in my case that's not enough. I also need to place text inside the replaced element. So for example, if I had:
<img src="https://example.com/some_image.jpg"></img>
I want to replace this with
<a href="https://example.com/some_image.jpg">some_image</a>
I have been reading the Javadoc and searching for examples on the Internet, but so far I could not find a way to set the inner text. Is there a way to do it?
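For reference, the link text in the example above ("some_image") looks like the last path segment of the image URL with the file extension dropped. A minimal sketch of that derivation, using only the JDK, might look like the following. The class and method names are hypothetical, and the fallback to the full URL when no file name is present is an assumption, not something the question specifies.

```java
import java.net.URI;

public class LinkText {
    // Hypothetical helper: derives link text from the last path segment of a URL,
    // dropping the file extension, e.g. ".../some_image.jpg" becomes "some_image".
    // Assumption: when there is no usable file name, fall back to the URL itself.
    // Note: URI.create throws IllegalArgumentException for malformed input.
    static String fromUrl(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            return url;
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return base.isEmpty() ? url : base;
    }

    public static void main(String[] args) {
        System.out.println(fromUrl("https://example.com/some_image.jpg"));
    }
}
```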
Also posted on Stack Overflow.
Hmm. Good question. I don't think there's a way right now. Perhaps there needs to be a way for an element policy to receive an output token stream for content.
You could replace it with a preprocessor; that is how I am currently doing it.
.withPreprocessor(r -> new HtmlStreamEventReceiverWrapper(r) {
    @Override
    public void openTag(@NotNull String elementName, @NotNull List<String> attrs) {
        if ("a".equals(elementName)) {
            super.openTag("span", attrs);
        } else {
            super.openTag(elementName, attrs);
        }
    }
})
@myin142 Thank you for your answer, but the problem is not just to replace the tag, but also to place inner text inside the replaced tag. Meanwhile, I have ended up using Jsoup for that.
@IvanPizhenko I was actually proposing a way to affect the innerHTML of the tag, by providing access to a scoped HtmlStreamEventReceiver.
That would allow specifying innerText/textContent by just calling .text which takes a string of text/plain.
But one could construct more complicated internals by using the open and close tag methods.
@mikesamuel Is that implemented? If yes, can you please show a short working code example (like @myin142 provided above)?
@IvanPizhenko No. I was thinking that I could add something that would let you do that.
The limitation that this library has, compared to JSoup, is that it operates as a streaming filter left to right.
That means it has a better memory footprint, and is less prone to denial of service, but it does mean that you can't look at a node's content when deciding what to do with it.
So I can offer some simple options to prepend/append/replace the content with something specified by a policy, but I cannot allow arbitrary rearrangement.
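To illustrate the streaming constraint described above, here is a self-contained sketch. It does not use the sanitizer library: the `Receiver` interface is a hypothetical stand-in that only mirrors the open tag / text / close tag shape of a streaming receiver, and all class names are made up for the example. The point it shows is that the img-to-link case only needs information available when the tag opens (the `src` attribute), so it fits a left-to-right filter without inspecting the node's content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamingRewriteSketch {
    // Hypothetical stand-in for a streaming receiver; NOT the library's interface.
    // Attributes are a flat list: name, value, name, value, ...
    interface Receiver {
        void openTag(String name, List<String> attrs);
        void text(String text);
        void closeTag(String name);
    }

    // Collects events into a string. Illustration only: it performs no escaping.
    static class Printer implements Receiver {
        final StringBuilder out = new StringBuilder();
        public void openTag(String name, List<String> attrs) {
            out.append('<').append(name);
            for (int i = 0; i + 1 < attrs.size(); i += 2) {
                out.append(' ').append(attrs.get(i))
                   .append("=\"").append(attrs.get(i + 1)).append('"');
            }
            out.append('>');
        }
        public void text(String text) { out.append(text); }
        public void closeTag(String name) { out.append("</").append(name).append('>'); }
    }

    // Emits <a href=SRC>label</a> in place of <img src=SRC>. Everything it needs
    // is known at openTag time, so no look-ahead into the content is required.
    static class ImgToLink implements Receiver {
        private final Receiver next;
        ImgToLink(Receiver next) { this.next = next; }

        public void openTag(String name, List<String> attrs) {
            if (!"img".equals(name)) {
                next.openTag(name, attrs);
                return;
            }
            String src = "";
            for (int i = 0; i + 1 < attrs.size(); i += 2) {
                if ("src".equals(attrs.get(i))) src = attrs.get(i + 1);
            }
            List<String> linkAttrs = new ArrayList<>(Arrays.asList("href", src));
            next.openTag("a", linkAttrs);
            next.text(label(src));
            next.closeTag("a"); // the whole replacement is emitted up front
        }
        public void text(String text) { next.text(text); }
        public void closeTag(String name) {
            if (!"img".equals(name)) next.closeTag(name); // img was already closed as <a>
        }

        // Last path segment without its extension (assumed labeling rule).
        static String label(String src) {
            String name = src.substring(src.lastIndexOf('/') + 1);
            int dot = name.lastIndexOf('.');
            return dot > 0 ? name.substring(0, dot) : name;
        }
    }

    static String demo() {
        Printer printer = new Printer();
        Receiver r = new ImgToLink(printer);
        r.openTag("p", new ArrayList<>());
        r.text("See ");
        r.openTag("img", Arrays.asList("src", "https://example.com/some_image.jpg"));
        r.closeTag("img");
        r.closeTag("p");
        return printer.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

By contrast, a rule such as "replace the element only if its text contains X" could not be written this way, because the decision would depend on events that arrive after the open tag has already been forwarded.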
@mikesamuel Well, let's try your idea. Please make the changes you are talking about and provide a code example showing how to use them.
If you implement a preprocessor as below, the inner text can be replaced.
private static final PolicyFactory VALID_TAGS_POLICY = new HtmlPolicyBuilder()
    .withPreprocessor((HtmlStreamEventReceiver r) -> new HtmlStreamEventReceiverWrapper(r) {
        String newText = "";
        @Override
        public void closeTag(String elementName) {
            // If this element is disallowed, clear its content
            if (!VALID_TAGS_SET.contains(elementName)) {
                newText = "";
            }
            r.text(newText);
            r.closeTag(elementName);
        }
        @Override
        public void text(String text) {
            newText = text;
        }
    })
    .allowElements(VALID_TAGS)
    .toFactory();
@WuXian-Allison Thank you! This is an interesting idea!