Inter-tag whitespace between lines gets truncated
I ran into this issue while trying to use highlight.js on code examples. Whitespace is significant in code, so the current behavior just won't do.
Example
<strong>there should be</strong>
<strong>a space</strong>
When you put the above into your parser (e.g. on the demo page), the output lacks a space between "be" and "a". This doesn't seem to be an htmlparser2 issue, because its AST explorer demo does show a text node between the two elements [link here].
Thanks for the report. This does seem like undesired behaviour.
It strips whitespace that contains line breaks because otherwise it generates an extra React component for every line break in the HTML you want to convert, which can add up for large datasets. It's also how JSX behaves when you have markup like the following:
render() {
  return (
    <div>
      <strong>there is no</strong>
      <strong>space</strong>
    </div>
  );
}
Maybe the extra elements don't matter and it was a premature optimisation? I'll have to do some benchmarks and see if adding them back in makes any difference.
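For anyone following along, the rule described above can be sketched roughly as follows. This is only an illustration under assumptions: `isWhitespaceWithLineBreak` is a hypothetical name, and the node objects are hand-written in the `{ type, data }` shape that htmlparser2 text nodes use, not output captured from the library.

```javascript
// Hypothetical helper mirroring the described rule; not the library's actual source.
// A node is dropped only if it is text, contains a line break, and is otherwise blank.
function isWhitespaceWithLineBreak(node) {
  return (
    node.type === "text" &&
    /\r?\n/.test(node.data) &&
    node.data.trim() === ""
  );
}

// Hand-written nodes for the two <strong> elements separated by a newline.
const nodes = [
  { type: "tag", name: "strong", children: [{ type: "text", data: "there should be" }] },
  { type: "text", data: "\n" },
  { type: "tag", name: "strong", children: [{ type: "text", data: "a space" }] },
];

const kept = nodes.filter((node) => !isWhitespaceWithLineBreak(node));
console.log(kept.length); // 2: the newline text node between the elements is gone
```

Under this rule a text node holding a single space (no line break) would survive, which is why the problem only shows up when the tags sit on separate lines.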
Thank you.
You're right, it might be how JSX behaves, but I figure this library has HTML in its name. It would be fitting for this to be an HTML parser, not a JSX one.
My actual use case is parsing snippets like this:
<pre>
<span class="function"><span class="keyword">def</span> <span class="name">func</span>(a):</span>
    <span class="statement">print</span> a
</pre>
and rendering them into highlighted Python code like this:
def func(a):
    print a
Quick and dirty workaround:
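// Options passed to the parser. isEmptyTextNode is assumed to be in scope
// (the same check the library uses to drop whitespace-only text nodes).
preprocessNodes: nodes => nodes.map(node => {
  //FIXME: https://github.com/wrakky/react-html-parser/issues/39
  // Rename the type so the empty-text-node filter no longer matches this node.
  if (isEmptyTextNode(node)) {
    node.type = "protected";
  }
  return node;
}),
transform: (node) => {
  //FIXME: https://github.com/wrakky/react-html-parser/issues/39
  // Restore the original type just before the node is rendered.
  if (node.type === "protected") {
    node.type = "text";
  }
  // Returning undefined falls back to the default rendering.
  return undefined;
}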
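Why this works can be shown with a small mock. Everything here is a hypothetical stand-in: `renderToString`, `makeNodes`, and the local `isEmptyTextNode` are invented for illustration, and the ordering (preprocess, then drop empty text nodes, then transform) is an assumption inferred from the workaround, not checked against the library's source.

```javascript
// Hypothetical stand-in for the library's check: blank text that contains a line break.
const isEmptyTextNode = (node) =>
  node.type === "text" && /\r?\n/.test(node.data) && node.data.trim() === "";

// Simplified, assumed pipeline: preprocess, drop empty text nodes, then transform each node.
function renderToString(nodes, { preprocessNodes, transform }) {
  return preprocessNodes(nodes)
    .filter((node) => !isEmptyTextNode(node))
    .map((node) => {
      const custom = transform(node);
      if (custom !== undefined) return custom;
      // Default handling in this sketch: text emits its data, tags emit their text children.
      return node.type === "text"
        ? node.data
        : node.children.map((child) => child.data).join("");
    })
    .join("");
}

// Fresh nodes per call, because the workaround mutates node.type.
const makeNodes = () => [
  { type: "tag", name: "strong", children: [{ type: "text", data: "there should be" }] },
  { type: "text", data: "\n" },
  { type: "tag", name: "strong", children: [{ type: "text", data: "a space" }] },
];

const passthrough = { preprocessNodes: (nodes) => nodes, transform: () => undefined };

const workaround = {
  preprocessNodes: (nodes) =>
    nodes.map((node) => {
      if (isEmptyTextNode(node)) node.type = "protected"; // hide from the filter
      return node;
    }),
  transform: (node) => {
    if (node.type === "protected") node.type = "text"; // restore before rendering
    return undefined;
  },
};

console.log(JSON.stringify(renderToString(makeNodes(), passthrough))); // "there should bea space"
console.log(JSON.stringify(renderToString(makeNodes(), workaround))); // "there should be\na space"
```

The trick depends entirely on the filter keying off `node.type`, so it may stop working if the library changes how it detects empty text nodes.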
@khitrin thanks for that tip. Awesome.
By the way, if development resumes at some point, I'd suggest making "preserve whitespace" a built-in option. In my case I really needed it for dealing with preformatted code tags.