react-html-parser

Inter-tag whitespace between lines gets truncated

Open kdojeteri opened this issue 7 years ago • 5 comments

I ran into this issue while trying to use highlight.js on code examples. Obviously, whitespace is very important in code so the current behavior just won't do.

Example

<strong>there should be</strong>
    <strong>a space</strong>

When you put the above into your parser (e.g. on the demo page), the output lacks a space between "be" and "a". This doesn't seem to be an htmlparser2 issue, because its AST explorer demo does show a text node between the two elements [link here].

kdojeteri, Jan 30 '18 16:01

Thanks for the report, this does seem like undesired behaviour.

It strips whitespace that contains line breaks because otherwise it generates an extra React component for every line break in the HTML you want to convert, which can add up for large datasets. It's also how JSX behaves when you have markup like the following:

render() {
  return (
    <div>
      <strong>there is no</strong>
                <strong>space</strong>
    </div>
  );
}

Maybe the extra elements don't matter and it was a premature optimisation? I'll have to do some benchmarks and see if adding them back in makes any difference.
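
The stripping rule described above can be sketched as a simple predicate. This is only an illustration under stated assumptions: the node shape (`type`, `data`) and the function name `isWhitespaceWithLineBreak` are hypothetical, and the exact check inside the library may differ.

```javascript
// Hypothetical node shape: { type: "text", data: "..." } or { type: "tag", name: "..." }.
// Assumed rule: drop text nodes that are whitespace-only AND contain a line break.
function isWhitespaceWithLineBreak(node) {
  return (
    node.type === "text" &&
    /\r?\n/.test(node.data) &&
    node.data.trim() === ""
  );
}

// Nodes as a parser might emit them for:
//   <strong>there should be</strong>\n    <strong>a space</strong>
const nodes = [
  { type: "tag", name: "strong" },
  { type: "text", data: "\n    " },
  { type: "tag", name: "strong" },
];

// Filtering with the rule removes the only whitespace between the two tags.
const kept = nodes.filter((node) => !isWhitespaceWithLineBreak(node));
console.log(kept.map((node) => node.type));

// A single space on one line does not match the rule, so it would survive.
console.log(isWhitespaceWithLineBreak({ type: "text", data: " " }));
```

Under this rule the two `<strong>` elements end up adjacent, which matches the missing space reported in this issue, while whitespace that stays on one line is left alone.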

peternewnham, Jan 31 '18 16:01

Thank you.

You're right, it might be how JSX behaves, but I figure this library has HTML in its name. It would be fitting for this to be an HTML parser, not a JSX one.

My actual use case is parsing snippets like this:

<pre>
<span class="function"><span class="keyword">def</span> <span class="name">func</span>(a):</span>
    <span class="statement">print</span> a
</pre>

and rendering them into highlighted Python code like this:

def func(a):
    print a

kdojeteri, Feb 02 '18 13:02

Quick and dirty workaround:

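  // Step 1: re-tag whitespace-only text nodes so the parser does not strip them.
  preprocessNodes: nodes => nodes.map(node => {
    //FIXME: https://github.com/wrakky/react-html-parser/issues/39
    if (isEmptyTextNode(node)) {
      node.type = "protected";
    }
    return node;
  }),
  // Step 2: turn them back into text nodes just before rendering.
  transform: (node) => {
    //FIXME: https://github.com/wrakky/react-html-parser/issues/39
    if (node.type === "protected") {
      node.type = "text";
    }
    // Returning undefined falls through to the default rendering.
    return undefined;
  }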
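
The workaround above uses `isEmptyTextNode` without showing where it comes from, and it only maps over the nodes it is handed. Here is a self-contained sketch of the same idea. It makes several assumptions: `isEmptyTextNode` is defined locally as a guess at the library's check, `preprocessNodes` is assumed to receive only the top-level nodes (hence the recursion into `children`), and the names `protectWhitespace` and `restoreWhitespace` are made up for illustration.

```javascript
// Assumed equivalent of the library's check: whitespace-only text with a line break.
function isEmptyTextNode(node) {
  return (
    node.type === "text" &&
    /\r?\n/.test(node.data) &&
    node.data.trim() === ""
  );
}

// Step 1: hide whitespace-only text nodes behind a made-up type, at every depth.
function protectWhitespace(nodes) {
  nodes.forEach((node) => {
    if (isEmptyTextNode(node)) {
      node.type = "protected";
    } else if (Array.isArray(node.children)) {
      protectWhitespace(node.children);
    }
  });
  return nodes;
}

// Step 2: restore the type; returning undefined asks for default rendering,
// as in the workaround above.
function restoreWhitespace(node) {
  if (node.type === "protected") {
    node.type = "text";
  }
  return undefined;
}

// Options object in the shape used by the workaround above.
const options = {
  preprocessNodes: protectWhitespace,
  transform: restoreWhitespace,
};

// Hand-written stand-in for the parsed <pre> snippet from this thread.
const tree = [
  {
    type: "tag",
    name: "pre",
    children: [
      { type: "tag", name: "span", children: [{ type: "text", data: "def func(a):" }] },
      { type: "text", data: "\n    " },
      { type: "tag", name: "span", children: [{ type: "text", data: "print" }] },
    ],
  },
];

options.preprocessNodes(tree);
console.log(tree[0].children[1].type); // the nested whitespace node is now re-tagged

options.transform(tree[0].children[1]);
console.log(tree[0].children[1].type); // and is a text node again before rendering
```

The recursion is a deliberate difference from the original snippet: in the `<pre>` use case the significant whitespace sits inside the element, not at the top level.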

khitrin, Dec 04 '18 13:12

@khitrin thanks for that tip. Awesome.

voneiden, May 22 '20 23:05

By the way, if development resumes at some point, I'd suggest making "preserve whitespace" a built-in option. In my case I really needed it for dealing with preformatted code tags.

voneiden, May 22 '20 23:05