Preserve information about leading whitespace in paragraphs
When using DocumentParser with no enabled block types, the Text nodes do not include the whitespace at the beginning of a line.
For example, when we use a parser with no enabled block types and the input is the following (note the single leading space on the second line):
- text 1
 - text 2
the expected result is a document containing two Text nodes:
- Text("- text 1")
- Text(" - text 2")
but the literal of the second Text is "- text 2" (without the leading whitespace).
To better illustrate, here is a sample test that fails:
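// Imports assume JUnit 4 and Hamcrest on the test classpath.
import static org.hamcrest.CoreMatchers.instanceOf;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.junit.Assert.assertEquals;

import java.util.Collections;

import org.commonmark.node.Block;
import org.commonmark.node.Node;
import org.commonmark.node.Paragraph;
import org.commonmark.node.SoftLineBreak;
import org.commonmark.node.Text;
import org.commonmark.parser.Parser;
import org.junit.Test;

public class ParserTest {
    // ...
    @Test
    public void noBlockTypes() {
        String given = "- text 1\n - text 2";
        // With no block types enabled, the whole input becomes one paragraph.
        Parser parser = Parser.builder().enabledBlockTypes(Collections.<Class<? extends Block>>emptySet()).build();
        Node document = parser.parse(given);
        Node child = document.getFirstChild();
        assertThat(child, instanceOf(Paragraph.class));
        child = child.getFirstChild();
        assertThat(child, instanceOf(Text.class));
        assertEquals("- text 1", ((Text) child).getLiteral());
        child = child.getNext();
        assertThat(child, instanceOf(SoftLineBreak.class));
        child = child.getNext();
        assertThat(child, instanceOf(Text.class));
        // Fails: the actual literal is "- text 2", without the leading space.
        assertEquals(" - text 2", ((Text) child).getLiteral());
    }
}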
The reason for this is how paragraphs are parsed. The spec says that leading whitespace on paragraph lines is skipped: https://spec.commonmark.org/0.31.2/#example-222
Not sure how we would handle it. We can't add the leading whitespace to the literal of Text nodes as that would change rendering for existing code, but maybe we could add it as another attribute.
Note that you should be able to work around this limitation by checking the source spans of the text (see includeSourceSpans on Parser.Builder).
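A minimal sketch of that workaround, under stated assumptions: the parser is built with includeSourceSpans(IncludeSourceSpans.BLOCKS_AND_INLINES), and the line and column values come from getLineIndex() and getColumnIndex() on the first SourceSpan in the Text node's getSourceSpans(). The class LeadingWhitespace and its method before are hypothetical helpers, not part of commonmark-java. The helper is kept library-free so it can run on its own; it only shows how the skipped whitespace could be read back from the original input once a position is known.

```java
public final class LeadingWhitespace {

    private LeadingWhitespace() {}

    /**
     * Returns the whitespace that precedes columnIndex on line lineIndex of the
     * input, or an empty string if the position is out of range or if anything
     * other than spaces and tabs precedes it on that line.
     *
     * Both indexes are assumed to be 0-based, as reported by a source span.
     */
    public static String before(String input, int lineIndex, int columnIndex) {
        // Split on the same line endings Markdown input can contain; -1 keeps trailing empty lines.
        String[] lines = input.split("\r\n|\r|\n", -1);
        if (lineIndex < 0 || lineIndex >= lines.length) {
            return "";
        }
        String line = lines[lineIndex];
        if (columnIndex < 0 || columnIndex > line.length()) {
            return "";
        }
        String prefix = line.substring(0, columnIndex);
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if (c != ' ' && c != '\t') {
                // The text does not start the line, so there is no leading whitespace to restore.
                return "";
            }
        }
        return prefix;
    }

    public static void main(String[] args) {
        String input = "- text 1\n - text 2";
        // Assumed position: line 1, column 1 is where the text "- text 2" starts
        // in the input. In real use these values would be read from the Text
        // node's source span rather than hard-coded.
        String restored = before(input, 1, 1) + "- text 2";
        System.out.println("[" + restored + "]"); // prints "[ - text 2]"
    }
}
```

Keeping the lookup separate from the parser means the Text literal and the rendering stay unchanged, and only callers that need the original indentation pay for it.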
See also https://github.com/commonmark/commonmark-java/pull/290#issuecomment-1986613844