Bug in lazy blockquote continuation (or corresponding spec is wrong, whichever)

Open shyouhei opened this issue 8 years ago • 7 comments

Current spec says:

If a string of lines Ls constitute a block quote with contents Bs, then the result of deleting the initial [block quote marker] from one or more lines in which the next [non-whitespace character] after the [block quote marker] is [paragraph continuation text] is a block quote with Bs as its content.

http://spec.commonmark.org/0.27/#block-quotes

The spec is clear and explicit; I see no ambiguity. Now consider this blockquote:

> This is _a_ paragraph continuation text
> 2. because the line starts with `2`, not `1`.

This blockquote is valid and satisfies the condition "If a string of lines Ls constitute a block quote with contents Bs". So, according to the spec, we can delete the leading > from the second line, which gives:

> This is _a_ paragraph continuation text
2. because the line starts with `2`, not `1`.

According to the spec, this blockquote must be equivalent to the former one. However, cmark does not agree:

% cmark <<'EOS'
> This is _a_ paragraph continuation text
> 2. because the line starts with `2`, not `1`.
EOS
<blockquote>
<p>This is <em>a</em> paragraph continuation text
2. because the line starts with <code>2</code>, not <code>1</code>.</p>
</blockquote>
% cmark <<'EOS'
> This is _a_ paragraph continuation text
2. because the line starts with `2`, not `1`.
EOS
<blockquote>
<p>This is <em>a</em> paragraph continuation text</p>
</blockquote>
<ol start="2">
<li>because the line starts with <code>2</code>, not <code>1</code>.</li>
</ol>
%

Is it intentional?
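
The rewriting step used in the report above (dropping the block quote marker from selected lines) can be sketched in a few lines of Python. This is only an illustration: `drop_markers` and `BLOCK_QUOTE_MARKER` are hypothetical names, not part of cmark, and the helper deliberately does not check whether a line qualifies as paragraph continuation text, since that is the point in dispute.

```python
import re

# A block quote marker: up to three spaces of indentation, `>`,
# and one optional following space.
BLOCK_QUOTE_MARKER = re.compile(r"^ {0,3}> ?")


def drop_markers(lines, lazy_indexes):
    """Return a copy of `lines` with the marker removed from the chosen lines."""
    return [
        BLOCK_QUOTE_MARKER.sub("", line, count=1) if i in lazy_indexes else line
        for i, line in enumerate(lines)
    ]


strict = [
    "> This is _a_ paragraph continuation text",
    "> 2. because the line starts with `2`, not `1`.",
]
# Make the second line "lazy" by dropping its marker.
lazy = drop_markers(strict, {1})
```

Feeding `strict` and `lazy` to a CommonMark renderer and comparing the output is then the equivalence check that the quoted spec sentence implies.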

shyouhei avatar Jun 15 '17 09:06 shyouhei

Yes. This does seem to be a bug. I suspected that this change would be problematic in some way.

See the related discussion here: https://talk.commonmark.org/t/blank-lines-before-lists-revisited/1990

jgm avatar Jun 18 '17 10:06 jgm

This is not so easy to fix. For parse_list_marker, if we define interrupts_paragraph as "last matched block is a PARAGRAPH block", then we fail on this case. If we define it as "current block is a PARAGRAPH block", then we fail on cases like

1. foo
2. bar

since the line beginning with 2. can be interpreted as a lazy continuation of the paragraph in item 1.

This may be a deep problem that needs to be fixed by rethinking the spec, or at least the decision made in jgm/CommonMark@0ff8022.
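
The dilemma can be made concrete with a toy model in Python. It is a sketch under stated assumptions, not cmark's actual code: only parse_list_marker and interrupts_paragraph come from the comment above, while `LineContext`, `list_may_start`, `outcome`, and `failures` are hypothetical names. The parser states listed for the two cases are also assumptions: for the block quote case the deepest block matched by the unmarked line is taken to be the document, and for the list case it is taken to be the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineContext:
    """Assumed parser state when a line starting with `2.` is reached."""
    name: str
    last_matched: str  # deepest open block whose continuation matched
    current: str       # deepest open block overall
    wanted: str        # outcome the thread treats as correct


def list_may_start(start_number: int, interrupts_paragraph: bool) -> bool:
    # Rule under discussion: an ordered list may interrupt a paragraph
    # only if it starts at 1.
    return start_number == 1 or not interrupts_paragraph


def outcome(ctx: LineContext, definition: str) -> str:
    block = ctx.last_matched if definition == "last_matched" else ctx.current
    interrupts_paragraph = block == "paragraph"
    if list_may_start(2, interrupts_paragraph):
        return "new list item"
    return "lazy continuation"


CASES = [
    LineContext("block quote with `>` dropped",
                last_matched="document", current="paragraph",
                wanted="lazy continuation"),
    LineContext("second item of an ordered list",
                last_matched="list", current="paragraph",
                wanted="new list item"),
]


def failures(definition: str) -> list:
    """Names of the cases that the given definition gets wrong."""
    return [c.name for c in CASES if outcome(c, definition) != c.wanted]
```

In this model `failures("last_matched")` reports only the block quote case and `failures("current")` reports only the list case, so neither definition handles both inputs.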

jgm avatar Jun 18 '17 10:06 jgm

OK, it seems I happened to poke at something that had been overlooked. I'm fine with waiting for a better spec. Thank you for the quick reply.

shyouhei avatar Jun 18 '17 14:06 shyouhei

I'm wondering whether this could be handled by modifying the spec as follows.

    [Paragraph continuation text](@) is text that
+   is
-   will be
    parsed as part of the content of a paragraph,
+   and would be parsed as part of the content of a paragraph
+   if the leading `>` were removed,
    but does not occur at the beginning of the paragraph.

jgm avatar Mar 20 '19 14:03 jgm

The term "paragraph continuation text" is also referenced in the section on list items, so the wording should cover any container block marker, not just >.

But that would open up another problem, because for lists we naturally need the second list item to be able to interrupt the first:

1. first item
2. second item

So, we would probably have to redefine paragraph continuation lines differently for lists and for block quotes.

Do we want that?

mity avatar Mar 26 '19 19:03 mity

Maybe that is also the key to (re)defining the continuation line if we decide to keep the current behavior: it is more or less a merging of two paragraphs, where the second one (the continuation) sits higher in the current block nesting hierarchy.

More formally, perhaps something like this:

Paragraph continuation text is a line of text that fulfills all of these conditions:

  1. it would otherwise end the current container block (block quote or list item) because it lacks the proper prefix (> marker or list item indentation), possibly ending several container blocks at once if they are nested;
  2. it would not start a new container block; and
  3. if it followed a blank line, it would start a new paragraph block.
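
The three conditions above can be read as a predicate. The following Python sketch takes them literally. All names are hypothetical, and the regular expressions are simplified stand-ins for the real CommonMark rules: HTML blocks, setext underlines, link reference definitions, and tab handling are ignored. Condition 1 depends on the open container blocks, so here the caller passes it in as a flag.

```python
import re

# Simplified recognizers (assumptions, not the full spec grammar).
BLOCK_QUOTE = re.compile(r"^ {0,3}>")
LIST_ITEM = re.compile(r"^ {0,3}(?:[-+*]|\d{1,9}[.)])(?:[ \t]|$)")
THEMATIC_BREAK = re.compile(r"^ {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")
ATX_HEADING = re.compile(r"^ {0,3}#{1,6}(?:[ \t]|$)")
CODE_FENCE = re.compile(r"^ {0,3}(?:`{3,}|~{3,})")
INDENTED_CODE = re.compile(r"^(?: {4}|\t)")


def starts_container(line: str) -> bool:
    """True if the line opens a block quote or a list item (condition 2 fails)."""
    return bool(BLOCK_QUOTE.match(line) or LIST_ITEM.match(line))


def starts_paragraph_after_blank(line: str) -> bool:
    """Condition 3: right after a blank line, the line would open a paragraph."""
    if not line.strip():
        return False
    others = (THEMATIC_BREAK, ATX_HEADING, CODE_FENCE, INDENTED_CODE)
    return not starts_container(line) and not any(p.match(line) for p in others)


def is_continuation(line: str, lacks_container_prefix: bool) -> bool:
    """All three proposed conditions; condition 1 is supplied by the caller."""
    return (
        lacks_container_prefix
        and not starts_container(line)
        and starts_paragraph_after_blank(line)
    )
```

Under this reading, the line beginning with 2. from the original report is not paragraph continuation text, because it would start a list item. That matches the cmark output shown in the report, which is the behavior this definition is meant to preserve.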

(Yeah, someone with better English could phrase it better, but I hope you get the idea.)

mity avatar Mar 26 '19 20:03 mity