
[cmark --smart] paren-quote-markup combination

Open giucal opened this issue 6 years ago • 2 comments

A plausible occurrence in a document is ("text"), which cmark --smart correctly turns into

(“text”)

However, if we emphasize the text,

("*text*")

we get

(”<em>text</em>”)

Note the incorrect right quote after the opening paren.

The same goes for the combination of a paren, a quote and other markup, such as strong emphasis and references:

("**text**") --> (”<strong>text</strong>”)
("[text]")   --> (”<a href=...>text</a>”)

giucal, Nov 05 '19 10:11

Adding some diagnostics:

 % ./build/src/cmark --smart
("*text*")
char = ", can_open = 0, can_close = 1
char = *, can_open = 1, can_close = 0
char = *, can_open = 0, can_close = 1
char = ", can_open = 0, can_close = 1
<p>(”<em>text</em>”)</p>

So the problem is that the opening " character is marked as can_close but not can_open. Further investigation reveals:

char = ", left_flanking = 1, right_flanking = 1

Now let's look at the logic at src/inlines.c, line 444:

    } else if (c == '\'' || c == '"') {
      *can_open = left_flanking && !right_flanking &&
                   before_char != ']' && before_char != ')';
      *can_close = right_flanking;

So a quote character is marked as can_open only if it is left-flanking and not right-flanking. In this case the opening " is both left- and right-flanking, because it sits between two punctuation characters, ( and *, so it isn't marked as can_open.
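To make the failing case concrete, here is a minimal standalone sketch of the quote branch quoted above. The names `quote_flags`, `quote_flags_old` and `quote_flags_old_examples` are hypothetical and not part of cmark, and the flanking values are passed in directly (taken from the diagnostics) rather than computed the way scan_delims does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two flags that scan_delims reports. */
typedef struct {
  bool can_open;
  bool can_close;
} quote_flags;

/* Sketch of the quote branch as quoted from src/inlines.c.
 * left_flanking, right_flanking and before_char are supplied by the caller;
 * in cmark they are derived from the surrounding characters. */
static quote_flags quote_flags_old(bool left_flanking, bool right_flanking,
                                   int before_char) {
  quote_flags f;
  f.can_open = left_flanking && !right_flanking &&
               before_char != ']' && before_char != ')';
  f.can_close = right_flanking;
  return f;
}

/* Illustrative checks, using the flanking values from the diagnostics. */
static void quote_flags_old_examples(void) {
  /* Opening quote in ("*text*"): left- and right-flanking, preceded by '('. */
  quote_flags opening = quote_flags_old(true, true, '(');
  assert(!opening.can_open);  /* cannot open ... */
  assert(opening.can_close);  /* ... but can close, hence the wrong quote */

  /* A quote that is left-flanking only, preceded by a space, can open. */
  quote_flags plain = quote_flags_old(true, false, ' ');
  assert(plain.can_open);
  assert(!plain.can_close);
}
```

The first pair of assertions matches the diagnostic output for the opening quote: can_open = 0, can_close = 1.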

We may need to tweak the logic here and add more test cases.

jgm, Nov 05 '19 17:11

This change fixes the issue:

diff --git a/src/inlines.c b/src/inlines.c
index e6b491f..fb7d2e4 100644
--- a/src/inlines.c
+++ b/src/inlines.c
@@ -439,8 +439,9 @@ static int scan_delims(subject *subj, unsigned char c, bool *can_open,
     *can_close = right_flanking &&
                  (!left_flanking || cmark_utf8proc_is_punctuation(after_char));
   } else if (c == '\'' || c == '"') {
-    *can_open = left_flanking && !right_flanking &&
-                before_char != ']' && before_char != ')';
+    *can_open = left_flanking &&
+         (!right_flanking || before_char == '(' || before_char == '[') &&
+         before_char != ']' && before_char != ')';
     *can_close = right_flanking;
   } else {
     *can_open = left_flanking;
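As a rough illustration of what the patch changes, the sketch below puts the old and new can_open conditions side by side as standalone predicates. The function names are hypothetical and not part of cmark, and the flanking flags are passed in directly instead of being computed from the input as scan_delims does.

```c
#include <assert.h>
#include <stdbool.h>

/* can_open for a quote character before the patch. */
static bool quote_can_open_before(bool left_flanking, bool right_flanking,
                                  int before_char) {
  return left_flanking && !right_flanking &&
         before_char != ']' && before_char != ')';
}

/* can_open for a quote character after the patch: a quote that is both
 * left- and right-flanking may still open if it directly follows '(' or '['. */
static bool quote_can_open_after(bool left_flanking, bool right_flanking,
                                 int before_char) {
  return left_flanking &&
         (!right_flanking || before_char == '(' || before_char == '[') &&
         before_char != ']' && before_char != ')';
}

/* Illustrative checks; the flag values for the first case come from the
 * diagnostics earlier in the thread. */
static void quote_can_open_examples(void) {
  /* Opening quote in ("*text*"): both flanking, preceded by '('. */
  assert(!quote_can_open_before(true, true, '('));
  assert(quote_can_open_after(true, true, '('));

  /* Both flanking but not preceded by '(' or '[': unchanged, cannot open. */
  assert(!quote_can_open_before(true, true, '*'));
  assert(!quote_can_open_after(true, true, '*'));

  /* Left-flanking only: can open under both versions. */
  assert(quote_can_open_before(true, false, ' '));
  assert(quote_can_open_after(true, false, ' '));
}
```

Only the case where the quote is both left- and right-flanking and directly follows ( or [ behaves differently; can_close is untouched by the patch.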

I'm not closing this yet, because we need to

  • [ ] improve test/smart_punct.txt with much more accurate documentation of the rules, and further test cases
  • [ ] commit the code change above
  • [ ] add issues to commonmark-hs and commonmark.js so that comparable changes can be made there

jgm, Nov 05 '19 17:11