Converting emphasis with angled quotation marks
Hi,
I'm trying to convert a document that contains the string "«_word_»". As the example below shows, the parser does not recognize it as emphasis:
var html1 = Markdown.ToHtml("«_word_»"); // "<p>«_word_»</p>\n"
But "_«word»_" is converted correctly:
var html2 = Markdown.ToHtml("_«word»_"); // "<p><em>«word»</em></p>\n"
I'm using Markdig 0.30.2. Is this a bug? If so, is there a workaround to avoid the issue? Thanks.
Oh, interesting... you might have hit an edge case of the spec, as the different CommonMark parsers disagree on the result here.
The spec's section on emphasis covers this, and I would think that it is not a bug, as per the rule:
A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
I haven't checked, but it is highly likely that the characters « and » are Unicode punctuation characters.
cc: @MihaZupan thoughts?
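To make the quoted rule concrete, here is a minimal sketch of the flanking check for a single `_` delimiter. It is not Markdig's code: `FlankingSketch`, `ClassifyUnderscore`, and both punctuation predicates are hypothetical names, the can-open/can-close conditions for `_` are taken from the emphasis section of the CommonMark spec as I understand it, and it works on `char` values, so it ignores characters outside the Basic Multilingual Plane.

```csharp
using System;
using System.Globalization;

public static class FlankingSketch
{
    // Unicode whitespace as CommonMark 0.30 defines it: category Zs, or tab, LF, FF, CR.
    static bool IsWhitespace(char c) =>
        c == '\t' || c == '\n' || c == '\f' || c == '\r' ||
        char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

    // Hypothetical classifier that only knows the 32 ASCII punctuation characters.
    public static bool AsciiOnlyPunctuation(char c) =>
        c < 128 && (char.IsPunctuation(c) || char.IsSymbol(c));

    // Hypothetical classifier that also accepts the Unicode P* categories.
    public static bool UnicodePunctuation(char c) =>
        AsciiOnlyPunctuation(c) || char.IsPunctuation(c);

    // Classifies the single '_' at text[index] as a potential opener and/or closer.
    public static (bool CanOpen, bool CanClose) ClassifyUnderscore(
        string text, int index, Func<char, bool> isPunctuation)
    {
        // The beginning and the end of the line count as whitespace.
        char before = index > 0 ? text[index - 1] : ' ';
        char after = index + 1 < text.Length ? text[index + 1] : ' ';

        bool wsBefore = IsWhitespace(before), wsAfter = IsWhitespace(after);
        bool punctBefore = isPunctuation(before), punctAfter = isPunctuation(after);

        bool leftFlanking = !wsAfter && (!punctAfter || wsBefore || punctBefore);
        bool rightFlanking = !wsBefore && (!punctBefore || wsAfter || punctAfter);

        // Additional restrictions that apply to '_' (but not to '*').
        bool canOpen = leftFlanking && (!rightFlanking || punctBefore);
        bool canClose = rightFlanking && (!leftFlanking || punctAfter);
        return (canOpen, canClose);
    }
}
```

For "«_word_»", calling `ClassifyUnderscore` with `UnicodePunctuation` reports that the `_` at index 1 can open and the `_` at index 6 can close, so the rule allows emphasis. With `AsciiOnlyPunctuation`, both delimiters are left- and right-flanking at once and neither can open or close, which would leave the text unchanged.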
This is a bug: our CheckUnicodeCategory helper does not match what CommonMark defines as Unicode whitespace and Unicode punctuation.
Specifically, we are off in the 128-255 range (where « and » are) and with our Unicode space categories.
11 (U+000B, vertical tab) Space should be False
133 (U+0085, next line) Space should be False
161 ('¡') Punctuation should be True
167 ('§') Punctuation should be True
171 ('«') Punctuation should be True
182 ('¶') Punctuation should be True
183 ('·') Punctuation should be True
187 ('»') Punctuation should be True
191 ('¿') Punctuation should be True
8232 (U+2028, line separator) Space should be False
8233 (U+2029, paragraph separator) Space should be False
IsWhitespace also doesn't match the spec right now.
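For reference, a minimal sketch of the two character classes as CommonMark 0.30 words them, which can be used to check the code points listed above. This is not the actual fix: `SpecCategories` and its members are hypothetical names, the definitions are my reading of the spec (whitespace is category Zs plus tab, line feed, form feed, and carriage return; punctuation is ASCII punctuation plus categories Pc, Pd, Pe, Pf, Pi, Po, Ps), and the result for a given character depends on the Unicode data shipped with the .NET runtime.

```csharp
using System;
using System.Globalization;

public static class SpecCategories
{
    // Zs, or tab (U+0009), line feed (U+000A), form feed (U+000C), carriage return (U+000D).
    // Vertical tab, U+0085, U+2028 and U+2029 are deliberately not included.
    public static bool IsUnicodeWhitespace(char c) =>
        c == '\t' || c == '\n' || c == '\f' || c == '\r' ||
        char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

    // The 32 ASCII punctuation characters are exactly the ASCII characters
    // in the P* and S* categories; above ASCII only the P* categories count.
    public static bool IsUnicodePunctuation(char c) =>
        c < 128
            ? char.IsPunctuation(c) || char.IsSymbol(c)
            : char.IsPunctuation(c);

    // Formats one line in the same shape as the list above.
    public static string Describe(char c) =>
        $"{(int)c} ({char.GetUnicodeCategory(c)}) " +
        $"Space={IsUnicodeWhitespace(c)} Punctuation={IsUnicodePunctuation(c)}";
}
```

Calling `SpecCategories.Describe((char)171)` or looping over the code points in the list gives a quick way to compare a helper's output against these definitions.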