Martin Hofmann comments

Results: 14 comments of Martin Hofmann

Consider writing scanners by hand

Would [`flex`][flex] provide a better trade-off between convenience and amount of generated source code? (IIRC `flex` uses tables to represent DFAs, not thousands of `if () goto` lines.) [flex]: https://github.com/westes/flex...
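
To make the trade-off in the comment above concrete, here is a minimal sketch of the two code shapes for the same toy scanner, one that accepts `[0-9]+`. Everything in it (the state names, `char_class`, `match_table`, `match_goto`) is hypothetical and written for illustration; it is not taken from `scanners.c` or from real `flex` output, and the claim that `flex` emits tables is the commenter's recollection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy DFA for the pattern [0-9]+ (illustration only). */
enum { S_START, S_DIGITS, S_REJECT, N_STATES };
enum { C_DIGIT, C_OTHER, N_CLASSES };

/* Table-driven form: the automaton is data, the driver loop is tiny. */
static const unsigned char transition[N_STATES][N_CLASSES] = {
    /* S_START  */ { S_DIGITS, S_REJECT },
    /* S_DIGITS */ { S_DIGITS, S_REJECT },
    /* S_REJECT */ { S_REJECT, S_REJECT },
};

static int char_class(unsigned char c)
{
    return (c >= '0' && c <= '9') ? C_DIGIT : C_OTHER;
}

/* Returns 1 if the whole string matches [0-9]+, else 0. */
int match_table(const char *s)
{
    int state = S_START;

    assert(s != NULL);
    for (; *s != '\0'; ++s)
        state = transition[state][char_class((unsigned char)*s)];
    return state == S_DIGITS;
}

/* Direct-coded form: every state becomes a label, every edge an
 * `if (...) goto`. Source size grows with the number of states. */
int match_goto(const char *s)
{
    assert(s != NULL);

    /* start state: needs at least one digit */
    if (*s >= '0' && *s <= '9') { ++s; goto digits; }
    return 0;

digits:
    if (*s >= '0' && *s <= '9') { ++s; goto digits; }
    return *s == '\0';
}
```

Both functions accept the same strings. In the table form a larger grammar mostly adds rows to `transition`, while in the direct-coded form it adds labels and branches, which is where a very long generated source file would come from.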

Consider writing scanners by hand

While I find `scanners.c`, at ~30000 lines (~400 KB), a bit hefty, the tools do handle it easily. I once looked into a linker map file and found that...

Use XML for spec examples

Specifying the "result" of parsing and interpreting a _CommonMark_ input text **not** in the form of an output HTML text is certainly a good idea. The specification should instead describe...

Characters should be Unicode scalar values not Unicode code points.

Is "Unicode scalar value" the same as "code point of a Unicode _character_" then? It is my understanding that high and low surrogate code points in the BMP are _not_...
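
For reference, the distinction under discussion can be written down as a small sketch. The Unicode Standard defines a scalar value as any code point except the high and low surrogates, that is U+0000..U+D7FF and U+E000..U+10FFFF. The function names below are hypothetical helpers, not part of any CommonMark implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Any value in the Unicode codespace, U+0000..U+10FFFF. */
bool is_code_point(uint32_t cp)
{
    return cp <= 0x10FFFF;
}

/* High (U+D800..U+DBFF) and low (U+DC00..U+DFFF) surrogates. */
bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Scalar values are the code points that are not surrogates. */
bool is_scalar_value(uint32_t cp)
{
    return is_code_point(cp) && !is_surrogate(cp);
}
```

Under these definitions U+D800 is a code point but not a scalar value, while U+E000 and U+10FFFF are both.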

Characters should be Unicode scalar values not Unicode code points.

I don't understand this sentence: > No Unicode scalar values are exactly the Unicode code points minus the surrogate characters, the latters being a hack to be able to encode...

Characters should be Unicode scalar values not Unicode code points.

> Concretely what most programmers are interested in when they are dealing with text interchange are scalar values, that's what they have to process, decode and encode to UTF-X formats....

Characters should be Unicode scalar values not Unicode code points.

> This doesn't happen if you have a proper UTF-X decoder API. Again you will never get surrogates code points out of an UTF-X decoding process. I never doubted that....

Characters should be Unicode scalar values not Unicode code points.

> just note that "the legal characters of Unicode" is not a concept you can find formally defined in the Unicode Standard and would thus make the common mark spec...

Characters should be Unicode scalar values not Unicode code points.

I'm still waiting for your explanation of what exactly is "broken", "undefined", "ambiguous", "without precise meaning" in the given definition for _Char_, which obviously _is_ what "legal Unicode characters" is...

Entities

Looks good so far! > Here's one tricky issue that came up. Ideally, one would leave entities alone in link titles, rather than converting them to characters, at least if...