[BUG] Error parsing UTF-8 character literal that is not a hex character
Describe the bug
cppfront reports an error when parsing a UTF-8 character literal (with the u8 prefix) whose character is not a hexadecimal digit: u8'a' through u8'f' are accepted, but u8'g' fails.
To Reproduce
Run cppfront on this code:
main: () -> int = {
    a:= u8'a'; // ok
    b:= u8'b'; // ok
    c:= u8'c'; // ok
    d:= u8'd'; // ok
    e:= u8'e'; // ok
    f:= u8'f'; // ok
    g:= u8'g'; // error: line ended before character literal was terminated
    return 0;
}
Pardon my ignorance, but isn't u8 an unsigned 8-bit integer type, not a UTF-8 character literal?
Yeah, Cpp2 has the type u8 (which lowers to cpp2::u8), but C++17 introduced UTF-8 character literals, so you can also write u8'a'.
https://en.cppreference.com/w/cpp/language/character_literal
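Because u8 is not a C++ keyword, the same spelling can name a type and act as a literal prefix in one translation unit. The sketch below shows this in standard C++ (C++17 or later). The alias u8 is an assumption that stands in for cpp2::u8, whose real definition may differ; std::uint8_t is used only so the snippet compiles on its own.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for Cpp2's u8 type (Cpp2 lowers u8 to cpp2::u8);
// std::uint8_t is an assumption chosen so this snippet is self-contained.
using u8 = std::uint8_t;

// u8 directly followed by a quote is the UTF-8 encoding prefix: the lexer
// reads u8'g' as a single character-literal token, regardless of the alias.
constexpr u8 as_byte = u8'g';      // the alias used as a type
constexpr auto as_literal = u8'g'; // the prefix used on a literal

// The literal's type is char in C++17 and char8_t where char8_t is enabled.
#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(u8'g'), char8_t>);
#else
static_assert(std::is_same_v<decltype(u8'g'), char>);
#endif

static_assert(as_byte == 0x67);    // 'g' is U+0067
static_assert(sizeof(u8'g') == 1); // a single UTF-8 code unit
```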
From my reading of the lexer, Cpp2 does support it: https://github.com/hsutter/cppfront/blob/a76e23b74f91ccec68336ddd5f84edb5b5216a7e/source/lex.h#L1190
Thanks! I'll take a look.
I hadn't noticed that the literal prefix and the unsigned type alias used the same name. Interesting!
Thanks, I found the problem. The brace matching in load.h, which finds the end of each Cpp2 definition, already has to be aware of literals (so that it ignores braces hiding inside a literal), and it turns out it also needs to check the encoding prefixes such as u8. Fixing...
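The fix described above can be sketched as a brace matcher that treats an encoding prefix as part of the literal that follows it. This is a minimal, hypothetical sketch, not cppfront's load.h code: find_matching_brace, skip_quoted, is_word_char, and is_digit are illustrative names, the scanner is ASCII-only, and raw string literals are not handled. The key distinction is that a ' inside a token that starts with a digit is a C++14 digit separator (as in 1'000), while u8 (or u, U, L) immediately followed by a quote begins a literal whose contents, braces included, must be skipped. The hex-digit pattern in the report hints, though the thread does not confirm it, that the ' after u8 was being read as a digit separator.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helpers for a simplified, ASCII-only scanner (not cppfront code).
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

constexpr bool is_word_char(char c) {
    return is_digit(c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Skips a character or string literal whose opening quote is at text[i].
// Returns the index just past the closing quote, or std::nullopt if a newline
// or the end of the text comes first (an unterminated literal).
constexpr std::optional<std::size_t> skip_quoted(std::string_view text, std::size_t i) {
    const char quote = text[i++];
    while (i < text.size() && text[i] != '\n') {
        if (text[i] == '\\') { i += 2; continue; }  // skip the escaped character
        if (text[i] == quote) return i + 1;
        ++i;
    }
    return std::nullopt;
}

// Returns the index of the '}' that closes the first '{' in text, skipping
// comments and literals. Returns std::nullopt on an unterminated literal or
// comment, a '}' before any '{', or unbalanced braces.
constexpr std::optional<std::size_t> find_matching_brace(std::string_view text) {
    int depth = 0;
    std::size_t i = 0;
    while (i < text.size()) {
        const char c = text[i];
        if (text.substr(i, 2) == "//") {                    // line comment
            while (i < text.size() && text[i] != '\n') ++i;
        } else if (text.substr(i, 2) == "/*") {             // block comment
            const std::size_t end = text.find("*/", i + 2);
            if (end == std::string_view::npos) return std::nullopt;
            i = end + 2;
        } else if (is_digit(c)) {
            // Numeric literal: a ' followed by a word character is a digit
            // separator, not the start of a character literal.
            ++i;
            while (i < text.size() &&
                   (is_word_char(text[i]) ||
                    (text[i] == '\'' && i + 1 < text.size() && is_word_char(text[i + 1])))) {
                ++i;
            }
        } else if (is_word_char(c)) {
            // Identifier or keyword; u8, u, U, or L directly followed by a
            // quote is an encoding prefix, so the quoted text is one literal.
            const std::size_t start = i;
            while (i < text.size() && is_word_char(text[i])) ++i;
            const std::string_view word = text.substr(start, i - start);
            const bool is_prefix = word == "u8" || word == "u" || word == "U" || word == "L";
            if (is_prefix && i < text.size() && (text[i] == '\'' || text[i] == '"')) {
                const auto next = skip_quoted(text, i);
                if (!next) return std::nullopt;
                i = *next;
            }
        } else if (c == '\'' || c == '"') {                 // unprefixed literal
            const auto next = skip_quoted(text, i);
            if (!next) return std::nullopt;
            i = *next;
        } else {
            if (c == '{') {
                ++depth;
            } else if (c == '}') {
                if (depth == 0) return std::nullopt;        // '}' before any '{'
                if (--depth == 0) return i;                 // closes the first '{'
            }
            ++i;
        }
    }
    return std::nullopt;                                    // ran out of text
}

// Compile-time check on the failing case from the report.
static_assert(find_matching_brace("{ g:= u8'g'; }") == std::optional<std::size_t>(13));
```

For example, find_matching_brace("{ s:= \"}\"; c:= u8'}'; }") returns 22, because the braces inside both literals are skipped and only the final '}' closes the opening '{'.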