
[BUG] Error parsing UTF-8 character literal that is not a hex character

Open bluetarpmedia opened this issue 1 year ago • 3 comments

Describe the bug

cppfront produces an error when parsing a UTF-8 character literal (prefix u8) whose character is not a hexadecimal digit.

To Reproduce

Run cppfront on this code:

main: () -> int = {

    a:= u8'a';  // ok
    b:= u8'b';  // ok
    c:= u8'c';  // ok
    d:= u8'd';  // ok
    e:= u8'e';  // ok
    f:= u8'f';  // ok
    g:= u8'g';  // error: line ended before character literal was terminated

    return 0;
}


bluetarpmedia, Jun 24 '24 03:06

Pardon my ignorance, but isn't u8 an unsigned 8 bit integer, not a utf-8 character literal?

sookach, Jun 25 '24 18:06

Yeah, Cpp2 has the type u8 (which lowers to cpp2::u8), but C++17 introduced the UTF-8 character literal, so you can also write u8'a'.

https://en.cppreference.com/w/cpp/language/character_literal

From my reading of the lexer, Cpp2 does support it: https://github.com/hsutter/cppfront/blob/a76e23b74f91ccec68336ddd5f84edb5b5216a7e/source/lex.h#L1190

bluetarpmedia, Jun 27 '24 05:06
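
To illustrate the two meanings of u8 discussed above, here is a minimal C++ sketch. The namespace demo and the alias demo::u8 are hypothetical stand-ins (the thread only says that Cpp2's u8 lowers to cpp2::u8; defining it as std::uint8_t here is an assumption), and the names small_number and letter_g are made up for the example.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the Cpp2 type alias; assumed to be an
// unsigned 8-bit integer for the purpose of this sketch.
namespace demo { using u8 = std::uint8_t; }

// "u8" used as a type name: an unsigned 8-bit integer.
constexpr demo::u8 small_number = 103;

// "u8" used as an encoding prefix: a UTF-8 character literal (C++17).
constexpr auto letter_g = u8'g';

// The literal's type is char in C++17 and char8_t from C++20 onward.
#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(letter_g), const char8_t>);
#else
static_assert(std::is_same_v<decltype(letter_g), const char>);
#endif

// 'g' is encoded as the single code unit 0x67 in UTF-8.
static_assert(letter_g == 0x67);
```

Both spellings can appear in the same translation unit because the prefix is only recognized when it is immediately followed by a quote.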

Thanks! I'll take a look.

I hadn't noticed that the literal prefix and the unsigned type alias used the same name. Interesting!

hsutter, Jun 27 '24 15:06

Thanks, I found the problem. It turns out I need to also check the encoding prefixes when doing the load.h brace-match to find the end of the Cpp2 definition, which also needs to be aware of literals (in case braces we should ignore are hiding inside a literal). Fixing...

hsutter, Jul 12 '24 16:07
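
As a rough illustration of the kind of literal-aware brace matching described in the last comment, here is a minimal C++ sketch. It is not cppfront's actual code: the function names (find_matching_brace, skip_quoted and the is_* helpers) are hypothetical, and the sketch deliberately ignores comments and raw string literals. The idea it shows is to consume identifiers and numbers as whole tokens, so that the quote after an encoding prefix such as u8 starts a character literal, while a quote inside a number such as 0xFF'FF is treated as a digit separator.

```cpp
#include <cstddef>
#include <string_view>

constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }
constexpr bool is_ident_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}
constexpr bool is_ident_char(char c) { return is_ident_start(c) || is_digit(c); }

// src[i] is an opening ' or ". Returns the index just past the matching
// closing quote, or npos if the literal is never terminated.
constexpr std::size_t skip_quoted(std::string_view src, std::size_t i) {
    const char quote = src[i];
    for (++i; i < src.size(); ++i) {
        if (src[i] == '\\') { ++i; }                 // skip the escaped character
        else if (src[i] == quote) { return i + 1; }
    }
    return std::string_view::npos;
}

// Returns the index of the '}' that closes the '{' at open_pos, ignoring
// braces hidden inside character and string literals; npos if none is found.
constexpr std::size_t find_matching_brace(std::string_view src, std::size_t open_pos) {
    int depth = 0;
    std::size_t i = open_pos;
    while (i < src.size()) {
        const char c = src[i];
        if (is_digit(c)) {
            // Numeric literal: a ' followed by a digit or letter is a digit separator.
            ++i;
            while (i < src.size() &&
                   (is_ident_char(src[i]) || src[i] == '.' ||
                    (src[i] == '\'' && i + 1 < src.size() && is_ident_char(src[i + 1])))) {
                ++i;
            }
        } else if (is_ident_start(c)) {
            // Identifier or encoding prefix (u8, u, U, L): consume it whole, so a
            // following quote is seen as the start of a literal.
            while (i < src.size() && is_ident_char(src[i])) { ++i; }
        } else if (c == '\'' || c == '"') {
            i = skip_quoted(src, i);
            if (i == std::string_view::npos) { return i; }
        } else {
            if (c == '{') { ++depth; }
            else if (c == '}' && --depth == 0) { return i; }
            ++i;
        }
    }
    return std::string_view::npos;
}
```

With this sketch, find_matching_brace("{ g := u8'g'; }", 0) returns 14, the index of the final brace, and a brace inside a literal such as u8'}' is skipped rather than counted.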