Details about wide-character support
Could you please detail what parts of multi-byte characters are supported by CompCert? Or, inversely, if there are specific multi-byte-related features which are not supported?
In particular, some compilers accept this code (e.g. GCC, Clang), while CompCert emits a warning (escape sequence is out of range (code 0x3B1)) and then crashes:
#include <stdio.h>
int main() {
printf("\u03B1-conversion\n");
}
Also, whenever a wide character literal is used in a function such as wprintf, a warning similar to argument #1 of function call has type int * instead of the expected type wchar_t const * is emitted. This seems to impact the result of this function, which differs from GCC/Clang in examples such as:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main() {
setlocale(LC_ALL, "en_US.UTF-8");
wchar_t c = L'\x3b1';
wprintf(L"α %lc\n", c);
}
CompCert prints some nonsensical values such as α ? instead of α α.
I'm afraid CompCert's support for multi-byte characters in string and char literals is nonexistent at the moment, while support for wide strings and wide chars is flaky... None of that stuff was exercised by our current users.
There are two issues that can easily be fixed: the crash and the wrong initialization of c by L'\x3b1'.
I don't know yet how to go about UTF8 chars in string and wide string literals, though.
Fixed the two easy issues in commit [master d3959ce].
Chapter 5 of the CompCert manual does document that multi-byte characters are not supported.
I leave this support as a feature wish.
Excellent, thank you very much.
I do not have much use for wide strings, so it's probably not worth spending more time on it. The fixed crash is already more than enough for my needs.