Open questions for Unicode identifiers
#119 requires identifiers to be lower-case, stream-safe, NFC kebab-case, where each part delimited by '-' starts with an XID_Start scalar value that has a canonical combining class of zero.
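To make the discussion concrete, here is a rough sketch of what a checker for these rules might look like. It is not the code from #119. The function names are hypothetical, "lower-case" is assumed to mean that the string is unchanged by Python's `str.lower()`, XID_Start is approximated through `str.isidentifier()` on a single character (which also accepts '_', so that is excluded by hand), and the stream-safe check is a simplified reading of the 30 non-starter limit in UAX #15. It checks only the first scalar value of each part, since that is all the rule above constrains.

```python
import unicodedata

# Limit on consecutive non-starters from the Stream-Safe Text Format (UAX #15).
MAX_NONSTARTERS = 30


def is_stream_safe(s: str) -> bool:
    """Simplified check: no run of more than 30 non-starters in NFKD form."""
    run = 0
    for ch in unicodedata.normalize("NFKD", s):
        if unicodedata.combining(ch) != 0:
            run += 1
            if run > MAX_NONSTARTERS:
                return False
        else:
            run = 0
    return True


def is_xid_start(ch: str) -> bool:
    """Approximate XID_Start using Python's own identifier rules.

    Assumption: a single character passes str.isidentifier() exactly when it
    is XID_Start or '_', so '_' is excluded explicitly.
    """
    return ch != "_" and ch.isidentifier()


def is_valid_identifier(name: str) -> bool:
    """Hypothetical validator for the identifier rules described above."""
    if not unicodedata.is_normalized("NFC", name):
        return False
    if name != name.lower():  # assumed meaning of "lower-case"
        return False
    if not is_stream_safe(name):
        return False
    for part in name.split("-"):
        if not part:  # empty part: leading, trailing, or doubled '-'
            return False
        first = part[0]
        if not is_xid_start(first) or unicodedata.combining(first) != 0:
            return False
    return True
```

Under these assumptions, `is_valid_identifier("foo-bar")` is accepted, while `"Foo-bar"`, `"foo--bar"`, and a decomposed `"cafe\u0301"` are rejected. Results for individual scalar values depend on the Unicode version bundled with the Python runtime.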
Concerns which are not addressed yet include:
- Whole-script confusables (e.g. U+0061 vs. U+0430)
- Width-sensitivity (e.g. U+0061 (a) vs. U+FF41 (ａ))
- Should scripts no longer in active use, such as Linear B, be disallowed?
- Should we prohibit identifier parts from starting with a scalar value that has 'Grapheme_Extend = Yes', such as U+1885?
- The idea is to propose these rules for interface-types itself, but do we really want component instantiation to perform NFC validation and potentially other complex Unicode checks? The concerns here are implementation simplicity, instantiation efficiency, and Unicode version sensitivity.
- Should wit-bindgen's parser automatically normalize identifiers to NFC, rather than simply rejecting identifiers that aren't normalized?
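As an illustration of two of the questions above, the following sketch shows that NFC alone does not fold the fullwidth or Cyrillic look-alikes of "a", and contrasts rejecting with normalizing in a parser. The function names `parse_strict` and `parse_lenient` are hypothetical and are not part of wit-bindgen; NFKC appears only for comparison and is not something the rules in #119 call for.

```python
import unicodedata

LATIN_A = "\u0061"      # LATIN SMALL LETTER A
CYRILLIC_A = "\u0430"   # CYRILLIC SMALL LETTER A
FULLWIDTH_A = "\uff41"  # FULLWIDTH LATIN SMALL LETTER A


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def parse_strict(name: str) -> str:
    """Hypothetical parser behavior: reject input that is not already NFC."""
    if not unicodedata.is_normalized("NFC", name):
        raise ValueError(f"identifier is not in NFC: {name!r}")
    return name


def parse_lenient(name: str) -> str:
    """Hypothetical parser behavior: silently normalize input to NFC."""
    return nfc(name)


# NFC keeps all three "a" look-alikes distinct.
assert nfc(FULLWIDTH_A) != LATIN_A
assert nfc(CYRILLIC_A) != LATIN_A

# NFKC folds the fullwidth form, but not the Cyrillic confusable.
assert nfkc(FULLWIDTH_A) == LATIN_A
assert nfkc(CYRILLIC_A) == CYRILLIC_A
```

With the lenient variant, two source spellings such as `"cafe\u0301"` and `"caf\u00e9"` map to the same identifier, whereas the strict variant accepts only the second. Neither variant addresses confusables or width differences.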