
Open questions for Unicode identifiers

Open sunfishcode opened this issue 4 years ago • 0 comments

#119 requires identifiers to be lower-case, stream-safe, NFC kebab-case, where each '-'-delimited part starts with an XID_Start scalar value that has a canonical combining class of zero.
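To make the rule concrete, here is a rough sketch of such a check in Python, using only the standard library. It is not wit-bindgen's implementation. The function name `check_identifier` is hypothetical, the XID_Start test is approximated with a single-character `str.isidentifier()` call (excluding `_`, which Python accepts but XID_Start does not contain), and the stream-safe check is left out entirely. Results also depend on the Unicode version bundled with the Python build.

```python
import unicodedata


def check_identifier(ident: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes.

    Hypothetical helper for illustration only. Assumptions:
    - XID_Start is approximated via str.isidentifier() on one character.
    - The stream-safe requirement is NOT checked here.
    """
    problems = []
    if not unicodedata.is_normalized("NFC", ident):
        problems.append("not NFC")
    if ident != ident.lower():
        problems.append("not lower-case")
    for part in ident.split("-"):
        if not part:
            problems.append("empty part")
            continue
        first = part[0]
        if first == "_" or not first.isidentifier():
            problems.append(f"part {part!r} does not start with XID_Start")
        elif unicodedata.combining(first) != 0:
            # Non-zero canonical combining class on the first scalar value.
            problems.append(f"part {part!r} starts with a combining mark")
    return problems


print(check_identifier("foo-bar"))
print(check_identifier("Foo-bar"))
print(check_identifier("foo--bar"))
```

Only the constraints stated above are checked; nothing here addresses the open questions listed below.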

Concerns which are not addressed yet include:

  • Whole-script confusables (e.g. U+0061 Latin "a" vs. U+0430 Cyrillic "а")

  • Mixed-script confusables

  • Width-sensitivity (e.g. U+0061 (a) vs. U+FF41 (ａ))

  • Should scripts no longer in active use, such as Linear B, be disallowed?

  • Should we prohibit identifier parts from starting with scalar values that have 'Grapheme_Extend = Yes', such as U+1885?

  • The idea is to propose these rules for interface-types itself, but do we really want component instantiation to perform NFC validation and potentially other complex Unicode checks? The concerns here are implementation simplicity, instantiation efficiency, and Unicode version sensitivity.

  • Should wit-bindgen's parser automatically normalize to NFC, rather than simply erroring on identifiers that aren't normalized?
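As a small illustration of several of the questions above, the following Python sketch contrasts the two parser policies from the last bullet and shows what normalization does and does not catch. All function names (`parse_strict`, `parse_normalizing`, `fold_width`, `script_hint`) are hypothetical. NFKC appears only to demonstrate the width issue and is not something this issue proposes. `script_hint` is a crude stand-in that reads the first word of the character name and is not a real Script property lookup.

```python
import unicodedata


def parse_strict(ident: str) -> str:
    """Policy A: reject identifiers that are not already in NFC."""
    if not unicodedata.is_normalized("NFC", ident):
        raise ValueError(f"identifier {ident!r} is not NFC-normalized")
    return ident


def parse_normalizing(ident: str) -> str:
    """Policy B: silently normalize identifiers to NFC."""
    return unicodedata.normalize("NFC", ident)


def fold_width(ident: str) -> str:
    """NFKC folds fullwidth forms such as U+FF41 into their ASCII counterparts."""
    return unicodedata.normalize("NFKC", ident)


def script_hint(ch: str) -> str:
    """First word of the character name, e.g. 'LATIN' or 'CYRILLIC'."""
    return unicodedata.name(ch).split()[0]


decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
print(parse_normalizing(decomposed) == "caf\u00e9")

# NFC keeps U+FF41 distinct from U+0061; NFKC folds them together.
print(unicodedata.normalize("NFC", "\uff41") == "\uff41")
print(fold_width("\uff41") == "\u0061")

# No normalization form maps Cyrillic U+0430 onto Latin U+0061.
print(fold_width("\u0430") == "\u0430")
print(script_hint("\u0061"), script_hint("\u0430"))
```

The point of the last two lines is that confusables would need a separate mechanism, since normalization alone leaves U+0061 and U+0430 distinct.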

sunfishcode • Dec 21 '21 20:12