Proposal: Improve Hex Escape Sequence, Remove Unicode Escape Sequence
| Sequence | Name | {N} |
|---|---|---|
| `\x{N}` | hexadecimal value | 1 to 32 hex digits |

Do without `\u{N}`.
The hex escape should allow underscores as visual separators, like its int literal counterparts. Unicode codepoints don't need a dedicated escape sequence: mistyping an invalid codepoint is no worse than mistyping a valid one. On the contrary, dropping the dedicated escape is a shift away from UTF-8 centrism, as far as escape sequences go. The result is sound 8-bit clean collation and identifiers in ZON and elsewhere, such as UUIDs and codepoints, and more symmetry with int literals and templates. Up from J8.
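As a rough illustration of the rule above, here is a minimal sketch of how the text between the braces of `\x{N}` might be validated. The function name `parseHexEscapeBody`, the error set, and the choice to return the value as a `u128` are all assumptions made for this example, not part of the Zig tokenizer. Underscore placement follows the int literal rule (only between digits), and the delimiters from the related proposals are not handled.

```zig
const std = @import("std");

/// Hypothetical error set for this sketch; not taken from the Zig compiler.
const HexEscapeError = error{ Empty, TooManyDigits, InvalidDigit, MisplacedUnderscore };

/// Parses the text between the braces of a proposed `\x{N}` escape.
/// Assumes 1 to 32 hex digits, with `_` allowed only between digits.
fn parseHexEscapeBody(body: []const u8) HexEscapeError!u128 {
    var value: u128 = 0;
    var digits: usize = 0;
    // Starts as true so that a leading underscore is rejected.
    var prev_underscore = true;
    for (body) |c| {
        if (c == '_') {
            if (prev_underscore) return error.MisplacedUnderscore;
            prev_underscore = true;
            continue;
        }
        const d: u128 = switch (c) {
            '0'...'9' => c - '0',
            'a'...'f' => c - 'a' + 10,
            'A'...'F' => c - 'A' + 10,
            else => return error.InvalidDigit,
        };
        digits += 1;
        // 32 hex digits are exactly 128 bits, so the value always fits in a u128.
        if (digits > 32) return error.TooManyDigits;
        value = (value << 4) | d;
        prev_underscore = false;
    }
    if (digits == 0) return error.Empty;
    // A trailing underscore is rejected, as in int literals.
    if (prev_underscore) return error.MisplacedUnderscore;
    return value;
}

test "underscores are ignored as visual separators" {
    try std.testing.expectEqual(@as(u128, 0x10FFFF), try parseHexEscapeBody("10FF_FF"));
    try std.testing.expectError(error.Empty, parseHexEscapeBody(""));
}
```

Returning a single integer is only one possible reading of "hexadecimal value"; a real implementation could just as well emit the digits as a byte sequence.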
If this proposal and the solutions in #17385 and #14534 are accepted, we can achieve this representation in ZON:
.{
    .@"\{ ff\ 0\ 0}" = "red",
    .@"\{ 0\ ff\ 0}" = "green",
    .@"\{ 0\ 0\ ff}" = "blue"
}
Similarly, the following line: https://github.com/ziglang/zig/blob/153ba46a5b20f178d48ef2f09e0e638a3749af0e/lib/std/zig/tokenizer.zig#L1477 becomes:
try testTokenize("//\x{f4\ 8f\ bf\ bf}", &.{});
or alternatively:
try testTokenize("//\x{10FF_FF}", &.{});
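For context on how the two spellings relate: the bytes `f4 8f bf bf` are the UTF-8 encoding of the codepoint U+10FFFF. The check below is a small sketch using `std.unicode.utf8Encode` from the Zig standard library. It only demonstrates that correspondence and makes no claim about how the proposed `\x{10FF_FF}` would be lowered to bytes.

```zig
const std = @import("std");

test "U+10FFFF encodes to f4 8f bf bf in UTF-8" {
    var buf: [4]u8 = undefined;
    // utf8Encode writes the encoding into buf and returns its length in bytes.
    const len = try std.unicode.utf8Encode(0x10FFFF, &buf);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0xf4, 0x8f, 0xbf, 0xbf }, buf[0..len]);
}
```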
This proposal, motivated by use cases in declarative ZON, originally also covered binary, octal, and decimal escapes. That is likely excessive for scenarios where plaintext ZON is exchanged, and there isn't much demand for it in regular strings. Where other radices are needed, editor plugins that overlay them on hex escapes may be a sufficient solution.
I'm leaving this issue open, however, since I think `\u{N}` should be consolidated into `\x{N}`. Ideally, this would be combined with the delimiters mentioned above and with visual underscore separators, and the escape should be capped at 32 hex digits, which corresponds to a 128-bit UUID. The issues have been updated to reflect the changes in the proposal.