
#"🚀" cast to string is not "🚀"

Open • leoliu opened this issue 11 months ago • 9 comments

I wonder if this is expected, but it has caught me off guard a few times. The gist: "🚀" == (#"🚀" :> string) is false.

See https://rescript-lang.org/try?version=v12.0.0-alpha.8&module=esmodule&code=C4TwDgpgBMULxQNoGIBEheDcAF7qA+yQC6AUADYSwAeAXDPFGlqqeVCDcAvkQFIDOAdCQD2AcwAUGbPARiKUKgD4ovYACcAlgDsRASh08Bw8ahCppUMSHlKVG7XqA

leoliu • Feb 03 '25 14:02

This is definitely a bug, but it also makes me think about the constraints on these names. They don't seem to be clearly defined.

cometkim • Feb 04 '25 10:02

This is broken, too:

type t = {\"🎉": int}

let x = {
  \"🎉": 42,
}

https://rescript-lang.org/try?version=v12.0.0-alpha.10&module=esmodule&code=C4TwDgpgBMULxQN4B0BEgeDcJH7qBcUCWAdsAL4BQZANhLAB7xJlRRpa5QAsATADRnlA

cknitt • May 13 '25 15:05

And this even more!

let \"🎉" = 42

https://rescript-lang.org/try?version=v12.0.0-alpha.10&module=esmodule&code=DYUwLgBAOgRIPBuEj9mEC8EAsAmAUEA

cknitt • May 13 '25 15:05

> And this even more!
>
> let \"🎉" = 42
> https://rescript-lang.org/try?version=v12.0.0-alpha.10&module=esmodule&code=DYUwLgBAOgRIPBuEj9mEC8EAsAmAUEA

Should we even allow this?

shulhi • May 13 '25 22:05

> Should we even allow this?

Going by the definition of exotic identifiers, yes, since they are legit identifier names in JS 😅

cometkim • May 14 '25 05:05

Ok, so this is not valid JS:

[Image]

So there should be a compile error for this: https://rescript-lang.org/try?version=v12.0.0-alpha.10&module=esmodule&code=DYUwLgBAOgRIPBuEj9mEC8EAsAmAUN0kBbAQwGsRUIAKAP1kRgEpUA+CAJRCIGMwA6AZzAAnAJYA7AOaU6SBtiA

However, this is valid JS:

[Image]

So this should be compiled correctly: https://rescript-lang.org/try?version=v12.0.0-alpha.10&module=esmodule&code=C4TwDgpgBMCMUF4oG0DEAiQvBuAC99UA+UqIAugFBkA2EwUAHrAFwzxIY7oWiQwBMiUAN4AddIB4NwJH76ZgEsAdsAC+FarTr8kgslCijJ0qABZeAGjLKgA

cknitt • May 14 '25 14:05
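The two cases above call for different fixes: a `let` binding name must be a real JS identifier, whereas an object key may be any string as long as it is quoted. Below is a minimal TypeScript sketch of that split. It assumes the ECMAScript IdentifierName rule: the first code point has the Unicode ID_Start property or is `$`/`_`, and later code points have ID_Continue or are `$`, ZWNJ, or ZWJ. `emitLet`, `emitObjectKey`, and `emitRecord` are hypothetical helpers, not ReScript compiler functions, and reserved words are not handled.

```typescript
// Hypothetical code-generation helpers; not part of the ReScript compiler.
// IdentifierName rule: first code point ID_Start | "$" | "_",
// remaining code points ID_Continue | "$" | ZWNJ | ZWJ. Reserved words are not handled.
const IDENTIFIER_NAME = new RegExp(
  "^[\\p{ID_Start}$_][\\p{ID_Continue}$\\u200C\\u200D]*$",
  "u", // "u" enables \p{...} property escapes and code-point matching
);

// `let 🎉 = 42` is a syntax error in JS, so a non-identifier binding name
// should be rejected at compile time instead of being emitted.
function emitLet(name: string, valueJs: string): string {
  if (!IDENTIFIER_NAME.test(name)) {
    throw new Error(`${JSON.stringify(name)} is not a valid JS identifier`);
  }
  return `let ${name} = ${valueJs};`;
}

// Any string may be an object key, provided it is quoted when it is not an identifier.
function emitObjectKey(name: string): string {
  return IDENTIFIER_NAME.test(name) ? name : JSON.stringify(name);
}

function emitRecord(fields: Array<[string, string]>): string {
  const body = fields
    .map(([key, valueJs]) => `${emitObjectKey(key)}: ${valueJs}`)
    .join(", ");
  return `{ ${body} }`;
}

console.log(emitRecord([["🎉", "42"], ["ok", "1"]])); // { "🎉": 42, ok: 1 }
try {
  emitLet("🎉", "42");
} catch (e) {
  console.log((e as Error).message); // "🎉" is not a valid JS identifier
}
```

Quoting only when needed keeps ordinary field names emitted as bare keys, so output for existing code would not change.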

Hmm, weird. I thought any Unicode sequence was allowed in identifier names since ES6: https://mathiasbynens.be/notes/javascript-identifiers-es6

And saw some toy projects like https://github.com/Thomas101/emoji-js

I can still find sources saying it was supported, and all the LLMs are convinced it still is, but in reality it doesn't look like any JS engine supports it?

cometkim • May 15 '25 12:05

I found the specific Unicode properties, ID_Start and ID_Continue, that restrict which Unicode characters may appear in identifier names.

  • https://tc39.es/ecma262/#prod-IdentifierStart
  • https://github.com/dtolnay/unicode-ident
  • https://github.com/oxc-project/unicode-id-start

cometkim • May 15 '25 15:05
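The IdentifierStart/IdentifierPart productions linked above can be checked directly with JS regex Unicode property escapes. A minimal sketch, assuming a runtime that supports `\p{...}` with the `u` flag (ES2018 or later). `isIdentifierName` is a hypothetical helper, and it ignores `\u` escape sequences and reserved words.

```typescript
// Minimal sketch of the ECMAScript IdentifierName check:
//   start char: any code point with ID_Start, or "$", or "_"
//   part char:  any code point with ID_Continue, or "$", ZWNJ (U+200C), ZWJ (U+200D)
// Not handled: \u escape sequences and reserved words such as `let` or `class`.
const IDENTIFIER_NAME = new RegExp(
  "^[\\p{ID_Start}$_][\\p{ID_Continue}$\\u200C\\u200D]*$",
  "u", // "u" is required for \p{...} and makes classes match whole code points
);

function isIdentifierName(name: string): boolean {
  return IDENTIFIER_NAME.test(name);
}

// Names from the thread, plus a few ordinary ones for contrast.
for (const name of ["foo", "café", "_$x1", "1abc", "🚀", "🎉", "my-var"]) {
  console.log(JSON.stringify(name), isIdentifierName(name));
}
```

Emoji such as 🚀 (U+1F680) and 🎉 (U+1F389) are symbols rather than letters, so they have neither property. That matches the engines rejecting `let 🎉 = 42`.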

Not sure about its performance and size, but maybe we can do this check in the parser:

https://github.com/dbuenzli/uucp/blob/master/src/uucp__id.ml

cometkim • May 15 '25 15:05
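A per-character check like uucp's ID_Start/ID_Continue lookups also makes it easy to point at the offending character in an error message. Below is a rough TypeScript analogue of such a parser-side check. It uses JS regex property escapes, not uucp, and `checkExoticIdent` and its message format are hypothetical.

```typescript
// Hypothetical parser-side check; uses JS regex property escapes, not uucp.
// Each regex tests exactly one code point; the "u" flag makes the character
// class match a whole code point rather than a single UTF-16 code unit.
const ID_START = new RegExp("^[\\p{ID_Start}$_]$", "u");
const ID_CONTINUE = new RegExp("^[\\p{ID_Continue}$\\u200C\\u200D]$", "u");

// Returns null if `name` is a valid JS IdentifierName (ignoring escapes and
// reserved words), otherwise a message naming the first offending code point.
function checkExoticIdent(name: string): string | null {
  // Array.from splits a string by code point, so "🎉" is one element.
  const chars = Array.from(name);
  if (chars.length === 0) return "identifier name must not be empty";
  for (let i = 0; i < chars.length; i++) {
    const ok = i === 0 ? ID_START.test(chars[i]) : ID_CONTINUE.test(chars[i]);
    if (!ok) {
      let hex = chars[i].codePointAt(0)!.toString(16).toUpperCase();
      while (hex.length < 4) hex = "0" + hex; // U+ notation uses at least 4 digits
      return `character U+${hex} at position ${i} is not allowed in a JS identifier`;
    }
  }
  return null;
}

console.log(checkExoticIdent("café"));
console.log(checkExoticIdent("🎉"));
console.log(checkExoticIdent("a-b"));
```

Iterating by code point matters here: in UTF-16, `"🎉".length` is 2, so a scan by code unit would see two lone surrogates instead of one emoji.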