Clarify that the UTF-16 encoding is shown as big-endian (BE)?

Open netjeff opened this issue 10 months ago • 0 comments

The details page for each codepoint shows its encoding as UTF-8, UTF-16, and UTF-32.

For example, the details for U+1f607 "Smiling face with halo" show:

  • UTF-8 | 0xf0 0x9f 0x98 0x87
  • UTF-16 | 0xd83d 0xde07
  • UTF-32 | 0x0001f607

Strictly speaking, the UTF-16 shown is big-endian (BE): the most significant byte of each 16-bit code unit comes first.

Should the details page be updated to clarify BE, maybe something like this?

  • UTF-16BE : 0xd83d 0xde07

Or maybe show both, something like this?

  • UTF-16BE : 0xd83d 0xde07
  • UTF-16LE : 0x3dd8 0x07de
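
For anyone who wants to double-check the two byte orders, here is a small sketch using Python's built-in codecs. The helper name `hex_bytes` is made up for illustration and has nothing to do with how unicode-lookup computes its values.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated hex pairs."""
    # bytes.hex() accepts a separator argument on Python 3.8 and later.
    return text.encode(encoding).hex(" ")


ch = "\U0001F607"  # U+1F607, the example codepoint from this issue

# The explicit -be / -le codec names do not emit a byte order mark.
print("UTF-16BE:", hex_bytes(ch, "utf-16-be"))
print("UTF-16LE:", hex_bytes(ch, "utf-16-le"))
```

Grouped two bytes at a time, the output lines up with the `0xd83d 0xde07` and `0x3dd8 0x07de` values suggested above.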

Or maybe this is overkill, if big-endian is by far the most common form seen in the wild?

The same applies to UTF-32, which also has BE and LE forms.
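
To illustrate the UTF-32 case, here is a rough sketch along the same lines, again using Python's built-in codecs. The function name `utf32_hex` is hypothetical and only exists for this example.

```python
def utf32_hex(text: str, byte_order: str) -> str:
    """Encode `text` as UTF-32 in the given byte order ("be" or "le")
    and return the bytes as space-separated hex pairs."""
    if byte_order not in ("be", "le"):
        raise ValueError("byte_order must be 'be' or 'le'")
    # The utf-32-be / utf-32-le codecs write no byte order mark.
    return text.encode(f"utf-32-{byte_order}").hex(" ")


ch = "\U0001F607"  # U+1F607, the example codepoint from this issue

print("UTF-32BE:", utf32_hex(ch, "be"))
print("UTF-32LE:", utf32_hex(ch, "le"))
```

The big-endian bytes correspond to the `0x0001f607` currently shown on the details page; the little-endian form is the same four bytes in reverse order.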

I would lean towards removing UTF-32 from the details page and showing both UTF-16BE and UTF-16LE, with BE first (I think BE is more common).

netjeff, Mar 11 '25 20:03