stdlib

Add `bit_array.to_string_utf16` and `to_string_utf32`

Open · GearsDatapacks opened this issue 8 months ago · 8 comments

Some binary protocols (for example ID3) use UTF-16 encoded text, which currently has to be converted to a string by hand. It could be useful to have a `bit_array.to_string_utf16` function to perform this conversion. `to_string_utf32` is not something I've personally needed, but it could be useful as well.

Additionally, it could be helpful to be able to specify endianness, since it varies across standards.

GearsDatapacks, May 18 '25 15:05
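As a rough illustration of what "converting by hand" involves, a minimal decoder splits the bytes into 16-bit code units in the chosen byte order and recombines surrogate pairs into single code points. This sketch is written in Python only because the thread has no code; `utf16_to_str` is a hypothetical name, not part of Gleam's stdlib, and it assumes the input carries no byte-order mark.

```python
def utf16_to_str(data: bytes, byteorder: str = "big") -> str:
    """Decode BOM-less UTF-16 bytes into a str (hypothetical helper).

    `byteorder` is "big" or "little", as accepted by int.from_bytes.
    Raises ValueError for an odd byte count or an unpaired surrogate.
    """
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must have an even number of bytes")
    # Split the input into 16-bit code units using the requested byte order.
    units = [int.from_bytes(data[i:i + 2], byteorder) for i in range(0, len(data), 2)]
    chars = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            low = units[i + 1]
            # Recombine the pair into a supplementary-plane code point.
            chars.append(chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)))
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            # Basic Multilingual Plane: the code unit is the code point.
            chars.append(chr(unit))
            i += 1
    return "".join(chars)


print(utf16_to_str(b"\x00H\x00i", "big"))     # Hi
print(utf16_to_str(b"H\x00i\x00", "little"))  # Hi
```

The same bytes decode differently depending on `byteorder`, which is why the thread goes on to discuss how callers should specify it. Here invalid input raises `ValueError`; a Gleam version would need its own error-reporting choice.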

That's a good motivation, thank you. Let's do it.

lpil, May 19 '25 12:05

How should we handle endianness? We could take an `Endianness` type as the second argument, or we could have separate functions for big and little endian.

GearsDatapacks, May 19 '25 12:05
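The two API shapes raised in that comment can be sketched side by side. This is a Python stand-in with hypothetical names (`Endianness`, `to_string_utf16_be` and so on) that mirror the proposal rather than any existing Gleam API. Failures return `None`, loosely mirroring how Gleam's existing `bit_array.to_string` returns a `Result`.

```python
from enum import Enum
from typing import Optional


class Endianness(Enum):
    """Byte order of multi-byte code units (hypothetical mirror of a Gleam type)."""
    BIG = "be"
    LITTLE = "le"


def _decode(data: bytes, codec: str) -> Optional[str]:
    # Return None instead of raising, roughly like Gleam's Result(String, Nil).
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


# Option A: one function per encoding, with endianness as the second argument.
def to_string_utf16(data: bytes, endianness: Endianness) -> Optional[str]:
    return _decode(data, "utf-16-" + endianness.value)


def to_string_utf32(data: bytes, endianness: Endianness) -> Optional[str]:
    return _decode(data, "utf-32-" + endianness.value)


# Option B: separate functions per byte order; no extra type, but more names.
def to_string_utf16_be(data: bytes) -> Optional[str]:
    return _decode(data, "utf-16-be")


def to_string_utf16_le(data: bytes) -> Optional[str]:
    return _decode(data, "utf-16-le")
```

Option A keeps one function per encoding and stays extensible through the type; Option B avoids a new type but doubles the number of functions for each encoding (four in total once UTF-32 is included).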

Oh, I didn't think about that. Does it not matter for UTF-8, or are we just missing that?

lpil, May 19 '25 12:05

For UTF-8, each code unit is one byte, so endianness doesn't apply. It only matters for UTF-16 and UTF-32, where code units are more than one byte.

GearsDatapacks, May 19 '25 13:05
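To make the code-unit point concrete, the Python sketch below encodes U+20AC (€) with Python's fixed-byte-order codecs. UTF-8 produces three one-byte code units in a single defined order, while each UTF-16 or UTF-32 code unit can be written big- or little-endian. `encoded_bytes` is just an illustrative helper.

```python
# UTF-8 has a single byte order; UTF-16 and UTF-32 need an explicit BE or LE choice.
CODECS = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]


def encoded_bytes(text: str) -> dict:
    """Map each codec name to the space-separated hex bytes of `text`."""
    return {codec: text.encode(codec).hex(" ") for codec in CODECS}


euro = encoded_bytes("\u20ac")  # U+20AC EURO SIGN
# euro["utf-8"]     == "e2 82 ac"     (three one-byte code units, order fixed)
# euro["utf-16-be"] == "20 ac"        (one two-byte code unit, big-endian)
# euro["utf-16-le"] == "ac 20"        (the same code unit, little-endian)
# euro["utf-32-be"] == "00 00 20 ac"
# euro["utf-32-le"] == "ac 20 00 00"
```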

Would we also want an option to use the current platform's endianness?

lpil, May 19 '25 13:05

I'm not sure. Where I encountered it in ID3, the text is always preceded by a byte-order mark (BOM), which denotes the endianness of the encoded text. It's possible that a native option will be necessary, though. If we go down the custom type route, we would need to add the native option before v1, as adding a variant later would be a breaking change.

GearsDatapacks, May 19 '25 13:05
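A byte-order mark is the character U+FEFF written at the start of the text, so its encoded bytes reveal the byte order: `FE FF` for big-endian and `FF FE` for little-endian UTF-16. Below is a minimal Python sketch of BOM-driven decoding. The function names are hypothetical, and Python's own `utf-16` codec already performs this detection internally. Note that `FF FE 00 00` is the UTF-32 little-endian BOM, so a decoder accepting both encodings would need to check the four-byte marks first.

```python
from typing import Optional, Tuple

UTF16_BE_BOM = b"\xfe\xff"  # U+FEFF encoded big-endian
UTF16_LE_BOM = b"\xff\xfe"  # U+FEFF encoded little-endian


def split_utf16_bom(data: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (codec name, payload without BOM) if `data` starts with a UTF-16 BOM."""
    if data.startswith(UTF16_BE_BOM):
        return "utf-16-be", data[2:]
    if data.startswith(UTF16_LE_BOM):
        return "utf-16-le", data[2:]
    return None


def decode_utf16_with_bom(data: bytes) -> str:
    """Decode UTF-16 text whose byte order is given by a leading BOM."""
    split = split_utf16_bom(data)
    if split is None:
        raise ValueError("no UTF-16 byte-order mark")
    codec, payload = split
    return payload.decode(codec)


print(decode_utf16_with_bom(b"\xfe\xff\x00H\x00i"))  # Hi
print(decode_utf16_with_bom(b"\xff\xfeH\x00i\x00"))  # Hi
```

When every input carries a BOM, as described for ID3 here, the caller never needs to know the platform's byte order.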

Could endianness detection come from gleam_erlang and gleam_javascript, for user-land developers who decide to use it, so that here you would simply pass in an enum or a bool?

Edit: the docblock could also suggest using gleam_erlang or gleam_javascript to get the endianness. I obviously did not mean a direct dependency.

inoas, May 19 '25 13:05
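That suggestion keeps platform detection out of the decoder: the caller works out the byte order and passes it in as a plain value. In this Python sketch, `sys.byteorder` stands in for whatever detection gleam_erlang or gleam_javascript might offer (no specific function in those packages is assumed), and `decode_utf16` with a `big_endian` flag is a hypothetical illustration of the bool variant.

```python
import sys


def platform_is_big_endian() -> bool:
    # sys.byteorder reports the native byte order: "big" or "little".
    return sys.byteorder == "big"


def decode_utf16(data: bytes, big_endian: bool) -> str:
    """Pure decoder: the caller decides the byte order and passes it in."""
    return data.decode("utf-16-be" if big_endian else "utf-16-le")


# Caller side: detect the native order once, outside the decoder.
native = platform_is_big_endian()
payload = "Hi".encode("utf-16-be" if native else "utf-16-le")
print(decode_utf16(payload, native))  # Hi
```

Because the byte order is an ordinary argument, the decoder stays a pure function and the standard library needs no dependency on platform-specific packages.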

That's also possible.

GearsDatapacks, May 19 '25 13:05