avsc icon indicating copy to clipboard operation
avsc copied to clipboard

Replace Buffer with Uint8Array

Open valadaptive opened this issue 2 years ago • 3 comments

I still need to do some cleanups and replace uses of Buffer.compare, but I'm putting this PR here so you can benchmark it. Before I can complete this, #428 (minus the version bump) should be merged in, as the services code uses Buffer everywhere and expects fixed/bytes types to decode to Buffers.

I've made some tweaks to the benchmarking setup, so comparing this directly to the master branch won't work:

  • I've added an ArrayFloat benchmark to go along with ArrayDouble.
  • I use the --expose-gc flag when benchmarking in order to manually trigger garbage collection between benches, hopefully making results more consistent.
  • I've changed the length distribution of strings to be exponentially weighted, so shorter strings are still likely but longer strings will now be occasionally generated. The previous code was only benchmarking the manual path of Tap#writeString, since it only generated strings up to a length of 32.

I've cherry-picked those benchmarking changes into the bench-tweaks branch, which you can use to compare benchmarks.

valadaptive avatar Feb 24 '24 00:02 valadaptive

Thanks @valadaptive! I'll try to find time to merge #428.

mtth avatar Mar 04 '24 00:03 mtth

FYI @valadaptive - #428 is in.

mtth avatar Mar 30 '24 17:03 mtth

Working on removing Buffer usage from types.js now. I noticed the isJsonBuffer function, which seems to check if a given object is the JSON representation of a Buffer. Under what circumstances are Avro types directly serialized to JSON and/or parsed back directly? I can't easily polyfill Uint8Array to stringify to a regular array, so I'll probably need to insert a fixup step when stringifying/parsing.

valadaptive avatar Mar 30 '24 21:03 valadaptive

@mtth Do you intend to support the ability to roundtrip various Avro types to/from JSON? With the current representation that uses Buffers, this works mostly fine, but with Uint8Array, the JSON representation is a lot more bloated:

> JSON.stringify(Buffer.from([1, 2, 3, 4, 5]))
'{"type":"Buffer","data":[1,2,3,4,5]}'
> JSON.stringify(new Uint8Array([1, 2, 3, 4, 5]))
'{"0":1,"1":2,"2":3,"3":4,"4":5}'

I lean towards removing the coerceBuffers option entirely to discourage people from serializing Uint8Arrays to JSON.

valadaptive avatar Sep 10 '24 23:09 valadaptive

Do you intend to support the ability to roundtrip various Avro types to/from JSON?

It's nice to have, but I'm OK dropping coerceBuffers if it adds significant complexity.

mtth avatar Sep 13 '24 04:09 mtth

It's nice to have, but I'm OK dropping coerceBuffers if it adds significant complexity.

I believe it works right now in terms of recognizing Buffers. However, since we now use Uint8Arrays instead of Buffers in types, they'll serialize to much larger objects in JSON (e.g. Uint8Array([1, 2, 3, 4, 5] is serialized as '{"0":1,"1":2,"2":3,"3":4,"4":5}').

I can either try to implement code to recognize JSON'd Uint8Arrays, or remove coerceBuffers entirely. Leaving it as-is would be a huge footgun, because you can no longer round-trip Avro types through JSON (it'll serialize Uint8Arrays to a JSON representation it cannot itself recognize as a Uint8Array).

valadaptive avatar Sep 13 '24 08:09 valadaptive

Short reference to https://gist.github.com/joscha/d8603a1f0af5b0b055546c792b2a8ff6, which can be updated once this pull request lands.

joscha avatar Sep 13 '24 08:09 joscha