
Fixed size binary encoding

Open raunaks13 opened this issue 1 year ago • 0 comments

Currently the base string encoding in the encoding tree is a binary encoding. This takes 2 IOPs, one for the offsets and one for the bytes. However, we can reduce this to 1 IOP in cases where the strings are all the same size. In that case we would just encode the byte width and the bytes; the offsets can be reconstructed directly in memory. Saving an IOP per read should help random access in particular. This encoding can also be used for Arrow FixedSizeBinary arrays.
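To make the idea concrete, here is a minimal sketch in plain Python of storing only a byte width plus the concatenated bytes, and rebuilding the offsets in memory. It is an illustration only, not Lance's actual encoder: the function names `encode_fixed_size`, `reconstruct_offsets`, and `take` are hypothetical.

```python
from typing import List, Tuple


def encode_fixed_size(values: List[bytes]) -> Tuple[int, bytes]:
    """Pack equally sized values into (byte_width, data).

    Hypothetical helper for illustration; raises if the values are not
    all the same length, since the encoding would not apply.
    """
    if not values:
        return 0, b""
    byte_width = len(values[0])
    if any(len(v) != byte_width for v in values):
        raise ValueError("all values must have the same length")
    return byte_width, b"".join(values)


def reconstruct_offsets(byte_width: int, num_values: int) -> List[int]:
    """Rebuild the offsets array without reading it from disk."""
    return [i * byte_width for i in range(num_values + 1)]


def take(byte_width: int, data: bytes, index: int) -> bytes:
    """Random access: the byte range follows directly from the index."""
    start = index * byte_width
    return data[start:start + byte_width]


width, data = encode_fixed_size([b"abc", b"def", b"ghi"])
offsets = reconstruct_offsets(width, 3)
second = take(width, data, 1)
```

Because the byte range of row `i` is just `i * byte_width`, a random-access read needs to fetch only the bytes buffer, which is where the saved IOP would come from.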

We can use the fixed size binary encoding for fixed size binary datatypes, as well as for binary data that happens to be fixed size and is perhaps also smaller than a certain size threshold.
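One way the "happens to be fixed size" check could look, sketched in plain Python over an Arrow-style offsets array. The function name `fixed_width_or_none` and the default threshold of 64 bytes are assumptions made for illustration; the issue does not specify a threshold value.

```python
from typing import Optional, Sequence


def fixed_width_or_none(offsets: Sequence[int], max_width: int = 64) -> Optional[int]:
    """Return the common value width if every value has the same length
    and that length is at most max_width; otherwise return None.

    `offsets` has one more entry than there are values, as in Arrow's
    variable-size binary layout. `max_width` is a hypothetical threshold.
    """
    if len(offsets) < 2:
        return None  # no values, nothing to decide
    width = offsets[1] - offsets[0]
    if width > max_width:
        return None
    for i in range(1, len(offsets) - 1):
        if offsets[i + 1] - offsets[i] != width:
            return None  # lengths differ, fall back to the binary encoding
    return width


chosen = fixed_width_or_none([0, 4, 8, 12])
```

A `None` result would mean falling back to the existing offsets-plus-bytes binary encoding.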

raunaks13 Aug 09 '24 17:08