lance
Fixed size binary encoding
Currently, the base string encoding in the encoding tree is a binary encoding. It uses two I/O operations (IOPs): one for the offsets and one for the bytes. When all the strings have the same size, we can reduce this to one IOP: we would encode only the byte width and the bytes, and reconstruct the offsets directly in memory. This speedup would also significantly help random access. The same encoding can also be used for Arrow FixedSizeBinary arrays.
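To make the idea concrete, here is a minimal Python sketch of the proposed layout. It is not Lance's implementation; the function names (`encode_fixed_size`, `reconstruct_offsets`, `take`) are hypothetical and exist only for illustration. It shows that once the byte width is known, the offsets buffer does not need to be stored or read at all.

```python
from typing import List, Tuple


def encode_fixed_size(values: List[bytes]) -> Tuple[int, bytes]:
    """Encode equal-length values as (byte_width, concatenated bytes)."""
    if not values:
        raise ValueError("need at least one value to infer the byte width")
    byte_width = len(values[0])
    if byte_width == 0:
        raise ValueError("zero-width values are not handled in this sketch")
    if any(len(v) != byte_width for v in values):
        raise ValueError("all values must have the same length")
    return byte_width, b"".join(values)


def reconstruct_offsets(byte_width: int, data: bytes) -> List[int]:
    """Rebuild the offsets in memory: offset i is simply i * byte_width."""
    if byte_width <= 0:
        raise ValueError("byte_width must be positive")
    num_values = len(data) // byte_width
    return [i * byte_width for i in range(num_values + 1)]


def take(byte_width: int, data: bytes, index: int) -> bytes:
    """Random access: the value's position is computed, not looked up."""
    start = index * byte_width
    return data[start:start + byte_width]
```

For example, encoding `[b"abcd", b"efgh", b"ijkl"]` yields a width of 4 and a single 12-byte buffer, and `take(4, data, 1)` slices out the second value without consulting any offsets.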
We can use the fixed size binary encoding for fixed size binary data types, and also for variable-width binary data whose values all happen to have the same size, perhaps only when that size is below a certain threshold.
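The selection rule could look something like the following Python sketch. Everything here is an assumption made for illustration: the function name `should_use_fixed_size`, the `declared_width` parameter (standing in for a FixedSizeBinary data type), and the default threshold of 64 bytes are all hypothetical, and the sketch assumes the threshold applies only to data that merely happens to be fixed size.

```python
from typing import List, Optional


def should_use_fixed_size(
    values: List[bytes],
    declared_width: Optional[int] = None,
    max_width: int = 64,  # hypothetical threshold, not a value from the proposal
) -> bool:
    """Decide whether the fixed size binary encoding applies."""
    # Case 1: the data type itself is fixed size (e.g. Arrow FixedSizeBinary).
    if declared_width is not None:
        return True
    # Case 2: variable-width data whose values all share one length.
    if not values:
        return False
    width = len(values[0])
    if width == 0 or width > max_width:
        return False
    return all(len(v) == width for v in values)
```

Checking every value's length costs a full pass over the data at write time, which is the price for saving one IOP on every subsequent read.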