Investigate "sroar": Dgraph's Serialized Roaring Bitmaps
Dgraph has made public their "Serialized Roaring Bitmaps" implementation at https://github.com/dgraph-io/sroar
sroar is a re-written version of Roaring Bitmaps in Go, with the aim to have equality between in-memory representation and on-disk representation. An sroar.Bitmap does not need to be marshalled or unmarshalled, as the underlying represetation is a byte slice. Therefore, it can be written to disk, brought to memory, or shipped over the network immediately.
I would encourage the Roaring community to go have a look.
This has been an update with competitive benchmark results: https://github.com/dgraph-io/sroar#benchmarks
Here are the specific claims:
4x faster 4x less memory 25x fewer allocations (-96% p50).
I found this library https://github.com/kelindar/bitmap it is 32-bit only. It is backed by a similar in-memory storage concept.
@evanoberholster It is likely similar to https://github.com/bits-and-blooms/bitset in that it implements an uncompressed bitset. The use cases are different.
FYI Dgraph recently posted an article explaining some details of their implementation https://dgraph.io/blog/post/serialized-roaring-bitmaps-golang/
Just to provide some random insight, I benchmarked migrating my teams custom storage system from this library to sroar and saw the following:
- We had dramatically less allocations when using the sroar library.
- We never keep that many bitmaps in memory at once (everything is done streaming) so I didn’t observe any difference in memory usage, although I bet if we needed to keep a lot of them in memory at once our memory usage would have dropped.
- The serialized format seemed to be significantly less compressed than this library (our output files grew 4x larger)
- I can’t remember if it was faster. I think the actual operations may have been due to way less allocations, but the increased file size killed any performance gains there.
For those reasons we decided to stick with this library :)
That’s just 1 datapoint though! In general I do think that sroars strategy of using a small number of bytes slices to represent the bitmaps and “dirty casting” them as needed is a winning one that this library could benefit from just in terms of generally reducing allocations/pointers which reduces pressure in the GC and allocator.
Dgraph and I face the same situation: In-memory index with high memory usage and high gc pressure. But the implement and performance of sroar is not good enough in my opinion. So I am writing a 32bit version sroar library with the concepts of dgraph sorar.