roaring icon indicating copy to clipboard operation
roaring copied to clipboard

Investigate "sroar": Dgraph's Serialized Roaring Bitmaps

Open lemire opened this issue 4 years ago • 7 comments

Dgraph has made public their "Serialized Roaring Bitmaps" implementation at https://github.com/dgraph-io/sroar

sroar is a re-written version of Roaring Bitmaps in Go, with the aim to have equality between in-memory representation and on-disk representation. An sroar.Bitmap does not need to be marshalled or unmarshalled, as the underlying represetation is a byte slice. Therefore, it can be written to disk, brought to memory, or shipped over the network immediately.

I would encourage the Roaring community to go have a look.

lemire avatar Mar 16 '21 20:03 lemire

This has been an update with competitive benchmark results: https://github.com/dgraph-io/sroar#benchmarks

lemire avatar May 18 '21 01:05 lemire

Here are the specific claims:

4x faster 4x less memory 25x fewer allocations (-96% p50).

lemire avatar May 18 '21 01:05 lemire

I found this library https://github.com/kelindar/bitmap it is 32-bit only. It is backed by a similar in-memory storage concept.

evanoberholster avatar Jun 23 '21 01:06 evanoberholster

@evanoberholster It is likely similar to https://github.com/bits-and-blooms/bitset in that it implements an uncompressed bitset. The use cases are different.

lemire avatar Jun 23 '21 12:06 lemire

FYI Dgraph recently posted an article explaining some details of their implementation https://dgraph.io/blog/post/serialized-roaring-bitmaps-golang/

comunidadio avatar Dec 10 '21 01:12 comunidadio

Just to provide some random insight, I benchmarked migrating my teams custom storage system from this library to sroar and saw the following:

  1. We had dramatically less allocations when using the sroar library.
  2. We never keep that many bitmaps in memory at once (everything is done streaming) so I didn’t observe any difference in memory usage, although I bet if we needed to keep a lot of them in memory at once our memory usage would have dropped.
  3. The serialized format seemed to be significantly less compressed than this library (our output files grew 4x larger)
  4. I can’t remember if it was faster. I think the actual operations may have been due to way less allocations, but the increased file size killed any performance gains there.

For those reasons we decided to stick with this library :)

That’s just 1 datapoint though! In general I do think that sroars strategy of using a small number of bytes slices to represent the bitmaps and “dirty casting” them as needed is a winning one that this library could benefit from just in terms of generally reducing allocations/pointers which reduces pressure in the GC and allocator.

richardartoul avatar Jun 04 '22 05:06 richardartoul

Dgraph and I face the same situation: In-memory index with high memory usage and high gc pressure. But the implement and performance of sroar is not good enough in my opinion. So I am writing a 32bit version sroar library with the concepts of dgraph sorar.

Tony-Kwan avatar Sep 26 '22 03:09 Tony-Kwan