Add support for Bloom Filters
Is your feature request related to a problem? Please describe. Bloom filters are essential to check for existence, and having support for that is essential for real-world usecases.
Describe the solution you'd like
Add commands BINIT, BADD, and BEXIST to initialize, add, and check existence within the bloom filter. BINIT can take in arguments around tolerance and initial allocation.
Some points to consider
- the size of the bloom filter
- seamless resize
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=bloom+filter&btnG=
interested to take a crack at this. and have a few queries here.
- curious to understand what hash functions are we thinking to use since it needs to be uniformly distributed and independent, would murmur be a gtg choice for the current use case?
- what does seamless resize infer, is it related to extending and decreasing the bloom filter size dynamically post
BINIT?
Hey @rak3n I am looking into this at the moment and have started working on this. There are still some ambiguities in the implementation and hence I am going through other open source Bloom implementations to converge.
Hi, just stumbled upon this. I'd also interested in contributing to this issue if possible. Just wanted to share a reference implementation which go-ethereum uses. Majority of code lies here. Happy to help with any benchmarks if required :)
@manav2401 Thanks for sharing the reference, but we cannot just take this code (though open source). Also, there is a need to draft a solid specification around this.
Can you prepare a quick document on what you envision from a bloom filter and the feature? We will see what fits into the scheme and then work on the implementation.
@arpitbbhayani sorry was away last week so couldn't respond.
Sure, let gather some more context on the same and prepare a document.
Hey @arpitbbhayani have jotted down few points in this document. Let me you what you think about it and if there's anything specific to be added/removed from it.
https://manavdarji.notion.site/Bloom-filters-in-DiceDB-6da85a9d2dba41b5afd94f34a74463f8
@manav2401 this is great. Are you available at 11pm tonight? We can hop on a call and close this.
Discord is where we meet. Just drop me a message.
If the time does not work, drop me a few suitable time and we can sync then.
@arpitbbhayani 11 pm tonight (IST) works for me. Dropped a message on twitter.
This was completed with #99