python-bloomfilter icon indicating copy to clipboard operation
python-bloomfilter copied to clipboard

BloomFilter (and ScalableBF) should support set operations (intersection, union ... )

Open stevvooe opened this issue 16 years ago • 2 comments

I am not particularly sure that the backend algorithm can reliably support this, but it would be nice if the BloomFilter implementation could support the operations of the set primitive. Specifically:

>>> a = BloomFilter(10)
>>> a.add(1)
>>> b = BloomFilter(10)
>>> b.add(2)
>>> c = a | b
>>> 1 in c
True
>>> 2 in c
True
>>> 3 in c
False

I find sets to be incredibly useful in general coding practice, as it allows one to work naturally on data. Having a proxy to a set of data, without having to load said data, would increase the size of problems that one could easily work on, especially when one is more interested in set membership than the actual members.

This is just a feature suggestion (not even a request really), so no hurry. If I have time, I might even write a patch. I am unfamiliar with the hash function generation and the nuances there, so thats why I haven't jumped right into it.

stevvooe avatar Jan 12 '10 00:01 stevvooe

I just pushed a new branch (set-operations) that adds union and intersection to BloomFilter. Take a look and see what you think. I'm still looking into adding the support into SBF, but that's a bit more of a Harder Problem.

I'm gonna get this code reviewed and checked into master ASAP.

Jay

jaybaird avatar Mar 11 '11 19:03 jaybaird

Ok, this was kind of a junk patch. It requires that your bloom filters be the same capacity and error rate.

I'm working around that now. Sorry about that.

jaybaird avatar Mar 11 '11 19:03 jaybaird