
Bloom filters?

Open bryanduxbury opened this issue 14 years ago • 3 comments

Might be able to speed things up pretty substantially, but need to investigate thoroughly.

bryanduxbury, Mar 14 '11 18:03

I think Hadoop has three Bloom filter implementations. Where do you want to plug them into hank?

gsharma, Apr 02 '11 15:04

My initial thinking is to put a small Bloom filter in Cueball files that can be loaded on startup. Then, when serving a request, we can check the filter first and skip disk access entirely whenever the filter says the key is definitely absent.
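A minimal sketch of that check-before-disk idea, for illustration only. Every name here (`BloomGateSketch`, `lookup`, the `diskRead` callback) is hypothetical and not part of hank's or Cueball's actual API, and the seeded FNV-1a double hashing is just one simple choice of hash scheme, not a recommendation.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.function.Function;

/** Hypothetical in-memory Bloom filter used to gate disk lookups. */
public class BloomGateSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public BloomGateSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, with a seed mixed into the offset basis (an assumption of this sketch).
    private static int fnv1a(byte[] key, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th probe position is derived from two base hashes.
    private int index(byte[] key, int i) {
        int h1 = fnv1a(key, 0);
        int h2 = fnv1a(key, 0x9E3779B9) | 1; // force a non-zero (odd) step
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Record a key; would be done while the file is written. */
    public void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** false means definitely absent; true means possibly present. */
    public boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Consult the filter first; only call the (hypothetical) disk reader on a possible hit. */
    public static byte[] lookup(BloomGateSketch filter, byte[] key,
                                Function<byte[], byte[]> diskRead) {
        if (!filter.mightContain(key)) {
            return null; // no disk access at all
        }
        return diskRead.apply(key); // may still miss: false positives are possible
    }

    public static void main(String[] args) {
        BloomGateSketch filter = new BloomGateSketch(1 << 16, 4);
        byte[] key = "some-key".getBytes(StandardCharsets.UTF_8);
        filter.add(key);
        System.out.println(filter.mightContain(key));
    }
}
```

The filter can only rule keys out, so the win depends on how many requests are for keys that are not in the file at all.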

I'm also wondering whether it makes sense to have one small Bloom filter per Cueball block, rather than one big filter for all the blocks. One possible benefit: we would only need to hash the portion of the key that is not already used for partitioning and block positioning.

bryanduxbury, Apr 02 '11 20:04

I might be able to take this up in a little bit and investigate the performance of the two scenarios:

- a small Bloom filter for each Cueball block
- a single Bloom filter for all blocks
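As a possible baseline for that comparison, here is the standard false-positive estimate for a Bloom filter with $m$ bits, $n$ inserted keys, and $k$ hash functions. The symbols are the usual textbook ones, not anything defined in hank:

$$
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad
k_{\text{opt}} = \frac{m}{n}\ln 2, \qquad
p_{\min} \approx 2^{-k_{\text{opt}}} \approx 0.6185^{\,m/n}
$$

Since $p_{\min}$ depends only on the bits-per-key ratio $m/n$, a per-block filter and a single large filter should have roughly the same expected false-positive rate when given the same number of bits per key. Under that assumption, any measured difference between the two scenarios would come from hashing cost, memory layout, and load time rather than from filter accuracy.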

gsharma, Apr 11 '11 00:04