ClickHouse icon indicating copy to clipboard operation
ClickHouse copied to clipboard

Provide Bloomfilter(),HyperLogLog() type scalar functions that work like get_dict()

Open michaelritsema opened this issue 1 year ago • 0 comments

Use case

A probabilistic data structure like Bloomfilter is very useful for detection algorithms like malicious IPs.

For example.

  • Load 1 billion IPs into a table that represent bad ips.
  • Load it into the Bloomfilter from the table like you would with a clickhouse dictionary
  • add the filter to a where clause: like ' WHERE get_bloomfilter('bad_ips',ip_address)'

This would use a small amount of ram and useful for a quick pre-filter of data.

While I don't know of other databases that provide it out of the box there are extensions for a few such as Postgres that provide this functionality. Because Bloomfilters are already used in the product I think I can keep this brief but am happy to provide for information.

michaelritsema avatar Oct 17 '24 18:10 michaelritsema