ClickHouse
ClickHouse copied to clipboard
Provide Bloomfilter(),HyperLogLog() type scalar functions that work like get_dict()
Use case
A probabilistic data structure like Bloomfilter is very useful for detection algorithms like malicious IPs.
For example.
- Load 1 billion IPs into a table that represent bad ips.
- Load it into the Bloomfilter from the table like you would with a clickhouse dictionary
- add the filter to a where clause: like ' WHERE get_bloomfilter('bad_ips',ip_address)'
This would use a small amount of ram and useful for a quick pre-filter of data.
While I don't know of other databases that provide it out of the box there are extensions for a few such as Postgres that provide this functionality. Because Bloomfilters are already used in the product I think I can keep this brief but am happy to provide for information.