algebird icon indicating copy to clipboard operation
algebird copied to clipboard

Self Learning bitmap

Open MansurAshraf opened this issue 11 years ago • 4 comments

I was wondering if someone has seen this paper from google. it is solving the same problem as HLL but is scale invariant and has better memory consumption. Below is the abstract.

Abstract: In this paper, we develop a novel approach, called self-learning bitmap (S-bitmap) that is scale-invariant for cardinalities in a specified range. S-bitmap uses a binary vector whose entries are updated from 0 to 1 by an adaptive sampling process for inferring the unknown cardinality, where the sampling rates are reduced sequentially as more and more entries change from 0 to 1. We prove rigorously that the S- bitmap estimate is not only unbiased but scale-invariant. We demonstrate that to achieve a small RRMSE value of ǫ or less, our approach requires significantly less memory and consumes similar or less operations than state-of-the- art methods for many common practice cardinality scales. Both simulation and experimental studies are reported.

screen shot 2015-01-05 at 12 07 29 pm 1

http://arxiv.org/pdf/1107.1697v1.pdf

Here is an implementation in Java : https://github.com/travisbrady/self-learning-bitmap

MansurAshraf avatar Jan 20 '15 22:01 MansurAshraf

Looks interesting, but I suspect it's not associative (or commutative).

avibryant avatar Jan 20 '15 22:01 avibryant

There doesn't seem to be any reduction possible (associative, commutative, or otherwise). It only defines:

B2 = B1 + x

not

B3 = B1 + B2

jnievelt avatar Jan 22 '15 01:01 jnievelt

Should we add things like this as Fold instances? You can't get any parallelism, but at least you can still easily run them on scalding/spark, etc...?

johnynek avatar May 05 '15 19:05 johnynek

Agreed about fold.

sritchie-stripe avatar Nov 18 '16 14:11 sritchie-stripe