stream-lib

HyperLogLogPlusPlus sparse precision 32 accuracy problem

Status: open. Opened by Enzo90910 (0 comments).

I have been getting very strange results (very high inaccuracy) from HLL++ at low cardinalities when using typical values of p (p = 11, 12, 13, 14) together with sp = 32. I suspect, though I am not certain, that the cause is the following line, which treats sp = 31 and sp = 32 exactly the same:

```java
sm = sp > 30 ? Integer.MAX_VALUE : 1 << sp;
```

because, for low cardinalities, the cardinality is estimated with linear counting:

```java
return Math.round(HyperLogLog.linearCounting(sm, sm - sparseSet.length));
```
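Linear counting estimates a count as E = m · ln(m / V), where m is the number of buckets and V the number of empty ones; in the `return` statement quoted above, m = `sm` and V = `sm - sparseSet.length`. The Java sketch below is a hypothetical stand-alone version of that formula (the class and method names are made up, it is not stream-lib's own `HyperLogLog.linearCounting`, and it takes `double` arguments so that m = 2^32 can be represented). Plugging in `Integer.MAX_VALUE` and 2^32 for the same number of sparse entries is one way to check how much the value of `sm` alone moves the estimate.

```java
// Hypothetical stand-alone version of the standard linear-counting formula
// E = m * ln(m / V); this is NOT stream-lib's own implementation.
public class LinearCountingDemo {

    // m: total number of buckets, v: number of empty buckets
    static double linearCounting(double m, double v) {
        return m * Math.log(m / v);
    }

    // Mirrors the quoted estimate: each sparse entry counts as one occupied bucket
    static long sparseEstimate(double sm, int occupied) {
        return Math.round(linearCounting(sm, sm - occupied));
    }

    public static void main(String[] args) {
        int occupied = 1000;                 // hypothetical number of sparse entries
        double clamped = Integer.MAX_VALUE;  // value used for both sp = 31 and sp = 32
        double exact32 = 4294967296.0;       // 2^32, the bucket count sp = 32 implies
        System.out.println("m = Integer.MAX_VALUE: " + sparseEstimate(clamped, occupied));
        System.out.println("m = 2^32:              " + sparseEstimate(exact32, occupied));
    }
}
```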

With sp = 31 the estimates are as accurate as expected; with sp = 32 they are not.
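The clamp in the quoted line exists because 2^31 and 2^32 do not fit in a Java `int`: `1 << 31` overflows to `Integer.MIN_VALUE`, and `1 << 32` evaluates to 1 because Java masks `int` shift distances to their low five bits. The small sketch below (class and method names are hypothetical; only the ternary expression comes from the issue) prints the clamped value next to the exact 2^sp computed as a `long` for sp = 30, 31 and 32.

```java
// Hypothetical demo class; only the ternary expression is taken from the issue.
public class SparseSizeDemo {

    // Same expression as the quoted line: sizes above 2^30 are clamped
    static int clampedSparseSize(int sp) {
        return sp > 30 ? Integer.MAX_VALUE : 1 << sp;
    }

    // Exact 2^sp as a long, for comparison (valid for 0 <= sp <= 62)
    static long exactSparseSize(int sp) {
        return 1L << sp;
    }

    public static void main(String[] args) {
        for (int sp = 30; sp <= 32; sp++) {
            System.out.println("sp=" + sp
                    + " clamped=" + clampedSparseSize(sp)
                    + " exact=" + exactSparseSize(sp));
        }
        // int shift distances are masked to 5 bits, so this is 1 << 0
        System.out.println("1 << 32 = " + (1 << 32));
    }
}
```

With sp = 31 the clamped value differs from 2^31 by one; with sp = 32 it is roughly half of 2^32.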

Enzo90910, Aug 30 '19 10:08