featurebase icon indicating copy to clipboard operation
featurebase copied to clipboard

Why not use BSTM to calculate TopK

Open xiehuc opened this issue 5 years ago • 3 comments

i think pilosa use cache and min heap to calculate TopK

but since pilosa use bsi heavly, and there are a paper introduced BSTM which means (Bit Slice Term Match) algorithm to calculate TopK

it would be very suitable for pilosa

https://www.researchgate.net/publication/221215144_Bit-Sliced_Index_Arithmetic

截屏2020-04-01 下午2 47 38

xiehuc avatar Apr 01 '20 06:04 xiehuc

Thanks @xiehuc for sharing. We're working to improve performance of aggregation functions. Moreover TopN implementation on integers is missed, right now. So, I think it's totally worth to consider the white-paper, which you mentioned.

kuba-- avatar Apr 01 '20 12:04 kuba--

@xiehuc see https://github.com/lemire/BitSliceIndex/tree/master/src/main/java/org/roaringbitmap/circuits/topk, the author of roaringBitmap write two type of algorithm for topK, one is RinfretONeil, one is simple by Tree

Smallhi avatar Nov 25 '20 12:11 Smallhi

that's cool, i'd see it later.

xiehuc avatar Dec 01 '20 12:12 xiehuc