database-stream-processor icon indicating copy to clipboard operation
database-stream-processor copied to clipboard

TopK and Sorting operators

Open Kixiron opened this issue 3 years ago • 2 comments

TopK (and sorting, being closely related) is a super important operator for streaming systems and databases as a whole, as is being able to sort data and so we should support both. Both operations can be done in a linear and tree-like manner, we should probably offer both for greater flexibility. (The tree-like forms consist of partitioning the input data into N streams, sorting/topk-ing each stream and then recombining them via sort/topk in a tree until a single result is reached, this can allow for partial incrementality at the cost of memory)

Kixiron avatar Jul 08 '22 20:07 Kixiron

It's not a problem to offer sort operators, what we currently don't have is support for incremental sorting. Also, when sorting the output is no longer a zset, we have to choose a different data structure; we don't have a notion of difference or addition for sorted structures, so they most likely will be non-incremental. We can do it as in DDlog, where the result is a vector, and sorting is completely outside the framework, a map operation applied to the vector - just a map.

mihaibudiu avatar Jul 08 '22 20:07 mihaibudiu

Correct, the tree-like variants are what offer some form of incrementality

Kixiron avatar Jul 08 '22 20:07 Kixiron