Only update topk filter when updated filter is more selective

Open • Dandandan opened this issue 8 months ago • 1 comment

Hm @adriangb, another thing I wondered about: update_filter seems to take only the heap of the current partition into account, since in TopK (currently at least) each partition has its own heap (of k items).

Perhaps we can compare against the current filter and only update the expression if it is greater / more selective?

Originally posted by @Dandandan in https://github.com/apache/datafusion/issues/15770#issuecomment-2981524048

Dandandan, Jun 17 '25 19:06

The filter is shared between TopK instances in different partitions, so each instance would benefit from the higher selectivity found by other partitions, and from the filter being tightened earlier so it can prune more files and rows.
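
A minimal sketch of the idea, not DataFusion's actual code: `SharedThreshold` and `update_if_more_selective` are hypothetical names, and the filter is simplified to a single `i64` bound for a query shaped like `ORDER BY x ASC LIMIT k` (so the filter is `x < threshold` and a lower threshold is more selective). Each partition would propose the worst value in its own heap, and the shared state would change only when that proposal tightens the bound.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the dynamic filter shared by all TopK partitions.
/// Assumes `ORDER BY x ASC LIMIT k`, so the filter is `x < threshold`.
#[derive(Clone, Default)]
pub struct SharedThreshold {
    inner: Arc<Mutex<Option<i64>>>,
}

impl SharedThreshold {
    pub fn new() -> Self {
        Self::default()
    }

    /// Called by one partition with the largest value currently in its heap.
    /// Replaces the shared threshold only if `candidate` is strictly lower
    /// (more selective) than the current one. Returns true if it was updated.
    pub fn update_if_more_selective(&self, candidate: i64) -> bool {
        let mut current = self.inner.lock().unwrap();
        match *current {
            // Another partition already published an equal or tighter bound.
            Some(existing) if candidate >= existing => false,
            // No bound yet, or the candidate is tighter: publish it.
            _ => {
                *current = Some(candidate);
                true
            }
        }
    }

    /// Current shared bound, if any partition has published one.
    pub fn get(&self) -> Option<i64> {
        *self.inner.lock().unwrap()
    }
}
```

With this shape, a partition whose heap is still full of large values cannot loosen a bound that another partition has already tightened. A real implementation would have to compare multi-column sort keys with sort direction and null ordering rather than a single integer.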

Dandandan avatar Jun 17 '25 19:06 Dandandan