Only update topk filter when updated filter is more selective
Hm @adriangb, another thing I wondered about: update_filter seems to take only the heap of the current partition into account, since in TopK (currently at least) each partition has its own heap (of k items).
Perhaps we can compare against the current filter and only update the expression if it is greater / more selective?
Originally posted by @Dandandan in https://github.com/apache/datafusion/issues/15770#issuecomment-2981524048
The filter is shared between TopK instances in different partitions, so every partition would benefit from a more selective threshold found by any other partition, and files and rows could be filtered out earlier.
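
A minimal sketch of the "only update when more selective" idea, using only the Rust standard library. It is not DataFusion's actual code: `SharedThreshold` and `update_if_more_selective` are hypothetical names, and the filter is simplified to a single `i64` bound for an `ORDER BY x ASC LIMIT k` query, where the filter is `x < threshold` and a smaller threshold is more selective.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the filter state shared by all TopK partitions.
/// Assumes `ORDER BY x ASC LIMIT k`, so the filter is `x < threshold`.
#[derive(Debug, Default)]
pub struct SharedThreshold {
    current: Mutex<Option<i64>>,
}

impl SharedThreshold {
    pub fn new() -> Self {
        Self::default()
    }

    /// Installs `candidate` only if it is more selective (smaller) than the
    /// threshold already in place. Returns `true` if the filter was updated.
    pub fn update_if_more_selective(&self, candidate: i64) -> bool {
        let mut current = self.current.lock().unwrap();
        match *current {
            // Another partition already published an equal or tighter bound:
            // keep it rather than loosening the shared filter.
            Some(existing) if candidate >= existing => false,
            // No bound yet, or the candidate is tighter: install it.
            _ => {
                *current = Some(candidate);
                true
            }
        }
    }

    /// Returns the threshold currently visible to all partitions.
    pub fn get(&self) -> Option<i64> {
        *self.current.lock().unwrap()
    }
}
```

Each partition would call `update_if_more_selective` with the worst value in its own heap once the heap holds k items. A partition whose heap is still loose then cannot overwrite a tighter bound published by another partition. A real implementation would have to compare full sort keys (multiple columns, descending order, null ordering) rather than a single integer.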