Add option to FilterExec to prevent re-using input batches
Which issue does this PR close?
N/A
Rationale for this change
DataFusion Comet currently maintains a fork of FilterExec with a small modification that changes how filtered batches are created. We require that FilterExec not pass input batches through unchanged when the predicate evaluates to true for all rows in a batch (because our scan re-uses arrays).
We would like to make the DataFusion implementation of FilterExec customizable to meet our needs.
What changes are included in this PR?
Add a new boolean parameter so that we can choose whether FilterExec is allowed to return unmodified input batches.
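To illustrate the intended behavior, here is a minimal, std-only sketch. It is not the code in this PR and does not use the Arrow or DataFusion APIs: `Batch`, `filter_batch`, and the `allow_reuse` flag are hypothetical names, and a single `Arc<Vec<i64>>` stands in for a batch's reference-counted arrays.

```rust
use std::sync::Arc;

/// Toy stand-in for a record batch: one reference-counted column.
/// Hypothetical type for illustration only, not Arrow's `RecordBatch`.
#[derive(Debug, Clone)]
struct Batch {
    values: Arc<Vec<i64>>,
}

/// Filters `input` by `mask`.
/// `allow_reuse` is a hypothetical name for the new boolean option: when it
/// is false, the output never shares a buffer with the input.
fn filter_batch(input: &Batch, mask: &[bool], allow_reuse: bool) -> Batch {
    assert_eq!(input.values.len(), mask.len());

    let all_true = mask.iter().all(|&keep| keep);
    if all_true && allow_reuse {
        // Fast path: only the Arc pointer is cloned, so the output
        // shares the input's underlying buffer.
        return input.clone();
    }

    // Otherwise build a fresh buffer containing only the selected rows.
    // When the mask is all true, this is a full copy of the data.
    let mut kept = Vec::new();
    for (value, keep) in input.values.iter().zip(mask) {
        if *keep {
            kept.push(*value);
        }
    }
    Batch { values: Arc::new(kept) }
}
```

With `allow_reuse` set to `false`, an all-true mask yields a batch with equal contents but a separate allocation, which is the property a scan that re-uses its arrays would need. The cost is an extra copy per fully selected batch.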
Are these changes tested?
I did not add tests yet. I wanted to get some feedback on approach first.
Are there any user-facing changes?
If the predicate evaluates to true for every row, the filter typically just copies the array pointers rather than the underlying data. However, there are cases where you might want to copy the underlying data anyway, even though doing so degrades the operator's performance.
Is there a use case other than Comet itself?
Marking as draft as I think this PR is no longer waiting on feedback.