[SPARK-39825][SQL] Fix PushDownLeftSemiAntiJoin push through project
What changes were proposed in this pull request?
This PR improves the rule PushDownLeftSemiAntiJoin so that it no longer pushes a left semi/anti join through a project when:
- the project is probably a column-pruning project
- the join condition references a complex expression from the project list
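The two checks above can be sketched with a toy plan model. This is a hypothetical illustration only. The classes (Attr, Alias, Call, Project) and the heuristics ("bare attributes that drop child columns" means pruning; "anything other than a bare attribute" means complex) are assumptions made for this sketch. They are not Catalyst's classes or the PR's actual implementation.

```scala
// Toy expression/plan model (hypothetical; NOT Catalyst's classes).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Alias(child: Expr, name: String) extends Expr
final case class Call(fn: String, args: Seq[Expr]) extends Expr

// A Project node: its output expressions plus the column names its child produces.
final case class Project(projectList: Seq[Expr], childOutput: Seq[String])

// Assumed heuristic 1: a project made only of bare attributes that drops
// some child columns is treated as one ColumnPruning probably inserted.
def isProbablyPruningProject(p: Project): Boolean =
  p.projectList.forall(_.isInstanceOf[Attr]) &&
    p.projectList.size < p.childOutput.size

// Assumed notion of "complex": anything other than a bare attribute.
def isComplex(e: Expr): Boolean = e match {
  case Attr(_) => false
  case _       => true
}

// Assumed heuristic 2: the join condition references an alias whose definition
// is complex, so pushing the join below the project would evaluate it twice
// (once in the substituted join condition, once in the project).
def conditionUsesComplexAlias(p: Project, conditionRefs: Set[String]): Boolean =
  p.projectList.exists {
    case Alias(child, name) => conditionRefs.contains(name) && isComplex(child)
    case _                  => false
  }

// Push only when neither check fires.
def canPushLeftSemiAntiThroughProject(p: Project, conditionRefs: Set[String]): Boolean =
  !isProbablyPruningProject(p) && !conditionUsesComplexAlias(p, conditionRefs)
```

For example, `Project(Seq(Attr("a"), Attr("b")), Seq("a", "b", "c"))` is treated as a pruning project, so the join stays above it. A project that aliases a hypothetical `expensive_udf(b)` as `x` blocks pushdown only when the join condition references `x`.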
Why are the changes needed?
Pushing a LeftSemi/LeftAnti join through a project is not always beneficial:
- the project may exist for column pruning, in which case pushing through it conflicts with the ColumnPruning rule
- if the project contains a complex expression that the join condition references, pushing can cause a regression, because the complex expression is then evaluated more than once
Does this PR introduce any user-facing change?
yes
How was this patch tested?
Added tests.
The golden file changes:
- q8: the join is no longer pushed through the project because the join condition references a complex expression
- q10: the join is no longer pushed through the project because it may be a column-pruning project
cc @cloud-fan @viirya thank you
cc @sigmod
BTW, does filter pushdown have the same issue?
No.
The conflict has already been fixed, in a somewhat hacky way, in ColumnPruning: https://github.com/apache/spark/blob/c2536a7eabd8764cbbaaff22935e19685b92f22b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L819-L829
Thanks to subexpression elimination in codegen, the complex expression is not an issue there either.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!