[SPARK-39825][SQL] Fix PushDownLeftSemiAntiJoin push through project

Open ulysses-you opened this issue 3 years ago • 4 comments

What changes were proposed in this pull request?

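This PR improves the rule PushDownLeftSemiAntiJoin so that it no longer pushes a left semi/anti join through a project when either of the following holds:

  • the project is probably a column-pruning project
  • the join condition references a complex expression from the project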
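As a rough illustration of the two checks, here is a minimal Python sketch over a toy plan model. It is not the code in this patch: `Column`, `is_plain_ref`, and `may_push_through_project` are hypothetical names, and treating "a project made only of plain attribute references" as "probably a pruning project" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Column:
    """One output column of a toy Project node (hypothetical model)."""
    name: str            # output name of the column
    is_plain_ref: bool   # True if it is just a reference to a child attribute


def may_push_through_project(project: List[Column], condition_refs: Set[str]) -> bool:
    """Return True if the toy rule would push a semi/anti join below the project."""
    # Check 1 (assumed heuristic): a project with only plain references
    # is probably there for column pruning, so leave it alone.
    if all(col.is_plain_ref for col in project):
        return False
    # Check 2: the join condition references a computed (complex) column,
    # so pushing down would force that expression to be evaluated again.
    complex_names = {col.name for col in project if not col.is_plain_ref}
    if complex_names & condition_refs:
        return False
    return True
```

With this model, a project of `a, b` is never pushed through, a project of `a, f(a) AS c` is pushed through only when the join condition does not reference `c`.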

Why are the changes needed?

Pushing a LeftSemi/LeftAnti join through a project is not always beneficial:

  1. If the project is used for column pruning, pushing the join through it conflicts with ColumnPruning.
  2. If the project contains a complex expression and the join condition references it, the pushdown can cause a regression, because the complex expression is evaluated more than once.
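The second point can be made concrete with a small Python sketch that counts evaluations of a stand-in "complex" expression under both plan shapes. This is a toy model, not Spark code: `CountingExpr` and the two plan functions are hypothetical, and the right side of the join is reduced to a set of keys.

```python
class CountingExpr:
    """Stand-in for a complex projected expression; counts its evaluations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x * x + 1


def project_then_semi_join(left, right_keys, expr):
    # Not pushed down: the project computes c once per left row,
    # and the join condition reads the already computed value.
    projected = [{"a": a, "c": expr(a)} for a in left]
    return [row for row in projected if row["c"] in right_keys]


def semi_join_then_project(left, right_keys, expr):
    # Pushed down: the join condition inlines the expression for every left row,
    # then the project recomputes it for each surviving row.
    kept = [a for a in left if expr(a) in right_keys]
    return [{"a": a, "c": expr(a)} for a in kept]


left = [1, 2, 3, 4]
right_keys = {2, 10, 17}

not_pushed = CountingExpr()
pushed = CountingExpr()
rows_not_pushed = project_then_semi_join(left, right_keys, not_pushed)
rows_pushed = semi_join_then_project(left, right_keys, pushed)
```

Both shapes return the same rows, but the pushed-down shape evaluates the expression once per input row plus once per surviving row, so the extra cost grows with the number of rows the join keeps.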

Does this PR introduce any user-facing change?

yes

How was this patch tested?

Added tests.

The golden file changes:

  • q8: because we no longer push through a project when the join condition references a complex expression
  • q10: because we no longer push through a project that may be a column-pruning project

ulysses-you avatar Jul 20 '22 12:07 ulysses-you

cc @cloud-fan @viirya thank you

ulysses-you avatar Jul 21 '22 04:07 ulysses-you

cc @sigmod

cloud-fan avatar Jul 21 '22 09:07 cloud-fan

BTW, does filter pushdown have the same issue?

cloud-fan avatar Jul 21 '22 09:07 cloud-fan

> BTW, does filter pushdown have the same issue?

No.

The conflict with column pruning has already been worked around, in a somewhat hacky way, in ColumnPruning: https://github.com/apache/spark/blob/c2536a7eabd8764cbbaaff22935e19685b92f22b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L819-L829

Because of subexpression elimination in codegen, the complex expression is not an issue for filter pushdown either.

ulysses-you avatar Jul 21 '22 10:07 ulysses-you

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Nov 04 '22 00:11 github-actions[bot]