[SPARK-39921][SQL] SkewJoin--Stream side skew in BroadcastHashJoin
What changes were proposed in this pull request?
Solve data skew on the stream side of BroadcastHashJoin.
- When handling the skew would require introducing an additional shuffle, the skew optimization can be forced through spark.sql.adaptive.forceOptimizeSkewedJoin.
- If the skew optimization is applied, the local shuffle reader optimization is not applied; otherwise the skew optimization would not take effect.
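The following minimal Python sketch illustrates the idea of splitting a skewed stream-side partition. It is not the PR's implementation, which lives in Spark's Scala planner, and it is not a Spark API: all function names are hypothetical. It assumes skew is detected with the same kind of rule Spark's AQE applies to sort-merge joins (a partition is skewed when it exceeds both skewedPartitionFactor times the median partition size and skewedPartitionThresholdInBytes, with defaults assumed to be 5 and 256 MB). It also assumes a skewed partition is split into ranges of map outputs. All sizes are in bytes.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical defaults, assumed to match Spark's documented
# spark.sql.adaptive.skewJoin.skewedPartitionFactor / skewedPartitionThresholdInBytes.
DEFAULT_FACTOR = 5.0
DEFAULT_THRESHOLD_BYTES = 256 * 1024 * 1024


def find_skewed_partitions(sizes: List[int],
                           factor: float = DEFAULT_FACTOR,
                           threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> List[int]:
    """Indices of partitions larger than factor * median AND threshold_bytes."""
    if not sizes:
        return []
    med = max(median(sizes), 1)
    return [i for i, s in enumerate(sizes)
            if s > factor * med and s > threshold_bytes]


def split_by_map_output(map_sizes: List[int], target_bytes: int) -> List[Tuple[int, int]]:
    """Greedily group consecutive map outputs into [start, end) ranges.

    A group is closed once adding the next map output would exceed
    target_bytes; a single oversized map output still forms its own group.
    """
    if not map_sizes:
        return []
    ranges: List[Tuple[int, int]] = []
    start, acc = 0, 0
    for i, size in enumerate(map_sizes):
        if acc > 0 and acc + size > target_bytes:
            ranges.append((start, i))
            start, acc = i, 0
        acc += size
    ranges.append((start, len(map_sizes)))
    return ranges


def plan_stream_side_tasks(partition_map_sizes: List[List[int]],
                           target_bytes: int,
                           factor: float = DEFAULT_FACTOR,
                           threshold_bytes: int = DEFAULT_THRESHOLD_BYTES
                           ) -> List[Tuple[int, Tuple[int, int]]]:
    """Return (reduce_partition, (map_start, map_end)) read specs, one per task.

    Every task probes the same, complete broadcast hash table, so a skewed
    stream-side partition can be split without duplicating build-side data.
    """
    totals = [sum(maps) for maps in partition_map_sizes]
    skewed = set(find_skewed_partitions(totals, factor, threshold_bytes))
    tasks: List[Tuple[int, Tuple[int, int]]] = []
    for p, maps in enumerate(partition_map_sizes):
        if p in skewed:
            tasks.extend((p, r) for r in split_by_map_output(maps, target_bytes))
        else:
            tasks.append((p, (0, len(maps))))
    return tasks


if __name__ == "__main__":
    # Partition 2 (1200 bytes vs. a 35-byte median) is split into three tasks.
    plan = plan_stream_side_tasks(
        [[10, 10, 10], [10, 10, 10], [400, 300, 500], [10, 10, 20]],
        target_bytes=400, threshold_bytes=100)
    print(plan)
    # → [(0, (0, 3)), (1, (0, 3)), (2, (0, 1)), (2, (1, 2)), (2, (2, 3)), (3, (0, 3))]
```

Each split task probes the full broadcast relation, so splitting the stream side needs no replication of the build side. This differs from skew handling in SortMergeJoin, where the matching partition on the other side has to be read once per split.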
Why are the changes needed?
In production environments, data skew slows down task execution.
After solving the data skew

How was this patch tested?
UTs
Can one of the admins verify this patch?
Are you not using AQE?
With AQE turned off, the join is planned as a SortMergeJoin. We need AQE turned on to solve the data skew.
So I guess turning off AQE is meant to force a BroadcastHashJoin? AQE will change the plan to a SortMergeJoin based on stage statistics, and it will solve the stream-side skew by adding an extra shuffle.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
@wang-zhun I also ran into this problem. It seems that skew cannot be handled properly in a dynamic broadcast join. Does the community plan to merge this PR? Currently the only workaround is to disable the dynamic join.
@thomasg19930417 At the moment, we haven't received feedback from the community. You can create a new pull request to address this.