spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-39921][SQL] SkewJoin--Stream side skew in BroadcastHashJoin

Open wang-zhun opened this issue 3 years ago • 3 comments

What changes were proposed in this pull request?

Solve the data skew on the stream side in BroadcastHashJoin

  • When data skew needs to introduce additional shuffle, support forcibly solve the data skew problem through spark.sql.adaptive.forceOptimizeSkewedJoin
  • If data skew optimization is performed, LocalShuffle optimization will not be performed, otherwise the skew optimization will not take effect.

Why are the changes needed?

In the actual production environment, data skew will slow down the task execution time 1 2 After solving the data skew 4 3

How was this patch tested?

UTs

wang-zhun avatar Aug 04 '22 11:08 wang-zhun

Can one of the admins verify this patch?

AmplabJenkins avatar Aug 04 '22 18:08 AmplabJenkins

you do not use AQE ?

CHzxp avatar Aug 05 '22 07:08 CHzxp

you do not use AQE ?

Turning off AQE will be a SortMergeJoin, we need to turn on AQE and solve the data skew

wang-zhun avatar Aug 05 '22 10:08 wang-zhun

so, i guess turning off AQE is aim to force execute broadcastHashJoin? because AQE will change plan to SortMergeJoin according to stage statistics, and AQE will solve the stream side skew by add external shuffle

CHzxp avatar Aug 15 '22 10:08 CHzxp

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Nov 24 '22 00:11 github-actions[bot]

@wang-zhun I also found this problem. It seems that the tilt problem cannot be handled normally in the dynamic broadcast join. Does this pr community plan to merge? The existing method can only close the dynamic join

thomasg19930417 avatar Jul 21 '23 01:07 thomasg19930417

@wang-zhun I also found this problem. It seems that the tilt problem cannot be handled normally in the dynamic broadcast join. Does this pr community plan to merge? The existing method can only close the dynamic join

@thomasg19930417 At the moment, we haven't received feedback from the community. You can create a new pull request to address this.

wang-zhun avatar Jul 21 '23 04:07 wang-zhun