[SPARK-39624][SQL] Support coalesce partition through CartesianProduct
What changes were proposed in this pull request?
Coalesce partitions for every group.
Why are the changes needed?
CoalesceShufflePartitions currently cannot optimize a plan that contains a CartesianProduct.
For a query like the one below, if CoalesceShufflePartitions cannot be applied, the join of t1 and t2 produces a large number of partitions. The number of result partitions is then (left partitions * right partitions), which can be quite large.
SELECT * FROM (SELECT * FROM t3) t3
JOIN (
  SELECT t1.key, t2.value FROM
    (SELECT * FROM t1) t1
  JOIN
    (SELECT * FROM t2) t2
  ON t1.value = t2.value
) t ON t3.key = t.key OR t3.value = t.value
It would be better to support partial optimization with CartesianProductExec.
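To make the partition-count arithmetic above concrete, here is a minimal sketch in plain Python. It does not call any Spark API. The function name and all partition counts (10 for the t3 side, 200 shuffle partitions for the t1/t2 join, 5 after coalescing) are hypothetical values chosen only for illustration.

```python
def cartesian_partitions(left: int, right: int) -> int:
    """Partition count of a Cartesian product: one per (left, right) pair."""
    return left * right


# Hypothetical: the t3 side has 10 partitions, and the t1/t2 join side
# keeps all 200 shuffle partitions because coalescing was not applied.
before = cartesian_partitions(10, 200)

# Hypothetical: the t1/t2 join side is coalesced down to 5 partitions.
after = cartesian_partitions(10, 5)

print(f"without coalescing: {before} partitions")
print(f"with coalescing:    {after} partitions")
```

Because the two sides multiply, shrinking either side reduces the product proportionally, which is why coalescing only one group of shuffle stages can still be worthwhile.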
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a test.
Can one of the admins verify this patch?
Can you please rework your PR description, by stating what's the behavior before and after your change? That screenshot doesn't say anything here. You can remove it. Maybe use some before/after examples instead.
+1 to @maryannxue's comment. Otherwise the idea looks good. Also cc @cloud-fan
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!