spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-40354][SQL] Support eliminate dynamic partition for datasource v1 writes

Open ulysses-you opened this issue 3 years ago • 1 comments

What changes were proposed in this pull request?

Add a new optimizer rule: EliminateV1DynamicPartitionWrites. This new rule supports convert dynamic partition writes to statis partition writes if the dynamic column is Literal which has been optimzied by ConstantFolding for datasource table.

Why are the changes needed?

If the partition column is actually foldable, it's is same with static partition. So there is no needed to do an extra local sort to ensure the same partition values are continuous.

Besides, the dataframe write api does not support specify the static partition spec so users always do dynamic partition writes, e.g.

df.selectExpr("*", "x" as p).write.partitionBy("p").save

Does this PR introduce any user-facing change?

no, improve the performance

How was this patch tested?

add tests

ulysses-you avatar Sep 08 '22 06:09 ulysses-you

cc @cloud-fan @viirya thank you

ulysses-you avatar Sep 09 '22 04:09 ulysses-you

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Dec 19 '22 00:12 github-actions[bot]