spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-39893][SQL] Push limit 1 to the aggregate's child plan if grouping expressions and aggregate expressions are foldable

Open wankunde opened this issue 3 years ago • 1 comments

What changes were proposed in this pull request?

If all group expressions are foldable, the result of this aggregate will always be OneRowRelation. And if all aggregate expressions are foldable, we can add limit 1 to the child plan.

For example:

Table tab (it can have data or empty) : create table tab(key int, value string) using parquet Query : SELECT distinct 1001 as id , cast('2022-06-03' as date) AS DT FROM tab

After this PR:

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.LimitPushDown ===
 Aggregate [1001, 2022-06-03], [1001 AS id#218, 2022-06-03 AS DT#219]   Aggregate [1001, 2022-06-03], [1001 AS id#218, 2022-06-03 AS DT#219]
!+- Project [1001 AS id#218, 2022-06-03 AS DT#219]                      +- GlobalLimit 1
!   +- Relation spark_catalog.default.tab[key#220,value#221] parquet       +- LocalLimit 1
!                                                                             +- Project [1001 AS id#218, 2022-06-03 AS DT#219]
!                                                                                +- Relation spark_catalog.default.tab[key#220,value#221] parquet

Query: SELECT 1001 AS id, cast('2022-06-03' as date) AS DT FROM tab group by 'x'

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.LimitPushDown ===
 Aggregate [0], [1001 AS id#218, cast(2022-06-03 as date) AS DT#219]   Aggregate [0], [1001 AS id#218, cast(2022-06-03 as date) AS DT#219]
!+- Relation spark_catalog.default.tab[key#220,value#221] parquet      +- GlobalLimit 1
!                                                                         +- LocalLimit 1
!                                                                            +- Relation spark_catalog.default.tab[key#220,value#221] parquet
 

Why are the changes needed?

Push limit 1 to child plan to reduce unnecessary calculations

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add UT

wankunde avatar Jul 27 '22 08:07 wankunde

Can one of the admins verify this patch?

AmplabJenkins avatar Jul 27 '22 09:07 AmplabJenkins

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Nov 07 '22 00:11 github-actions[bot]