[SPARK-39893][SQL] Push limit 1 to the aggregate's child plan if grouping expressions and aggregate expressions are foldable
What changes were proposed in this pull request?
If all group expressions are foldable, the result of this aggregate will always be OneRowRelation. And if all aggregate expressions are foldable, we can add limit 1 to the child plan.
For example:
Table tab (it can have data or empty) : create table tab(key int, value string) using parquet
Query : SELECT distinct 1001 as id , cast('2022-06-03' as date) AS DT FROM tab
After this PR:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.LimitPushDown ===
Aggregate [1001, 2022-06-03], [1001 AS id#218, 2022-06-03 AS DT#219] Aggregate [1001, 2022-06-03], [1001 AS id#218, 2022-06-03 AS DT#219]
!+- Project [1001 AS id#218, 2022-06-03 AS DT#219] +- GlobalLimit 1
! +- Relation spark_catalog.default.tab[key#220,value#221] parquet +- LocalLimit 1
! +- Project [1001 AS id#218, 2022-06-03 AS DT#219]
! +- Relation spark_catalog.default.tab[key#220,value#221] parquet
Query:
SELECT 1001 AS id, cast('2022-06-03' as date) AS DT FROM tab group by 'x'
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.LimitPushDown ===
Aggregate [0], [1001 AS id#218, cast(2022-06-03 as date) AS DT#219] Aggregate [0], [1001 AS id#218, cast(2022-06-03 as date) AS DT#219]
!+- Relation spark_catalog.default.tab[key#220,value#221] parquet +- GlobalLimit 1
! +- LocalLimit 1
! +- Relation spark_catalog.default.tab[key#220,value#221] parquet
Why are the changes needed?
Push limit 1 to child plan to reduce unnecessary calculations
Does this PR introduce any user-facing change?
No
How was this patch tested?
Add UT
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!