Eliminate redundant aggregation functions pushed down
```
scala> spark.sql("select count(tp_int),avg(tp_int) from full_data_type_table").explain
17/12/28 14:17:13 INFO SparkSqlParser: Parsing command: select count(tp_int),avg(tp_int) from full_data_type_table
== Physical Plan ==
*HashAggregate(keys=[], functions=[sum(count(tp_int#100L)#386L), sum(sum(tp_int#100L)#387L), sum(count(tp_int#100L)#386L)])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_sum(count(tp_int#100L)#386L), partial_sum(sum(tp_int#100L)#387L), partial_sum(count(tp_int#100L)#386L)])
      +- TiDB CoprocessorRDD{[table: full_data_type_table] , Ranges: Start:[-9223372036854775808], End: [9223372036854775807], Columns: [tp_int], Aggregates: Count([tp_int]), Sum([tp_int]), Count([tp_int])}
```
The pushed-down list `Aggregates: Count([tp_int]), Sum([tp_int]), Count([tp_int])` contains a redundant entry: `avg(tp_int)` is decomposed into a sum and a count for pushdown, and `count(tp_int)` pushes the same count a second time. It could be optimized to `Aggregates: Count([tp_int]), Sum([tp_int])`, with the final `HashAggregate` reading the single `Count` output for both functions.
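A fix would presumably deduplicate the pushed-down aggregates while keeping a mapping from each original expression to its position in the deduplicated list, so the upstream `HashAggregate` still finds the columns it expects. A minimal sketch of the idea, pasteable into the REPL; the `TiAggregate` case class and `dedupeAggregates` are hypothetical names for illustration, not TiSpark's actual API:

```scala
// Hypothetical, simplified representation of one aggregate pushed to the coprocessor.
case class TiAggregate(func: String, column: String)

// Return the distinct aggregates plus, for each original position, the index
// of its surviving entry, so plan output references can be rewritten.
def dedupeAggregates(aggs: Seq[TiAggregate]): (Seq[TiAggregate], Seq[Int]) = {
  val distinctAggs = aggs.distinct
  val indexOf = distinctAggs.zipWithIndex.toMap
  (distinctAggs, aggs.map(indexOf))
}

// The case from the plan above: Count, Sum, Count collapses to Count, Sum.
val aggs = Seq(
  TiAggregate("Count", "tp_int"),
  TiAggregate("Sum", "tp_int"),
  TiAggregate("Count", "tp_int")
)
val (deduped, mapping) = dedupeAggregates(aggs)
// deduped == Seq(TiAggregate("Count","tp_int"), TiAggregate("Sum","tp_int"))
// mapping == Seq(0, 1, 0): both count(tp_int) and avg's count read column 0.
```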
How does this happen? Don't we already have a PR that optimized this?

This still seems to be a problem. Leaving the issue open for now.