
Eliminate redundant aggregate functions pushed down

Novemser opened this issue Dec 28 '17 · 2 comments

scala> spark.sql("select count(tp_int),avg(tp_int) from full_data_type_table").explain
17/12/28 14:17:13 INFO SparkSqlParser: Parsing command: select count(tp_int),avg(tp_int) from full_data_type_table
== Physical Plan ==
*HashAggregate(keys=[], functions=[sum(count(tp_int#100L)#386L), sum(sum(tp_int#100L)#387L), sum(count(tp_int#100L)#386L)])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_sum(count(tp_int#100L)#386L), partial_sum(sum(tp_int#100L)#387L), partial_sum(count(tp_int#100L)#386L)])
      +- TiDB CoprocessorRDD{[table: full_data_type_table] , Ranges: Start:[-9223372036854775808], End: [9223372036854775807], Columns: [tp_int], Aggregates: Count([tp_int]), Sum([tp_int]), Count([tp_int])}

Actually, Aggregates: Count([tp_int]), Sum([tp_int]), Count([tp_int]) could be optimized to Aggregates: Count([tp_int]), Sum([tp_int]). Spark rewrites avg(tp_int) into sum(tp_int) / count(tp_int), so the count for avg duplicates the explicit count(tp_int), and the same aggregate gets pushed down to the coprocessor twice.
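A minimal sketch of the fix, under assumed names (AggExpr and the surrounding code are hypothetical, not TiSpark's actual pushdown classes): deduplicate the aggregate list before pushing it down, and keep an index map so each original output column can still be read from the deduplicated results.

```scala
// Hypothetical representation of an aggregate pushed down to the coprocessor.
case class AggExpr(func: String, column: String)

// The plan above requests Count, Sum, Count on tp_int
// (the second Count comes from avg's sum/count rewrite).
val requested = Seq(
  AggExpr("Count", "tp_int"),
  AggExpr("Sum", "tp_int"),
  AggExpr("Count", "tp_int")
)

// Push down each distinct aggregate only once (distinct keeps
// first-occurrence order), and map every original position to the
// index of its deduplicated copy.
val deduped  = requested.distinct
val indexMap = requested.map(deduped.indexOf)

// deduped  == Seq(AggExpr("Count", "tp_int"), AggExpr("Sum", "tp_int"))
// indexMap == Seq(0, 1, 0)
```

With this, the coprocessor evaluates Count([tp_int]) once, and both the explicit count and avg's rewritten count read the same result column via the index map.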

Novemser avatar Dec 28 '17 06:12 Novemser

How does this happen? Don't we have a PR that optimized this?

ilovesoup avatar Jan 23 '18 17:01 ilovesoup

Seems still a problem. Leave it open for now.

ilovesoup avatar Feb 12 '18 03:02 ilovesoup