
HIVE-26524: Use Calcite to remove sections of a query plan known never produces rows

Open kasakrisz opened this issue 3 years ago • 1 comments

What changes were proposed in this pull request?

  • Currently Hive represents the empty result operator with HiveSortLimit(fetch=0). Change this to HiveValues(tuples[]) like Calcite does.
  • Improve and extend the PruneEmptyRules provided by Calcite with Hive-specific functionality.
  • When converting the CBO plan back to AST, represent the empty HiveValues operator with the AST of the query
select null as colName0... null as colNamen limit 0

  • Get the schema information from the HiveValues row type at CBO -> AST conversion.
  • Support WITHIN GROUP clause in CBO plan
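
To make the shape of that placeholder query concrete, here is a minimal sketch that builds the query text from a list of field names, the way the row type of an empty HiveValues could drive it. It is only an illustration: the class and method names are hypothetical, and the actual patch builds AST nodes in the CBO -> AST conversion rather than SQL strings.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hive: shows the shape of the query that
// stands in for an empty Values operator.
final class EmptyValuesQuery {

    // Builds "select null as <f0>, ..., null as <fn> limit 0" from the field
    // names of a row type. Identifier quoting is omitted to keep the sketch short.
    static String emptyResultQuery(List<String> fieldNames) {
        if (fieldNames.isEmpty()) {
            throw new IllegalArgumentException("row type must have at least one field");
        }
        String projections = fieldNames.stream()
                .map(name -> "null as " + name)
                .collect(Collectors.joining(", "));
        return "select " + projections + " limit 0";
    }
}
```

Because every column is a NULL literal, only the column names and count come from the plan; this is why the schema has to be taken from the row type of the Values operator.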

Why are the changes needed?

  • Calcite has built-in rules that remove sections of a query plan known never to produce any rows. This makes the CBO plan much simpler.
  • In some cases (e.g. select * from table1 where 1=0) the whole plan can be removed, and Hive already has an optimization that skips executing queries which produce no result. This optimization relies on checking the limit value at the top-level query.
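
The idea behind pruning can be sketched with a toy plan tree. The code below is not Calcite's or Hive's implementation; the node kinds and the prune method are hypothetical and only model three simplifications: an always-false filter becomes empty, an operator over an empty input becomes empty, and UNION ALL drops its empty branches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of empty-plan pruning; all names here are illustrative assumptions.
final class PruneEmptySketch {
    enum Kind { SCAN, EMPTY, FILTER_FALSE, PROJECT, INNER_JOIN, UNION_ALL }

    static final class Node {
        final Kind kind;
        final List<Node> inputs;

        Node(Kind kind, Node... inputs) {
            this.kind = kind;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Returns an equivalent plan in which subtrees that cannot produce rows
    // are replaced by a single EMPTY node.
    static Node prune(Node node) {
        List<Node> pruned = new ArrayList<>();
        for (Node input : node.inputs) {
            pruned.add(prune(input));
        }
        switch (node.kind) {
            case FILTER_FALSE:
                // A filter whose condition is always false (e.g. 1=0) yields nothing.
                return new Node(Kind.EMPTY);
            case PROJECT:
            case INNER_JOIN:
                // Any empty input makes the whole operator empty.
                for (Node input : pruned) {
                    if (input.kind == Kind.EMPTY) {
                        return new Node(Kind.EMPTY);
                    }
                }
                return new Node(node.kind, pruned.toArray(new Node[0]));
            case UNION_ALL:
                // Empty branches contribute no rows and can be dropped.
                pruned.removeIf(input -> input.kind == Kind.EMPTY);
                if (pruned.isEmpty()) {
                    return new Node(Kind.EMPTY);
                }
                if (pruned.size() == 1) {
                    return pruned.get(0);
                }
                return new Node(node.kind, pruned.toArray(new Node[0]));
            default:
                return node;
        }
    }

    // True when the whole plan collapses to EMPTY, so execution can be skipped.
    static boolean producesNoRows(Node root) {
        return prune(root).kind == Kind.EMPTY;
    }
}
```

In this model, a plan for select * from table1 where 1=0 (a PROJECT over a FILTER_FALSE over a SCAN) collapses to a single EMPTY node, which is the situation where the query does not need to be executed at all.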

Does this PR introduce any user-facing change?

No, except that EXPLAIN results change.

How was this patch tested?

mvn test -Dtest.output.overwrite -Dtest=TestMiniLlapLocalCliDriver -Dqfile=empty_result_foj_constraints.q,empty_result_outerjoin.q,empty_result.q,empty_result_union.q,sketches_materialized_view_percentile_disc.q,cbo_rp_udf_percentile.q,udaf_percentile_disc.q -pl itests/qtest -Pitests

kasakrisz, Sep 09 '22 07:09

What I would like to understand a bit more is the part about the within group. Why do these changes need to be part of this patch? How are they related to the empty plan and the pruning rules?

Aggregates with a WITHIN GROUP clause are transformed at the AST level: the ORDER BY keys and directions are passed as extra parameters to the aggregate function. https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1010

In this patch, when the AST of the empty plan is generated, the order direction parameters of such aggregate functions would become NULL, like every other value coming from the empty Values operator. However, these must be non-null integers (exactly 0 or 1). It was easier and cleaner to support WITHIN GROUP in the CBO plan and have ASTConverter generate the AST of the WITHIN GROUP clause, so as not to lose the ordering information. The UDF implementation hasn't changed, so the transformation is still applied when the second compilation phase runs after CBO or when CBO is off.

I admit that this improvement could be an independent patch.
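
The NULL-direction problem described above can be illustrated with a toy model. Everything in it is an assumption made for illustration: the flattened argument layout (fraction, sort key, direction), the encoding 1 = ascending and 0 = descending, and all class and method names. It does not reproduce the code in SemanticAnalyzer or the UDF.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only; the argument layout and the 0/1 encoding are assumptions.
final class WithinGroupSketch {

    // Models the AST-level rewrite: the sort key and its direction become
    // extra arguments of the aggregate call.
    static List<Object> flatten(double fraction, String sortKey, boolean ascending) {
        List<Object> args = new ArrayList<>();
        args.add(fraction);
        args.add(sortKey);
        args.add(ascending ? 1 : 0);
        return args;
    }

    // Models regenerating the call on top of an empty Values operator:
    // every argument, including the direction, turns into NULL.
    static List<Object> readFromEmptyValues(List<Object> args) {
        return new ArrayList<>(Collections.<Object>nCopies(args.size(), null));
    }

    // The direction (last argument) must be a non-null integer, exactly 0 or 1.
    static boolean hasValidDirection(List<Object> args) {
        Object direction = args.get(args.size() - 1);
        return Integer.valueOf(0).equals(direction) || Integer.valueOf(1).equals(direction);
    }
}
```

In this model, hasValidDirection holds for the flattened call but fails once the arguments are read from the empty Values operator. Keeping WITHIN GROUP as a separate clause in the plan sidesteps this, because the direction is no longer an ordinary argument that can be replaced by NULL.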

kasakrisz, Oct 04 '22 12:10

It was easier and cleaner to support WITHIN GROUP in the CBO plan

Is this change gonna affect EXPLAIN CBO output?

zabetak, Oct 04 '22 13:10

Kudos, SonarCloud Quality Gate passed!

Bugs: 6 (rating C)
Vulnerabilities: 0 (rating A)
Security Hotspots: 1 (rating E)
Code Smells: 64 (rating A)

No Coverage information
No Duplication information

sonarqubecloud[bot], Oct 06 '22 15:10