HIVE-26524: Use Calcite to remove sections of a query plan known to never produce rows
What changes were proposed in this pull request?
- Currently Hive represents the empty result operator with `HiveSortLimit(fetch=0)`. Change this to `HiveValues(tuples[])`, as Calcite does.
- Improve and extend the `PruneEmptyRules` provided by Calcite with Hive-specific functionality.
- When converting the CBO plan back to AST, represent the empty `HiveValues` operator with the AST of the query `select null as colName0, ..., null as colNamen limit 0`.
- Get the schema information from the `HiveValues` row type during the CBO -> AST conversion.
- Support the `WITHIN GROUP` clause in the CBO plan.
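The empty-result query shape from the list above can be sketched as a plain string, assuming the `HiveValues` row type can be reduced to a list of column names. The class `EmptyValuesSql` and its `render` method are hypothetical: the real conversion in `ASTConverter` builds an AST rather than SQL text, and this sketch does not model column types.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical illustration (not Hive's ASTConverter): renders the SQL shape an
 * empty HiveValues operator is converted to, using the column names of its row type.
 */
final class EmptyValuesSql {
    private EmptyValuesSql() {}

    static String render(List<String> columnNames) {
        if (columnNames.isEmpty()) {
            throw new IllegalArgumentException("row type must have at least one column");
        }
        // One "null as <col>" item per field preserves the column count and names;
        // "limit 0" guarantees that no row is returned.
        StringJoiner items = new StringJoiner(", ", "select ", " limit 0");
        for (String name : columnNames) {
            items.add("null as " + name);
        }
        return items.toString();
    }
}
```

For example, `EmptyValuesSql.render(List.of("colName0", "colName1"))` returns `select null as colName0, null as colName1 limit 0`.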
Why are the changes needed?
- Calcite has built-in rules to remove sections of a query plan that are known to never produce any rows. This makes the CBO plan much simpler.
- In some cases (e.g. `select * from table1 where 1=0`) the whole plan can be removed. Hive already has an optimization that skips executing queries which cannot return any result; it is based on checking the limit value of the top-level query.
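The pruning idea can be illustrated with a small bottom-up rewrite over a toy plan model. Everything here is hypothetical: the nested records stand in for Calcite/Hive operators, `EmptyValues` stands in for `HiveValues(tuples[])`, and a `Filter` flagged `alwaysFalse` stands in for a predicate such as `1=0` that has already been reduced to FALSE. The rules are a simplified subset in the spirit of Calcite's `PruneEmptyRules`, not a reproduction of them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified plan model used only to illustrate empty-plan pruning.
 * It is NOT Calcite's RelNode API and does not reproduce PruneEmptyRules exactly.
 */
final class PruneEmptySketch {
    private PruneEmptySketch() {}

    interface Node {}
    /** Stands in for an empty Values operator (no tuples). */
    record EmptyValues() implements Node {}
    record Scan(String table) implements Node {}
    /** alwaysFalse models a predicate already reduced to FALSE, e.g. "where 1=0". */
    record Filter(Node input, boolean alwaysFalse) implements Node {}
    record Project(Node input) implements Node {}
    record Join(Node left, Node right, boolean leftOuter) implements Node {}
    /** UNION ALL of its inputs. */
    record Union(List<Node> inputs) implements Node {}

    /** Bottom-up rewrite replacing subtrees that can never produce rows with EmptyValues. */
    static Node prune(Node node) {
        if (node instanceof Filter f) {
            Node in = prune(f.input());
            // A constant-false predicate or an empty input yields no rows.
            return (f.alwaysFalse() || in instanceof EmptyValues) ? new EmptyValues() : new Filter(in, false);
        }
        if (node instanceof Project p) {
            Node in = prune(p.input());
            return in instanceof EmptyValues ? in : new Project(in);
        }
        if (node instanceof Join j) {
            Node left = prune(j.left());
            Node right = prune(j.right());
            // Inner join: either empty side empties the result.
            // Left outer join: only an empty left side does.
            boolean empty = left instanceof EmptyValues
                    || (!j.leftOuter() && right instanceof EmptyValues);
            return empty ? new EmptyValues() : new Join(left, right, j.leftOuter());
        }
        if (node instanceof Union u) {
            List<Node> kept = new ArrayList<>();
            for (Node in : u.inputs()) {
                Node pruned = prune(in);
                if (!(pruned instanceof EmptyValues)) {
                    kept.add(pruned); // empty branches contribute no rows to UNION ALL
                }
            }
            if (kept.isEmpty()) {
                return new EmptyValues();
            }
            return kept.size() == 1 ? kept.get(0) : new Union(kept);
        }
        return node; // EmptyValues and Scan are leaves
    }
}
```

With `new Project(new Filter(new Scan("table1"), true))`, which models `select * from table1 where 1=0`, `prune` returns `EmptyValues` for the root, i.e. the whole plan is removed. A left outer join with an empty right side is left unchanged here for brevity; a fuller rule set could replace it with its left input padded with NULLs.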
Does this PR introduce any user-facing change?
No, apart from changes in EXPLAIN output.
How was this patch tested?
mvn test -Dtest.output.overwrite -Dtest=TestMiniLlapLocalCliDriver -Dqfile=empty_result_foj_constraints.q,empty_result_outerjoin.q,empty_result.q,empty_result_union.q,sketches_materialized_view_percentile_disc.q,cbo_rp_udf_percentile.q,udaf_percentile_disc.q -pl itests/qtest -Pitests
What I would like to understand a bit more is the part about `WITHIN GROUP`. Why do these changes need to be part of this patch? How are they related to the empty plan and the pruning rules?
Aggregates with a `WITHIN GROUP` clause are transformed at the AST level: the ORDER BY keys and sort directions are passed to the aggregate function as extra parameters.
https://github.com/apache/hive/blob/95d088a752f1c04f33c9c56556f0f939a1c9ea42/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1010
With this patch, when the AST of the empty plan is generated, the sort direction parameters of such aggregate functions would be NULL, like every other value coming from the empty Values operator. However, these parameters must be non-null integers (exactly 0 or 1).
It was easier and cleaner to support `WITHIN GROUP` in the CBO plan and to have `ASTConverter` generate the AST of the `WITHIN GROUP` clause, so that the ordering information is not lost.
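The problem and the fix can be contrasted in a small sketch, under loudly stated assumptions: arguments are plain strings, each ORDER BY key is appended after the original arguments followed by a direction code, and 1 means ASC while 0 means DESC. The real argument order and encoding used by `SemanticAnalyzer` may differ; `WithinGroupSketch` and all of its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical sketch, not Hive code. ASSUMPTIONS: ORDER BY keys are appended after the
 * original arguments, each followed by a direction code (1 = ASC, 0 = DESC).
 */
final class WithinGroupSketch {
    private WithinGroupSketch() {}

    record SortKey(String expr, boolean ascending) {}

    /** AST-level rewrite: sort keys and directions become extra function arguments. */
    static List<String> rewriteArgs(List<String> args, List<SortKey> orderBy) {
        List<String> out = new ArrayList<>(args);
        for (SortKey key : orderBy) {
            out.add(key.expr());
            out.add(key.ascending() ? "1" : "0"); // assumed direction encoding
        }
        return out;
    }

    /** What an empty Values operator supplies: NULL for every value. */
    static List<String> fromEmptyValues(List<String> args) {
        return Collections.nCopies(args.size(), "null");
    }

    /** The direction argument must be a non-null 0 or 1. */
    static boolean isValidDirection(String arg) {
        return "0".equals(arg) || "1".equals(arg);
    }

    /** Re-emits the clause from the plan's ordering, keeping directions as syntax. */
    static String renderWithinGroup(String fn, List<String> args, List<SortKey> orderBy) {
        StringJoiner keys = new StringJoiner(", ");
        for (SortKey key : orderBy) {
            keys.add(key.expr() + (key.ascending() ? " asc" : " desc"));
        }
        return fn + "(" + String.join(", ", args) + ") within group (order by " + keys + ")";
    }
}
```

Here `rewriteArgs` produces a direction code that turns into `null` once every value comes from the empty Values operator, whereas `renderWithinGroup("percentile_disc", List.of("0.5"), List.of(new SortKey("x", false)))` yields `percentile_disc(0.5) within group (order by x desc)`: the direction stays part of the syntax, so the later AST-level rewrite can still produce a valid 0 or 1.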
The UDF implementation hasn't changed, so the AST-level transformation is still applied in the second compilation phase after CBO, or when CBO is turned off.
I admit that this improvement could be an independent patch.
> It was easier and more clean to support within group in the CBO plan
Is this change going to affect the EXPLAIN CBO output?