Uppercased schemas are not readable in iceberg-mr / Hive
I wrote a simple test for reading an uppercased schema in iceberg-mr, but it fails.
The schema is as follows:
new Schema(
    required(1, "Data", Types.StructType.of(
        required(2, "Case1", Types.BooleanType.get())
    ))
)
If you run a simple SELECT * FROM table query with HiveRunner, it fails with the following error:
java.lang.RuntimeException: cannot find field data from [org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector$IcebergRecordStructField@f45265b5]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:523)
at org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector.getStructFieldRef(IcebergRecordObjectInspector.java:68)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1033)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1059)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:75)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:88)
The reason is that ObjectInspectorUtils.getStandardStructFieldRef always looks the field up by its lowercased name (i.e. data), whereas IcebergRecordObjectInspector reports the field name in its original case (i.e. Data).
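To make the mismatch concrete, here is a minimal, dependency-free sketch of that lookup behaviour. It is a simplified stand-in, not Hive's actual code: FieldLookupSketch, NamedField and find are hypothetical names introduced only for illustration, and the only assumption carried over is that the requested name is lowercased and then compared exactly against whatever getFieldName() returns.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class FieldLookupSketch {
  // Hypothetical stand-in for a Hive StructField; only the reported name matters here.
  public interface NamedField {
    String getFieldName();
  }

  // Lowercases the requested name, then compares it exactly against each reported name.
  public static NamedField find(String requested, List<NamedField> fields) {
    String lowered = requested.toLowerCase(Locale.ROOT);
    for (NamedField field : fields) {
      if (field.getFieldName().equals(lowered)) {
        return field;
      }
    }
    throw new RuntimeException("cannot find field " + lowered);
  }

  public static void main(String[] args) {
    NamedField originalCase = () -> "Data";   // name reported in its original case
    NamedField lowerCase = () -> "data";      // name reported in lowercase

    // A field that reports a lowercase name is found.
    System.out.println(find("Data", Collections.singletonList(lowerCase)).getFieldName());

    // A field that reports "Data" is never matched, because the lookup key is "data".
    try {
      find("Data", Collections.singletonList(originalCase));
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under this assumption, any field whose reported name contains an uppercase character can never be resolved, regardless of how the column is spelled in the query.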
The following workaround works, but I'm not sure it is worth pursuing, since all field names in structs would then be lowercased.
--- a/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
+++ b/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
@@ -125,7 +125,7 @@ public final class IcebergRecordObjectInspector extends StructObjectInspector {
@Override
public String getFieldName() {
- return field.name();
+ return field.name().toLowerCase();
}
Here's the complete test: link
I thought Hive only lowercases the top level column names. Does it also lowercase the fields in structs?
Maybe this is because we use the lowercase config when we start the TableScan, and this config might lowercase the struct fields too?
@pvary This error occurs even before InputSplits are formed, so the TableScan is not yet configured. It seems to occur at query compile time, while Hive is constructing the Select operator for Column[Data].
@omalley I think it's the other way round. If I run "select Data.Case1 from table", Hive tries to create a select operator for "Column[Data].case1". So top-level column names keep their original case, but fields in structs are lowercased.
@edgarRd @guilload @cmathiesen Any thoughts?
It seems the StructField contract requires getFieldName() to return a lowercase name (unlike what we are doing here).
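One way to satisfy that requirement without losing the schema's original spelling is to keep both names on the field object. The sketch below is only an illustration under that assumption: LowercasedFieldSketch and getOriginalName() are hypothetical names, not part of Iceberg or Hive, and the class does not implement the real StructField interface.

```java
import java.util.Locale;

public class LowercasedFieldSketch {
  private final String originalName;

  public LowercasedFieldSketch(String originalName) {
    this.originalName = originalName;
  }

  // Name reported to the query engine: always lowercase.
  public String getFieldName() {
    return originalName.toLowerCase(Locale.ROOT);
  }

  // Hypothetical accessor: the name exactly as written in the table schema.
  public String getOriginalName() {
    return originalName;
  }

  public static void main(String[] args) {
    LowercasedFieldSketch field = new LowercasedFieldSketch("Case1");
    System.out.println(field.getFieldName());     // lowercase form
    System.out.println(field.getOriginalName());  // original form
  }
}
```

Note that two fields in the same struct whose names differ only by case (for example Case1 and case1) would report the same name under this approach, so it would only be safe for schemas without such collisions.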
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'