Schemas with uppercase field names are not readable in iceberg-mr / Hive

Open HotSushi opened this issue 5 years ago • 5 comments

I wrote a simple test for reading a schema with uppercase field names in iceberg-mr, but it fails.

The schema is as follows

new Schema(
    required(1, "Data", Types.StructType.of(
        required(2, "Case1", Types.BooleanType.get())
    ))
);

If you run a simple "select * from table" query with HiveRunner, it fails with the following error:

java.lang.RuntimeException: cannot find field data from [org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector$IcebergRecordStructField@f45265b5]
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:523)
	at org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector.getStructFieldRef(IcebergRecordObjectInspector.java:68)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56)
	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1033)
	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1059)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:75)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:88)

The reason is that ObjectInspectorUtils.getStandardStructFieldRef always looks up the lowercased field name (i.e. data), whereas IcebergRecordObjectInspector reports the field name with its original uppercase letter (i.e. Data).
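The mismatch can be reproduced without Hive on the classpath. The sketch below is a simplified model, not Hive's actual code: findField is a hypothetical stand-in that mimics the behavior described above (lowercase the requested name, then compare it to each reported field name with an exact match), and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FieldLookupDemo {

  // Hypothetical stand-in for the lookup described above: the requested name is
  // lowercased, then compared to each field name with a case-sensitive equals().
  static String findField(String requested, List<String> fieldNames) {
    String lowered = requested.toLowerCase(Locale.ROOT);
    for (String name : fieldNames) {
      if (name.equals(lowered)) {
        return name;
      }
    }
    throw new RuntimeException("cannot find field " + lowered + " from " + fieldNames);
  }

  public static void main(String[] args) {
    // Field name reported with its original case: the lookup fails.
    List<String> originalCase = Arrays.asList("Data");
    try {
      findField("Data", originalCase);
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }

    // Field name reported in lowercase: the same lookup succeeds.
    List<String> lowercased = Arrays.asList("data");
    System.out.println(findField("Data", lowercased));
  }
}
```

In this model the first lookup throws because "Data" never equals "data", while the second one finds the field, which matches the error seen in the stack trace.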

The following workaround works, but I'm not sure it is worth pursuing, since all field names in structs would be lowercased.

--- a/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
+++ b/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
@@ -125,7 +125,7 @@ public final class IcebergRecordObjectInspector extends StructObjectInspector {
 
     @Override
     public String getFieldName() {
-      return field.name();
+      return field.name().toLowerCase();
     }
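If lowercasing every struct field name is a concern, one possible variation is to lowercase only the name exposed to Hive and keep the original name available for reading values. This is a minimal sketch under that assumption, with no Hive or Iceberg dependency; Field, getOriginalName and read are hypothetical names and do not reflect how IcebergRecordObjectInspector actually accesses record values.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class LowercasedFieldDemo {

  // Hypothetical field wrapper: keeps the original name and exposes a lowercased one.
  static final class Field {
    private final String originalName;

    Field(String originalName) {
      this.originalName = originalName;
    }

    // Name reported to the query engine (what the workaround changes).
    String getFieldName() {
      return originalName.toLowerCase(Locale.ROOT);
    }

    // Name used to read the value out of the underlying record.
    String getOriginalName() {
      return originalName;
    }
  }

  // Reads a value from a record keyed by original-case names.
  static Object read(Map<String, Object> record, Field field) {
    return record.get(field.getOriginalName());
  }

  public static void main(String[] args) {
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("Case1", true);

    Field field = new Field("Case1");
    System.out.println(field.getFieldName() + " -> " + read(record, field));
  }
}
```

Note that two fields differing only in case (for example Case1 and case1) would collide once lowercased, so this approach assumes field names are unique ignoring case.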

Here's the complete test: link

HotSushi avatar Sep 11 '20 20:09 HotSushi

I thought Hive only lowercases the top level column names. Does it also lowercase the fields in structs?

omalley avatar Sep 11 '20 22:09 omalley

> I thought Hive only lowercases the top level column names. Does it also lowercase the fields in structs?

Maybe this is because we use the lowercase config when we start the TableScan, and this config might lowercase the struct fields too?

pvary avatar Sep 12 '20 05:09 pvary

@pvary This error occurs even before the InputSplits are formed, so the TableScan is not yet configured. It seems to occur at query compile time, when Hive tries to construct the Select operator for Column[Data].

@omalley I think it's the other way round. If I run "select Data.Case1 from table", Hive tries to create a select operator for "Column[Data].case1". So top-level columns keep their original case, but fields in structs are lowercased.

HotSushi avatar Sep 14 '20 18:09 HotSushi

@edgarRd @guilload @cmathiesen Any thoughts?

It seems the StructField implementation requires getFieldName() to return a lowercase name (unlike what we are doing here).

HotSushi avatar Sep 15 '20 21:09 HotSushi

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Feb 25 '24 00:02 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Mar 11 '24 00:03 github-actions[bot]