
[Bug] ArrayIndexOutOfBoundsException when inserting data into a partitioned table with the Hive engine on MR.

Open gnailJC opened this issue 1 year ago • 2 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Paimon version

Paimon 0.7, Hive 3.1.3 (on MR)

Compute Engine

Hive

Minimal reproduce step

  1. Create the partitioned table:
CREATE TABLE mydb.my_tbl (
  `label` STRING,
  `source` STRING,
  `sql_id` STRING,
  `score_type` STRING,
  `score` INT,
  `etc` STRING
) PARTITIONED BY (
  `run_time` DATE
) 
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler' 
LOCATION 'oss://path/to/table' 
TBLPROPERTIES (
  'primary-key' = 'label,source,score_type,run_time'
)
;
  2. Insert data:
INSERT INTO mydb.my_tbl PARTITION(run_time='1970-01-01') VALUES(
  'contact', 'test', 'V0.1', '数量分', 1, 'common'
);
  3. Exception stack:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:204)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:568)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
        ... 9 more
Caused by: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 6
        at org.apache.paimon.hive.mapred.PaimonRecordWriter.write(PaimonRecordWriter.java:70)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1003)
        at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)
        at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:133)
        at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:110)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFInline.process(GenericUDTFInline.java:64)
        at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
        at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
        ... 10 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 6
        at org.apache.paimon.data.GenericRow.isNullAt(GenericRow.java:131)
        at Projection$41.apply(Unknown Source)
        at org.apache.paimon.table.sink.RowPartitionKeyExtractor.partition(RowPartitionKeyExtractor.java:44)
        at org.apache.paimon.table.sink.RowKeyExtractor.partition(RowKeyExtractor.java:57)
        at org.apache.paimon.table.sink.TableWriteImpl.toSinkRecord(TableWriteImpl.java:147)
        at org.apache.paimon.table.sink.TableWriteImpl.writeAndReturn(TableWriteImpl.java:125)
        at org.apache.paimon.table.sink.TableWriteImpl.write(TableWriteImpl.java:116)
        at org.apache.paimon.hive.mapred.PaimonRecordWriter.write(PaimonRecordWriter.java:68)
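
If I read the trace correctly, my table has six data columns plus the partition column run_time, so the partition projection asks for field index 6, but the GenericRow that reaches PaimonRecordWriter only seems to contain the six non-partition columns. A minimal sketch of that mismatch (just an illustration using the values from the insert above, not the real Hive write path):

import org.apache.paimon.data.BinaryString;
import org.apache.paimon.data.GenericRow;

public class PartitionIndexSketch {
    public static void main(String[] args) {
        // Only the six non-partition columns; the static partition value
        // run_time='1970-01-01' is not part of the row.
        GenericRow row = GenericRow.of(
                BinaryString.fromString("contact"),  // label
                BinaryString.fromString("test"),     // source
                BinaryString.fromString("V0.1"),     // sql_id
                BinaryString.fromString("数量分"),    // score_type
                1,                                   // score
                BinaryString.fromString("common"));  // etc

        // The partition key extractor projects run_time at field index 6,
        // which does not exist in a 6-field row:
        row.isNullAt(6);  // -> java.lang.ArrayIndexOutOfBoundsException: 6
    }
}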

What doesn't meet your expectations?

The insert should succeed and the data should be written to the table.

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

gnailJC avatar Apr 10 '24 07:04 gnailJC

@zhuangchong hi, do you have any ideas about this issue?

gnailJC avatar Apr 12 '24 03:04 gnailJC

Currently, Hive INSERT ... PARTITION and INSERT OVERWRITE statements are not supported. You can use a Flink/Spark insert for now. I need to spend some time to see how Hive obtains the partition value.
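
For reference, a rough Flink sketch of that workaround (the catalog name paimon and the warehouse path are placeholders, and it assumes the table is visible as mydb.my_tbl in that catalog; in Flink the partition value is written as an ordinary column):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FlinkInsertWorkaround {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());

        // Register the Paimon catalog that backs the table (placeholder warehouse path).
        tEnv.executeSql(
                "CREATE CATALOG paimon WITH ("
                        + "'type' = 'paimon', "
                        + "'warehouse' = 'oss://path/to/warehouse')");
        tEnv.executeSql("USE CATALOG paimon");

        // Unlike the Hive PARTITION clause, the partition column is supplied
        // as a regular column in the VALUES list.
        tEnv.executeSql(
                "INSERT INTO mydb.my_tbl VALUES "
                        + "('contact', 'test', 'V0.1', '数量分', 1, 'common', DATE '1970-01-01')")
            .await();
    }
}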

zhuangchong avatar Apr 12 '24 06:04 zhuangchong