
Improve support for Hive queries

Open FANNG1 opened this issue 3 years ago • 2 comments

We can already run simple queries against Hive tables thanks to https://github.com/datafuselabs/opendal/issues/154, #4947, and #5895, but this is not enough. This issue lists the remaining work so we can track progress.

  • [ ] use the row group as the partition unit for Hive Parquet files and support querying multiple row groups in one file
  • [x] support more Hive primitive types besides int and string, such as float. https://github.com/datafuselabs/databend/pull/6629
  • [ ] support predicate pushdown
  • [ ] support Hive complex types, such as struct, map, and array
  • [ ] fix some Parquet-related issues, such as delta-encoding support
  • [ ] support more Hive file formats, such as Avro, ORC, and JSON
  • [x] support listing files from directories under the table location
  • [x] support partitioned Hive tables https://github.com/datafuselabs/databend/pull/6906
  • [ ] support metadata cache
  • [ ] https://github.com/Xuanwo/hdrs/issues/71
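
To make the first item more concrete, here is a minimal sketch of what "row group as the partition unit" could look like. It is not databend's actual code: `ParquetFileInfo`, `RowGroupPartition`, and `plan_row_group_partitions` are hypothetical names, and the row-group row counts are assumed to have been read from each file's Parquet footer beforehand.

```rust
/// Hypothetical summary of one Parquet file under a Hive table location.
/// Assumption: the row counts were already read from the file footer.
#[derive(Debug, Clone)]
pub struct ParquetFileInfo {
    pub path: String,
    /// Number of rows in each row group, in file order.
    pub row_group_rows: Vec<u64>,
}

/// Hypothetical scan partition: exactly one row group of one file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RowGroupPartition {
    pub path: String,
    pub row_group_index: usize,
    pub num_rows: u64,
}

/// Emit one partition per row group instead of one per file, so a file
/// with several row groups can be read by several workers in parallel.
pub fn plan_row_group_partitions(files: &[ParquetFileInfo]) -> Vec<RowGroupPartition> {
    files
        .iter()
        .flat_map(|file| {
            file.row_group_rows
                .iter()
                .enumerate()
                .map(move |(idx, &rows)| RowGroupPartition {
                    path: file.path.clone(),
                    row_group_index: idx,
                    num_rows: rows,
                })
        })
        .collect()
}
```

With this shape, a file containing two row groups yields two partitions that share the same path but carry different `row_group_index` values, which is what "query multi rowgroups in one file" would need from the planner side.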

FANNG1 avatar Jun 22 '22 00:06 FANNG1

There is ongoing work to support connecting to HDFS over RPC instead of JNI:

  • hdrs (the Rust native client we are using now): https://github.com/Xuanwo/hdrs/issues/71
  • hdfs-rpc: https://github.com/Xuanwo/hdfs-rpc

Xuanwo avatar Jun 22 '22 01:06 Xuanwo

Planning to do next: read partitioned tables and read more than one row group per file.

FANNG1 avatar Jul 15 '22 06:07 FANNG1