Improve the support for Hive queries
We can already run simple queries against Hive tables thanks to https://github.com/datafuselabs/opendal/issues/154, #4947, and #5895, but this is not enough. This issue lists the remaining work so we can track progress.
- [ ] use the row group as the partition unit for Hive Parquet files, and support querying multiple row groups in one file
- [x] support more Hive primitive types besides int and string, such as float. https://github.com/datafuselabs/databend/pull/6629
- [ ] support predicate pushdown
- [ ] support Hive complex types, such as struct, map, and array
- [ ] fix some Parquet-related issues, such as delta encoding support
- [ ] support more Hive file formats, such as Avro, ORC, and JSON
- [x] support listing files from directories under the table location
- [x] support partitioned Hive tables https://github.com/datafuselabs/databend/pull/6906
- [ ] support metadata cache
- [ ] https://github.com/Xuanwo/hdrs/issues/71
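To make the row group and predicate pushdown items more concrete, here is a minimal sketch of how planning might look. It is not Databend's actual code: `RowGroupMeta`, `RowGroupPart`, and `plan_parts` are hypothetical names, and the single `i64` min/max pair stands in for real Parquet column statistics. The idea is that each row group becomes its own partition, and a simple `col > bound` filter can skip row groups based on statistics alone.

```rust
/// Hypothetical summary of one row group inside a Parquet file.
/// `min`/`max` stand in for the statistics of a single i64 column.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupMeta {
    num_rows: u64,
    min: i64,
    max: i64,
}

/// Hypothetical partition unit: one row group of one file.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupPart {
    path: String,
    row_group_index: usize,
    num_rows: u64,
}

/// Emit one partition per row group.
/// If `lower_bound` is set, it models the filter `col > lower_bound`:
/// a row group whose max is not above the bound cannot match and is skipped.
fn plan_parts(
    path: &str,
    row_groups: &[RowGroupMeta],
    lower_bound: Option<i64>,
) -> Vec<RowGroupPart> {
    row_groups
        .iter()
        .enumerate()
        .filter(|(_, rg)| match lower_bound {
            Some(bound) => rg.max > bound,
            None => true,
        })
        .map(|(index, rg)| RowGroupPart {
            path: path.to_string(),
            row_group_index: index,
            num_rows: rg.num_rows,
        })
        .collect()
}
```

With this shape, a file with several row groups yields several partitions that can be read in parallel, and passing `Some(bound)` prunes row groups before any data pages are fetched. A real implementation would take the statistics from the Parquet footer and handle arbitrary predicates and column types.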
There is ongoing work to use RPC instead of JNI to connect to HDFS:
- hdrs (the Rust native client we are using now): https://github.com/Xuanwo/hdrs/issues/71
- hdfs-rpc: https://github.com/Xuanwo/hdfs-rpc
Planning to do: read partitioned tables and read more than one row group