
support ARROW_NUM_THREADS in ParquetDataset

Open karterotte opened this issue 3 years ago • 0 comments

User Story

hb.data.ParquetDataset cannot use all of the CPUs available to the pod.

Detailed requirements

hb.data.ParquetDataset

  1. num_parallel_reads: sets the number of file readers
  2. **[new]** num_arrow_threads: sets the number of column reader threads

Together these should let reads use more of the pod's CPUs and accelerate model training.
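One possible way the requested setting could be resolved is sketched below. It is an illustration only, not HybridBackend code: `resolve_arrow_threads` is a hypothetical helper, and the precedence it uses (explicit `num_arrow_threads` argument first, then the `ARROW_NUM_THREADS` environment variable from the title, then the machine's CPU count) is an assumption.

```python
import os


def resolve_arrow_threads(num_arrow_threads=None, environ=None):
    """Hypothetical helper: choose the number of column reader threads.

    Assumed precedence (not confirmed by HybridBackend):
      1. an explicit num_arrow_threads argument,
      2. the ARROW_NUM_THREADS environment variable,
      3. the number of CPUs reported by the OS.
    """
    environ = os.environ if environ is None else environ

    # An explicit argument wins, but it must be a positive integer.
    if num_arrow_threads is not None:
        if num_arrow_threads < 1:
            raise ValueError("num_arrow_threads must be >= 1")
        return num_arrow_threads

    # Fall back to the environment variable when it holds a positive integer.
    raw = environ.get("ARROW_NUM_THREADS")
    if raw is not None:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # treat unparsable values as unset
        if value >= 1:
            return value

    # Last resort: use every CPU the OS reports (at least one).
    return os.cpu_count() or 1
```

With this sketch, `resolve_arrow_threads(4)` would return 4 regardless of the environment, while `resolve_arrow_threads()` would honor `ARROW_NUM_THREADS` when it is set to a positive integer.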

API Compatibility

hb.data.ParquetDataset

Willing to contribute

Yes

karterotte, Jul 25 '22 09:07