Concurrent data file fetching and parallel RecordBatch processing
This brings significant performance gains over the previous sequential batch processing. On my 12-core Ryzen 9 5900X, I see all 12 cores reaching about 50% utilization.
In my perf testing branch, a full table scan retrieving all the data processed 84 million rows in 7s, or over 11M rows/sec. Real-world performance could be quite a bit faster, as 50% of the CPU usage in this test went to Minio serving up the data files.
As with the concurrent file plan PR, the concurrency settings default to values that performed well when I tested a range of values, but they can be user-configured.
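To illustrate the general idea of a configurable concurrency limit, here is a minimal sketch that fetches several files at once and processes their batches as they arrive. It is not the code in this PR: it uses plain standard-library threads so that it is self-contained, and `DataFile`, `fetch_file`, `process_batch`, `scan_concurrently`, and the `concurrency_limit` parameter are all hypothetical names invented for the example.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a data file: a list of record batches,
/// where each batch is just a vector of row values.
#[derive(Clone, Debug)]
pub struct DataFile {
    pub batches: Vec<Vec<i64>>,
}

/// Hypothetical stand-in for fetching a file from object storage.
fn fetch_file(file: &DataFile) -> Vec<Vec<i64>> {
    file.batches.clone()
}

/// Hypothetical per-batch work: here we only count the rows.
fn process_batch(batch: &[i64]) -> usize {
    batch.len()
}

/// Scans `files` with at most `concurrency_limit` files in flight
/// (a limit of 0 is treated as 1) and returns the total row count.
pub fn scan_concurrently(files: Vec<DataFile>, concurrency_limit: usize) -> usize {
    let workers = concurrency_limit.max(1);
    // Shared work queue: each worker pulls the next unclaimed file.
    let queue = Arc::new(Mutex::new(files.into_iter()));
    let (tx, rx) = mpsc::channel::<usize>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // Hold the lock only long enough to take one file.
                let next = queue.lock().unwrap().next();
                match next {
                    Some(file) => {
                        for batch in fetch_file(&file) {
                            tx.send(process_batch(&batch)).unwrap();
                        }
                    }
                    None => break,
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    let total: usize = rx.iter().sum();
    for handle in handles {
        handle.join().unwrap();
    }
    total
}
```

Calling `scan_concurrently(files, 4)` would keep at most four files in flight at a time. Raising or lowering that number is the kind of tuning the configurable defaults are meant to allow.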
Performance test results, generated using the tests in https://github.com/apache/iceberg-rust/pull/497:
The stack includes an HAProxy container that introduces latency and bandwidth constraints to simulate real-world usage. If I cut it out and run directly against locally hosted Minio, I can process the same request in just under 3s, at a rate of almost 30M rows/sec.