iceberg-rust
Table Scan Performance Tests
This PR adds some performance testing capabilities. It includes the following features:
- A docker-compose environment with containers for MinIO, Spark, HAProxy and the Iceberg REST Catalog
- Uses HAProxy to simulate real-world latency and bandwidth constraints of connections to services like S3
- Includes scripts to create an Iceberg table in the performance testing environment and populate it with data from the widely used NYC Taxi dataset
- Adds a justfile that makes it easy to create, initialise, start, stop and tear down the performance testing environment
- Adds some Criterion benchmarks that use the performance testing environment to test the performance of `TableScan.plan_files` in four different representative scenarios
- Adds some Criterion benchmarks that use the performance testing environment to test the performance of `TableScan.to_arrow` in four different representative scenarios
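To see why shaping latency as well as bandwidth matters for these benchmarks, here is a rough back-of-envelope model. It is an assumption for illustration only, not how HAProxy actually shapes traffic: each object fetch costs one round trip plus its size divided by the bandwidth, and fetches happen one after another. The function names and all numbers are hypothetical.

```rust
/// Hypothetical model: time (ms) to fetch one object over a link with a fixed
/// round-trip latency and a fixed bandwidth.
fn fetch_time_ms(size_bytes: u64, latency_ms: f64, bandwidth_bytes_per_s: f64) -> f64 {
    latency_ms + (size_bytes as f64 / bandwidth_bytes_per_s) * 1000.0
}

/// Total time (ms) to fetch objects sequentially (no concurrency assumed).
fn sequential_fetch_ms(sizes: &[u64], latency_ms: f64, bandwidth_bytes_per_s: f64) -> f64 {
    sizes
        .iter()
        .map(|&s| fetch_time_ms(s, latency_ms, bandwidth_bytes_per_s))
        .sum()
}

fn main() {
    let latency_ms = 20.0; // assumed round-trip latency
    let bandwidth = 100.0 * 1024.0 * 1024.0; // assumed 100 MiB/s link

    // 50 small, metadata-sized objects (8 KiB each) vs. one 64 MiB data file.
    let small: Vec<u64> = vec![8 * 1024; 50];
    let large: [u64; 1] = [64 * 1024 * 1024];

    let small_ms = sequential_fetch_ms(&small, latency_ms, bandwidth);
    let large_ms = sequential_fetch_ms(&large, latency_ms, bandwidth);

    println!("50 x 8 KiB: {:.1} ms", small_ms);
    println!("1 x 64 MiB: {:.1} ms", large_ms);
}
```

With these assumed numbers the 50 small fetches take roughly 1 s in total, longer than the single 64 MiB file (about 0.66 s). Work that reads many small files, such as scan planning over manifests, is therefore sensitive to latency in a way that an unthrottled local MinIO would hide.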
The performance tests can be set up and run with `just perf-run`. Before running the tests, this performs the following steps, checking each one first and skipping it if it was already completed on a previous run:
- Download the NYC Taxi dataset Parquet files
- Spin up the Docker containers
- Create a table
- Insert test data from the Parquet files
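The skip-if-already-done behaviour described above can be sketched as a list of steps, each run only if it has not been recorded as complete. This Rust version is hypothetical: the real setup lives in the justfile, the step names here merely mirror the list above, and the in-memory `done` set stands in for whatever state the real recipes inspect (for example, whether the files or containers already exist).

```rust
use std::collections::HashSet;

/// A hypothetical setup step: a name and the work it performs.
struct Step {
    name: &'static str,
    action: fn(),
}

/// Runs each step in order, skipping any step already recorded as done.
/// `done` stands in for persistent state and is updated as steps complete.
/// Returns the names of the steps that actually ran.
fn run_setup(steps: &[Step], done: &mut HashSet<&'static str>) -> Vec<&'static str> {
    let mut ran = Vec::new();
    for step in steps {
        if done.contains(step.name) {
            println!("skipping {} (already done)", step.name);
            continue;
        }
        (step.action)();
        done.insert(step.name);
        ran.push(step.name);
    }
    ran
}

fn main() {
    let steps = [
        Step { name: "download-data", action: || println!("downloading Parquet files") },
        Step { name: "start-containers", action: || println!("starting Docker containers") },
        Step { name: "create-table", action: || println!("creating table") },
        Step { name: "insert-data", action: || println!("inserting test data") },
    ];
    let mut done = HashSet::new();
    let first = run_setup(&steps, &mut done); // runs all four steps
    let second = run_setup(&steps, &mut done); // every step is skipped
    println!("first run: {:?}, second run: {:?}", first, second);
}
```

Keeping the completion check separate from the action is what makes repeated runs cheap: a second invocation only pays for the checks, not for re-downloading data or recreating the table.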
@Xuanwo and @liurenjie1024: This is now passing and ready for review.