iceberg-rust

Table Scan Performance Tests

Open • sdd opened this issue 1 year ago • 1 comment

This PR adds performance-testing capabilities, with the following features:

  • A docker-compose environment with containers for MinIO, Spark, HAProxy and the Iceberg REST Catalog
  • HAProxy, used to simulate the real-world latency and bandwidth constraints of connections to services such as S3
  • Scripts that create an Iceberg table in the performance-testing environment and populate it with data from the widely used NYC Taxi dataset
  • A justfile that makes it easy to create, initialise, start, stop and tear down the performance-testing environment
  • Criterion benchmarks that use the performance-testing environment to measure TableScan.plan_files in four representative scenarios
  • Criterion benchmarks that use the performance-testing environment to measure TableScan.to_arrow in four representative scenarios
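To give a feel for what the throttled proxy adds to each object-store request, here is a minimal sketch assuming a simple model: a fixed per-request latency plus a transfer time capped by bandwidth. The function names and the figures are hypothetical and are not taken from the PR's HAProxy configuration. The `mean_time` loop is only a std-only stand-in for a benchmark. Criterion itself adds warm-up and statistical analysis on top of repeated timing.

```rust
use std::time::{Duration, Instant};

/// Hypothetical model of one request through a throttled proxy:
/// a fixed latency plus the time to stream `bytes` at `bytes_per_sec`.
fn simulated_fetch_time(bytes: u64, latency: Duration, bytes_per_sec: u64) -> Duration {
    assert!(bytes_per_sec > 0, "bandwidth must be non-zero");
    // Integer nanosecond arithmetic avoids floating-point rounding.
    let transfer_nanos = (bytes as u128 * 1_000_000_000) / bytes_per_sec as u128;
    latency + Duration::from_nanos(transfer_nanos as u64)
}

/// Std-only stand-in for a benchmark (not Criterion's API): runs `f`
/// `iters` times and returns the mean wall-clock time per iteration.
fn mean_time<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

fn main() {
    // Hypothetical figures: 20 ms latency, 10 MB/s cap, 1 MB object.
    let latency = Duration::from_millis(20);
    let per_request = simulated_fetch_time(1_000_000, latency, 10_000_000);
    println!("modelled time per 1 MB fetch: {:?}", per_request);

    // Pretend a scan needs 3 sequential fetches; sleep for 1/100 of the
    // modelled time so the example finishes quickly.
    let scaled = per_request / 100;
    let mean = mean_time(5, || {
        for _ in 0..3 {
            std::thread::sleep(scaled);
        }
    });
    println!("mean per iteration: {:?}", mean);
}
```

The model shows why such an environment is useful. For small objects the fixed latency dominates, so the number of sequential requests (for example, manifest reads during planning) matters more than raw bandwidth.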

The performance tests can be set up and run with `just perf-run`. Before running the tests, this command performs the following setup steps. It checks each one first and skips any that were already completed on a previous run:

  • Download the NYC Taxi Parquet files
  • Spin up the Docker containers
  • Create a table
  • Insert test data from the Parquet files
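The check-then-run behaviour of the setup steps can be sketched as an idempotent step runner. This is a minimal illustration and not the PR's justfile logic. `Env`, `STEPS` and `prepare` are hypothetical names. The in-memory set stands in for real probes such as checking for files on disk, running containers, or an existing table.

```rust
use std::collections::HashSet;

/// Hypothetical model of the perf-testing environment. A real runner would
/// probe the outside world instead of consulting an in-memory set.
#[derive(Default)]
struct Env {
    completed: HashSet<&'static str>,
}

impl Env {
    /// Check: has this step already been done (on this or a previous run)?
    fn is_done(&self, step: &str) -> bool {
        self.completed.contains(step)
    }

    /// Action: perform the step. Here it only records completion.
    fn run(&mut self, step: &'static str) {
        println!("running: {step}");
        self.completed.insert(step);
    }
}

/// Setup steps in dependency order (names paraphrase the list above).
const STEPS: [&str; 4] = [
    "download NYC Taxi Parquet files",
    "start Docker containers",
    "create table",
    "insert test data",
];

/// Runs every step that is not already done and returns the ones it ran,
/// so a second invocation does nothing.
fn prepare(env: &mut Env) -> Vec<&'static str> {
    let mut ran = Vec::new();
    for step in STEPS {
        if env.is_done(step) {
            println!("skipping: {step}");
            continue;
        }
        env.run(step);
        ran.push(step);
    }
    ran
}

fn main() {
    let mut env = Env::default();
    // Simulate Parquet files left over from an earlier download.
    env.completed.insert(STEPS[0]);

    let first = prepare(&mut env);
    println!("first run executed {} step(s)", first.len()); // 3
    let second = prepare(&mut env);
    println!("second run executed {} step(s)", second.len()); // 0
}
```

Keeping each check separate from its action lets a partially prepared environment resume where it left off rather than starting from scratch.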

sdd • Jul 28 '24 21:07

@Xuanwo and @liurenjie1024: This is now passing and ready for review.

sdd • Aug 13 '24 19:08