Introduce the JOB benchmark in GIE
Is your feature request related to a problem? Please describe.
The Join Order Benchmark (JOB) is a well-known standard for evaluating query optimizers in relational databases, with a particular focus on how well they choose join orders. Integrating JOB into GIE could provide significant insight into the GIE optimizer's ability to handle complex join queries efficiently.
To achieve this, we need to:
- Preprocess the IMDB Dataset:
- [x] Convert the raw IMDB relational dataset into a graph-compatible format consisting of vertex and edge tables, suitable for ingestion by graph databases.
- Prepare Meta Information:
- [x] Prepare meta information, including the schema and statistics for the compiler.
- [x] Provide the data-loading YAML for exp-store.
- [ ] Provide the unified data-loading YAML for Insight and Interactive.
- Translate JOB Queries:
- [x] Rewrite the queries from the JOB benchmark into Cypher or Gremlin, as supported by GIE.
- [x] Prepare expected results and validate the queries against them.
- Implement the JOB Benchmark in GIE:
- [x] Integrate the JOB benchmark into GIE's benchmarking tool
- [x] Ensure it supports both correctness and performance testing.
- Test on different backends:
- [x] Interactive @zhanglei1949 @BingqingLyu
- [ ] Insight @siyuan0322 @BingqingLyu
- Other related issues to be addressed: https://github.com/alibaba/GraphScope/issues/4114, https://github.com/alibaba/GraphScope/issues/4039
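
For readers unfamiliar with the preprocessing step in the list above, here is a minimal sketch of how relational tables can be split into vertex and edge tables. It is not the actual GIE preprocessing script: the helper names `to_vertex_table` and `to_edge_table`, the labels `TITLE`, `NAME`, and `CAST_INFO`, and the toy rows are all assumptions made for illustration. The idea is that entity tables (such as `title` and `name`) become vertex tables keyed by their primary key, while join tables (such as `cast_info`) become edge tables whose endpoints are the foreign keys.

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def to_vertex_table(rows: Iterable[Row], label: str, id_col: str,
                    prop_cols: Sequence[str]) -> List[Row]:
    """Entity table -> vertex table: one vertex per row, keyed by its primary key."""
    return [
        {"label": label, "id": r[id_col], **{c: r[c] for c in prop_cols}}
        for r in rows
    ]


def to_edge_table(rows: Iterable[Row], label: str, src_col: str, dst_col: str,
                  prop_cols: Sequence[str] = ()) -> List[Row]:
    """Join table -> edge table: the two foreign keys become the edge endpoints."""
    return [
        {"label": label, "src": r[src_col], "dst": r[dst_col],
         **{c: r[c] for c in prop_cols}}
        for r in rows
    ]


# Toy rows standing in for the IMDB tables (hypothetical data, not real IMDB content).
title = [
    {"id": 1, "title": "Movie A", "production_year": 2001},
    {"id": 2, "title": "Movie B", "production_year": 2005},
]
name = [{"id": 10, "name": "Actor X"}]
cast_info = [
    {"id": 100, "person_id": 10, "movie_id": 1, "role_id": 1},
    {"id": 101, "person_id": 10, "movie_id": 2, "role_id": 1},
]

# Labels below are illustrative; the graph schema used in GIE may differ.
title_vertices = to_vertex_table(title, "TITLE", "id", ["title", "production_year"])
name_vertices = to_vertex_table(name, "NAME", "id", ["name"])
cast_edges = to_edge_table(cast_info, "CAST_INFO", "person_id", "movie_id", ["role_id"])

for edge in cast_edges:
    print(edge)
```

With this shape, a relational join such as `name JOIN cast_info JOIN title` corresponds to a one-hop pattern from a `NAME` vertex to a `TITLE` vertex over a `CAST_INFO` edge, which is what the translated Cypher or Gremlin queries would match.
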
Some issues still remain to be addressed.
The JOB benchmark has been introduced into Interactive via this commit: https://github.com/alibaba/GraphScope/commit/b814e5b6382d1d9a373e332451d62bd256abb8da. Three queries are still not supported.