superbenchmark
V0.6.0 Release Plan
Release Manager
@cp5555
Endgame
- [x] Code freeze: August 22nd
- [x] Bug Bash date: August 22nd
- [ ] Release date: September 1st
Main Features
SuperBench Improvement
- [x] Support running on host directly without Docker (#358, #362)
- [x] Support running `sb` command inside docker image (#356)
- [x] Support ROCm 5.1.1 (#353, #354)
- [x] Support ROCm 5.1.3 (#361)
- [x] Fix bugs in data diagnosis (#355)
- [x] Fix cmake and build issues (#360)
- [x] Support automatic configuration yaml selection on Azure VM (#365)
- [x] Refine error message when GPU is not detected (#368)
- [x] Add return code for Timeout (#383)
- [x] Update Dockerfile for NCCL/RCCL version, tag name, and verbose output (#371)
- [x] Support node_num=1 in mpi mode (#372)
- [x] Update Python setup for required packages (#387)
- [x] Enhance parameter parsing to allow spaces in value (#397)
- [x] Support NO_COLOR for SuperBench output (#404)
Micro-benchmark Improvement
- [x] Fix issues in IB loopback benchmark (#369)
- [x] Fix stability issue in IB loopback benchmark (#386)
Distributed Benchmark Improvement
- [x] Pair-wise IB benchmark (#363)
- [x] Bug fixes in IB benchmark (#370, #375, #377, #396)
- [x] Topology-aware IB benchmark (#373, #381)
Data Diagnosis & Analysis
- [x] Add failure check function in data_diagnosis.py (#378)
- [x] Support JSON and JSONL in diagnosis (#388)
- [x] Add support to store values of metrics in data diagnosis (#392, #399)
- [x] Support exit code of sb result diagnosis (#403)
Backlog
Inference Benchmark Improvement
- Support VGG, LSTM, and GPT-2 small in TensorRT Inference Backend
- Support VGG, LSTM, and GPT-2 small in ORT Inference Backend
- Support more TensorRT parameters (Related to #366)
Data Diagnosis & Analysis
- Support boxplot and outlier analysis
Documentation
- Metric Reasoning Doc