V0.13.0 Release Plan

Open yukirora opened this issue 5 months ago • 0 comments

Release Manager

@cp5555

- [ ] Collect per-snapshot per-GPU flops/temp in gpu burn (#735)
- [x] Add simultaneous all-to-host / host-to-all bandwidth test cases to nvbandwidth (#736)
- [x] Add ncu profile support in cublaslt-gemm (#740)
- [x] Support verification and parallel run for disk performance benchmark (#741)
- [x] Add numa support for nvbandwidth (#742)
- [x] Change cublasLtMatmulDescCreate scaleType from CUDA_R_32F to CUDA_R_16F in FP16 dist inference (#732)
- [ ] Support gemm correctness check in cublaslt-gemm
- [ ] Multi-node NCCL validation enhancement
- [ ] mscclpp support
- [ ] Add new busbw metrics for NCCL/MSCCL testing with specific algorithm
- [ ] Fix NVBandwidth benchmark results parsing bug
- [ ] Support FP4 kernels for cutlass benchmark

- [ ] dist-inference raise cublaslt error
- [ ] Add --set_ib_devices option to auto-select IB device by MPI local rank in ib validation (#733)
- [ ] NVBandwidth benchmark results parsing bug (#748)
- [x] CI/CD - Fix image merge in GitHub Action (#749)
- [x] Fix pipelines - Update mlc version in dockerfiles from v3.11 to v3.12 (#752)
- [x] CI/CD - Fix python3.10 pipeline (#753)
- [x] CI/CD - Fix Azure test pipeline (#754)

Sep 18 '25 07:09 yukirora