benchmark
Enhance TorchBench coverage for large distributed workloads with CI support for IBM Cloud
The proposed tasks are:
- [ ] Enable CI support for IBM Cloud to enhance the testing infrastructure for FSDP
- [ ] Benchmark new model(s) for FSDP training, e.g. add a new hf_T5 model with 3B parameters.
Looping @mrshenli into the discussion, since the PyTorch Distributed team is also interested in building benchmarks in a distributed setup.
We tested the workflow. Closing this issue for now; we will reopen it when we decide to move further.