Enhance TorchBench coverage for large distributed workloads with CI support for IBM Cloud

Open • spzala opened this issue 2 years ago • 1 comment

The proposed tasks are:

  • [ ] Enable CI support for IBM Cloud to enhance the testing infrastructure for FSDP
  • [ ] Benchmark new model(s) for FSDP training, e.g., add a new hf_T5 variant with 3B parameters

spzala, Apr 10 '23 14:04

Looping @mrshenli into the discussion, since the PyTorch Distributed team is also interested in building benchmarks for distributed setups.

xuzhao9, Apr 11 '23 17:04

We tested the workflow. Closing this issue for now; we will reopen it if we decide to take this further.

spzala, Jul 07 '25 15:07