Does torchrun + FSDP create multiple copies of the same dataset and model?
In the example T5 training code, the main function constructs the model and the dataset on every worker rank before passing the model to FSDP. Does this mean that there are n copies of the model and dataset when running the script with torchrun and n processes? A minimal sketch of the pattern I mean is below.
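This sketch uses a stand-in linear model and random dataset rather than the actual T5 example, but the structure is the same as I understand it: every process launched by torchrun runs the same construction code and only then wraps the model in FSDP.

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Every one of the n processes executes these lines, so each process
    # materializes its own full copy of the dataset and the model before
    # FSDP shards the parameters across ranks.
    dataset = TensorDataset(torch.randn(1024, 16))   # stand-in for the real dataset
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    model = torch.nn.Linear(16, 16)                  # stand-in for the T5 model
    model = FSDP(model.cuda())                       # sharding only happens here

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```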
My code is set up similarly to the T5 example code, and the memory consumption per GPU is the same regardless of how many torchrun processes I use, so it does seem like I am creating n copies of the model. How can I avoid this?