launcher/multinode_runner.py: mapping env variables

Open YizhouZ opened this issue 2 years ago • 1 comments

launcher/multinode_runner.py: mapping env variables in running cmd for mpich runner.

Previously, launching DeepSpeed with MPICH did not set the environment variables that DeepSpeed relies on, such as "RANK", "LOCAL_RANK", "WORLD_SIZE" and "LOCAL_SIZE". MPICH exposes this information under different names, such as "PMI_RANK".

This change therefore sets them through the -genv and -env arguments to mpirun. "-genv" sets variables shared by all ranks, such as "WORLD_SIZE", while "-env" sets rank-specific variables, such as "RANK" and "LOCAL_RANK".

To demonstrate the change, below is an example of the resulting launch command, using only 2 ranks:

[INFO] [runner.py:540:main] cmd = mpirun -genv PYTHONSTARTUP=/.../pythonstart -genv PYTHONPATH=/../ -genv MASTER_ADDR xxx -genv MASTER_PORT xxx -genv WORLD_SIZE 2 -genv LOCAL_SIZE 2 -n 1 -host xxx -env RANK 0 -env LOCAL_RANK 0 /../bin/python -u pretrain_gpt.py ... : -n 1 -host xx -env RANK 1 -env LOCAL_RANK 1 /../bin/python -u pretrain_gpt.py ...
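The structure of that command can be sketched in Python as follows. This is only an illustration of the -genv / -env layout described above, not the code in launcher/multinode_runner.py: the function name `build_mpich_cmd`, its parameters, and the host names, port and script name in the usage lines are all hypothetical, and it assumes every host runs the same number of processes.

```python
def build_mpich_cmd(hosts, procs_per_node, global_env, program):
    """Assemble an mpirun argument list (hypothetical helper, for illustration).

    hosts          -- list of host names, one entry per node
    procs_per_node -- number of ranks on each node (assumed equal on all nodes)
    global_env     -- dict of variables shared by all ranks
    program        -- the user command as a list, e.g. ["python", "-u", "train.py"]
    """
    world_size = len(hosts) * procs_per_node

    cmd = ["mpirun"]
    # -genv: variables that have the same value for every rank.
    for name, value in global_env.items():
        cmd += ["-genv", name, str(value)]
    cmd += ["-genv", "WORLD_SIZE", str(world_size)]
    cmd += ["-genv", "LOCAL_SIZE", str(procs_per_node)]

    # One segment per rank, joined by ":"; -env applies to that segment only.
    rank = 0
    for host in hosts:
        for local_rank in range(procs_per_node):
            if rank > 0:
                cmd.append(":")
            cmd += ["-n", "1", "-host", host]
            cmd += ["-env", "RANK", str(rank)]
            cmd += ["-env", "LOCAL_RANK", str(local_rank)]
            cmd += program
            rank += 1
    return cmd


# Hypothetical usage: one node, two ranks.
example = build_mpich_cmd(
    hosts=["node-a"],
    procs_per_node=2,
    global_env={"MASTER_ADDR": "node-a", "MASTER_PORT": 29500},
    program=["python", "-u", "train.py"],
)
print(" ".join(example))
```

Each rank gets its own `-n 1` segment because "RANK" and "LOCAL_RANK" differ per process, whereas "WORLD_SIZE" and "LOCAL_SIZE" are identical everywhere and can be passed once with -genv.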

YizhouZ • Apr 25 '23 02:04

@loadams Hi, could you please help me trigger the CI? My CLA was reviewed and passed today. Thank you!

YizhouZ • Apr 26 '23 02:04