SAM-Adapter-PyTorch

How to run on multiple machines?

AnnemSony opened this issue 2 years ago · 5 comments

AnnemSony · Jul 06 '23 04:07

Do you mean multiple GPUs?

tianrun-chen · Jul 09 '23 01:07

I have GPUs on multiple machines (i.e., a multi-node cluster). How can I run the training command in that setup?

AnnemSony · Jul 09 '23 04:07
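(For reference: with the same launcher, a run across two machines would look roughly like the sketch below. The node count, master address, and port are placeholder assumptions, not values from this thread, and whether multi-node training actually works for this repository still depends on train.py itself.)

# On the first machine (node rank 0), assumed reachable at 192.168.1.1:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes 2 --node_rank 0 --nproc_per_node 4 --master_addr 192.168.1.1 --master_port 29500 train.py --config configs/demo.yaml

# On the second machine (node rank 1), pointing at the same master address and port:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes 2 --node_rank 1 --nproc_per_node 4 --master_addr 192.168.1.1 --master_port 29500 train.py --config configs/demo.yaml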

Hi, I have 4 GPUs and am trying to tune the SAM-Adapter model. I used the command provided in the repo: CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch train.py --nnodes 1 --nproc_per_node 4 --config configs/demo.yaml

Training ran to completion, but I found that only one GPU is actually used! How can I solve this problem? (I have checked the torch documentation but have no idea how to debug it.) @tianrun-chen

chusheng0505 · Jul 11 '23 08:07

I also encountered this problem: only one card was used during distributed training. I also could not find the two parameters --nnodes 1 --nproc_per_node 4 among the arguments that train.py accepts. Why is that?

Bill-Ren · Jul 23 '23 07:07
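(A likely explanation, for reference: --nnodes and --nproc_per_node are options of the launcher, torch.distributed.launch, not of train.py, so they never appear in the script's own argument parser. The lines below are illustrative only.)

# The launcher parses its own options and only forwards what comes after the script path to train.py:
python -m torch.distributed.launch --help
# It then spawns one process per GPU and, in this legacy launcher's default mode, passes each process
# its local rank via a --local_rank argument appended to the script's arguments, which is why
# train.py does not need --nnodes or --nproc_per_node itself.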

Hi, I have 4 GPUs and am trying to tune the SAM-Adapter model. I used the command provided in the repo: CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch train.py --nnodes 1 --nproc_per_node 4 --config configs/demo.yaml

Training ran to completion, but I found that only one GPU is actually used! How can I solve this problem? (I have checked the torch documentation but have no idea how to debug it.) @tianrun-chen

I found a solution to the problem. The command should be run like this: CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes 1 --nproc_per_node 4 train.py --config configs/demo.yaml --tag exp1. The launcher options (--nnodes, --nproc_per_node) have to come before train.py, so that torch.distributed.launch parses them itself instead of forwarding them to the script; you can check the usage of torch.distributed.launch for details.

Bill-Ren · Jul 24 '23 03:07
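(To confirm that the fix actually engages all four GPUs, one simple check, assuming nvidia-smi is available on the machine, is to watch GPU utilization and memory while training runs; all four devices should show activity:)

# refresh nvidia-smi every second while the training command runs in another shell
watch -n 1 nvidia-smi
# or, equivalently, let nvidia-smi loop on its own
nvidia-smi -l 1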