Train with 2 GPUs

Open xin858466707 opened this issue 4 years ago • 3 comments

Hello, I would like to know how to configure training on a single machine with 2 GPUs.

Jan 15 '22 13:01 xin858466707

I get a RuntimeError saying the address is already in use when I follow the README steps for single-node multi-GPU training. Could you please tell me how to solve it?
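For anyone hitting the same error: with distributed launches it often means the rendezvous port is already taken, for example by an earlier run that is still alive. Below is a minimal sketch of picking an unused port, assuming the training script uses PyTorch's environment-variable initialization (`MASTER_ADDR` / `MASTER_PORT`), which this thread does not confirm. The helper names `find_free_port` and `set_rendezvous_env` are hypothetical and not part of this repository.

```python
import os
import socket


def find_free_port() -> int:
    """Ask the OS for a TCP port that is unused on this machine right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


def set_rendezvous_env(port=None) -> int:
    """Export the rendezvous address for a single-node run (hypothetical helper).

    Assumption: the training code reads MASTER_ADDR and MASTER_PORT,
    as PyTorch's env:// initialization does.
    """
    if port is None:
        port = find_free_port()
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = str(port)
    return port
```

This would need to run once in the launching process, before the workers start, so that every worker sees the same port. The port could in principle be taken by another process between the check and the launch, so it reduces the chance of a clash rather than ruling it out.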

Jan 19 '22 09:01 JackjackFan

Hey, I have solved this problem by modifying some code. I adjusted the way DDP is launched, and single-node training with multiple GPUs now starts successfully. But this raises a new question: how can I determine whether my network has converged to a good state? By checking the printed loss? Please give me a hand if you have time, thank you.
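To make the question concrete, here is a rough sketch of what "checking the printed loss" could look like: compare the mean loss of the most recent epochs with the mean of the epochs just before them, and treat a very small relative improvement as a plateau. The function name `has_plateaued` and the defaults `window=5` and `rel_tol=0.01` are my own assumptions, not something from this repository.

```python
def has_plateaued(losses, window=5, rel_tol=0.01):
    """Return True if the loss improved by less than rel_tol (relative)
    between the previous `window` epochs and the latest `window` epochs."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    if previous == 0:
        return True  # loss is already zero, nothing left to improve
    return (previous - recent) / abs(previous) < rel_tol


# Example: per-epoch training losses copied from the log (made-up numbers).
epoch_losses = [2.0, 1.5, 1.2, 1.0, 0.9, 0.85, 0.84, 0.84, 0.83, 0.83]
print(has_plateaued(epoch_losses, window=3, rel_tol=0.05))
```

A flat training loss only says that optimization has stalled, not that the state is good, so a metric on held-out validation data is probably the more reliable signal.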

Jan 20 '22 02:01 JackjackFan

Hi! How did you modify the code to get single-node training with multiple GPUs to start successfully? Are there any specific steps? My own attempt was unsuccessful...

Jan 20 '22 14:01 xin858466707