
About Distributed Training

Open tayton42 opened this issue 3 years ago • 3 comments

Thank you for your research. I have a question about single-machine, multi-GPU training. When my code reaches `self.model_ddp = DDP(self.model, device_ids=[self.rank], broadcast_buffers=False, find_unused_parameters=True)`, the processes belonging to the other GPUs also appear on GPU 0 with the same PIDs, which causes GPU 0 to run out of memory. I can't find the cause or a solution. Please help me. Thanks! [image]

tayton42, Apr 13 '23 09:04
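
A possible cause of the symptom described above, not confirmed for this issue, is that each worker touches CUDA before it has been pinned to its own GPU, or loads a checkpoint without `map_location`, so every process also opens a context on GPU 0. The following is a minimal sketch under that assumption. `device_for_rank`, `pin_worker_to_gpu`, and `load_checkpoint_for_rank` are hypothetical helper names, not part of RobustVideoMatting.

```python
def device_for_rank(local_rank: int) -> str:
    """Return the device string for the GPU this worker should own."""
    if local_rank < 0:
        raise ValueError("local_rank must be non-negative")
    return f"cuda:{local_rank}"


def pin_worker_to_gpu(local_rank: int) -> str:
    """Make the worker's own GPU the current device before other CUDA work."""
    import torch  # imported lazily so device_for_rank works without PyTorch

    torch.cuda.set_device(local_rank)
    return device_for_rank(local_rank)


def load_checkpoint_for_rank(path: str, local_rank: int):
    """Load a checkpoint directly onto this worker's GPU.

    Without map_location, tensors are restored to the device they were
    saved from, which is often cuda:0.
    """
    import torch

    return torch.load(path, map_location=device_for_rank(local_rank))
```

In this sketch, `pin_worker_to_gpu(self.rank)` would be called at the start of each worker, before the model is moved to the GPU and wrapped in DDP.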

Try to use torchrun.

DommyWorld, Apr 14 '23 02:04

> Try to use torchrun.

Thank you for your answer! But I am not familiar with torchrun. Can you tell me how I should modify the RVM code? Thanks anyway!

tayton42, Apr 14 '23 08:04
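
For readers unfamiliar with torchrun, here is a rough sketch of what the suggestion above could look like. torchrun starts one process per GPU and passes `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` to each one as environment variables. The helper names `read_torchrun_env` and `setup_ddp` and the script name in the launch command are hypothetical. This is not the RVM training code, and how it maps onto the RVM trainer is left as an assumption.

```python
import os


def read_torchrun_env(env=None):
    """Read the rank information that torchrun exports to each worker."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


def setup_ddp(model):
    """Pin this worker to its GPU, join the process group, and wrap the model."""
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    cfg = read_torchrun_env()
    local_rank = cfg["local_rank"]

    # Select the GPU first so later CUDA calls do not default to GPU 0.
    torch.cuda.set_device(local_rank)
    # With torchrun, rank and world size are read from the environment.
    dist.init_process_group(backend="nccl")

    model = model.to(torch.device("cuda", local_rank))
    return DDP(
        model,
        device_ids=[local_rank],
        broadcast_buffers=False,
        find_unused_parameters=True,
    )


# Example launch on one machine with 4 GPUs (script name is hypothetical):
#   torchrun --standalone --nproc_per_node=4 train_sketch.py
```

With this launch style, the script no longer spawns worker processes itself. Each process started by torchrun runs the script from the top and reads its own rank from the environment.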

Hi. I have the same problem. Have you found a solution yet? It would help me a great deal if you could share your experience here. Thank you!

Stephen-K1, Oct 30 '23 07:10