RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
I normally just throw everything that has tensor buffers or parameters into accelerator.prepare. However, if an object subclasses torch.nn.Module, has no trainable parameters, and we are in a multi-GPU context, then accelerator.prepare tries to wrap it in DistributedDataParallel, which throws an error:
RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
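A minimal reproduction, assuming a multi-GPU launch (e.g. via accelerate launch); FrozenStats is just a hypothetical buffer-only module for illustration, not code from my project:

```python
import torch
from accelerate import Accelerator

# Hypothetical module: it registers a tensor buffer but no trainable parameters.
class FrozenStats(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("running_mean", torch.zeros(10))

    def forward(self, x):
        return x - self.running_mean

accelerator = Accelerator()
stats = FrozenStats()

# On a multi-GPU setup this raises:
# RuntimeError: DistributedDataParallel is not needed when a module
# doesn't have any parameter that requires a gradient.
stats = accelerator.prepare(stats)
```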
The quick fix is to take the object out of accelerator.prepare and call obj.to(accelerator.device) instead, but it's nice to be able to leave everything in accelerator.prepare so I can later add parameters or toggle requires_grad without having to move code around.
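For reference, the workaround just reuses the FrozenStats module from the sketch above:

```python
# Workaround: don't pass the parameter-free module to accelerator.prepare;
# just move it to the current process's device manually.
stats = FrozenStats().to(accelerator.device)
```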
I can open a new PR to fix this; I imagine it's just a matter of checking whether any of the model.parameters() have requires_grad set to True. If none of them do, the DDP wrapping becomes a no-op and the model is simply returned, since it has already been moved to accelerator.device.
https://github.com/huggingface/accelerate/blob/693d46826e32507376d44f99967df4710886c984/src/accelerate/accelerator.py#L781-L783
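Roughly, assuming the wrapping at the linked lines looks like the DDP call below (this is a sketch of the idea, not the exact accelerate source):

```python
# Sketch of the proposed guard inside Accelerator.prepare_model:
# only wrap in DistributedDataParallel when at least one parameter
# actually requires a gradient; otherwise return the model untouched,
# since it was already moved to self.device earlier in prepare_model.
if any(p.requires_grad for p in model.parameters()):
    model = torch.nn.parallel.DistributedDataParallel(
        model,
        device_ids=[self.local_process_index],
        output_device=self.local_process_index,
    )
return model
```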
Thanks for the very clear issue! Your proposed fix sounds right; do you want to open a PR with it?