RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
I normally just throw everything that has tensor buffers or parameters into accelerator.prepare. However, if an object subclasses torch.nn.Module, has no trainable parameters, and we are in a multi-GPU context, then accelerator.prepare tries to wrap it in DistributedDataParallel, which throws an error:
RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
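A minimal reproduction, assuming a multi-GPU launch (e.g. via accelerate launch); FrozenStats is just a hypothetical buffer-only module for illustration, not code from my project:

```python
import torch
from accelerate import Accelerator

# Hypothetical module: it registers a tensor buffer but no trainable parameters.
class FrozenStats(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("running_mean", torch.zeros(10))

    def forward(self, x):
        return x - self.running_mean

accelerator = Accelerator()
stats = FrozenStats()

# On a multi-GPU setup this raises:
# RuntimeError: DistributedDataParallel is not needed when a module
# doesn't have any parameter that requires a gradient.
stats = accelerator.prepare(stats)
```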
The quick fix is to take the object out of accelerator.prepare and call obj.to(accelerator.device) instead, but it's nice to be able to leave everything in accelerator.prepare so I can later add parameters or toggle requires_grad without having to move code around.
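For reference, the workaround just reuses the FrozenStats module from the sketch above:

```python
# Workaround: don't pass the parameter-free module to accelerator.prepare;
# just move it to the current process's device manually.
stats = FrozenStats().to(accelerator.device)
```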
I can open a new PR to fix this; I imagine it's just a matter of checking whether any of the model.parameters() have requires_grad set to True. If none of them do, the DDP wrapping becomes a no-op and the model is simply returned, since it has already been moved to accelerator.device.
https://github.com/huggingface/accelerate/blob/693d46826e32507376d44f99967df4710886c984/src/accelerate/accelerator.py#L781-L783
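Roughly, assuming the wrapping at the linked lines looks like the DDP call below (this is a sketch of the idea, not the exact accelerate source):

```python
# Sketch of the proposed guard inside Accelerator.prepare_model:
# only wrap in DistributedDataParallel when at least one parameter
# actually requires a gradient; otherwise return the model untouched,
# since it was already moved to self.device earlier in prepare_model.
if any(p.requires_grad for p in model.parameters()):
    model = torch.nn.parallel.DistributedDataParallel(
        model,
        device_ids=[self.local_process_index],
        output_device=self.local_process_index,
    )
return model
```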
Thanks for the very clear issue! Your proposed fix sounds right; do you want to open a PR with it?