diffusers Accelerate tests fail on 0.6.0 with more than one CUDA device

Describe the bug

When I run the tests from the 0.6.0 tag, the accelerate tests fail with the following stack trace:

self = <tests.test_models_unet.UNetLDMModelTests testMethod=test_from_pretrained_accelerate_wont_change_results>

    @require_accelerate
    @unittest.skipIf(torch_device != "cuda", "This test is supposed to run on GPU")
    def test_from_pretrained_accelerate_wont_change_results(self):
        model_accelerate, _ = UNet2DModel.from_pretrained(
            "fusing/unet-ldm-dummy-update", output_loading_info=True, device_map="auto"
        )
        model_accelerate.to(torch_device)
        model_accelerate.eval()
    
        noise = torch.randn(
            1,
            model_accelerate.config.in_channels,
            model_accelerate.config.sample_size,
            model_accelerate.config.sample_size,
            generator=torch.manual_seed(0),
        )
        noise = noise.to(torch_device)
        time_step = torch.tensor([10] * noise.shape[0]).to(torch_device)
    
>       arr_accelerate = model_accelerate(noise, time_step)["sample"]

tests/test_models_unet.py:169: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/module.py:1110: in _call_impl
    return forward_call(*input, **kwargs)
/vol/apps/python/3.7/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:148: in new_forward
    output = old_forward(*args, **kwargs)
src/diffusers/models/unet_2d.py:231: in forward
    sample = self.mid_block(sample, emb)
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/module.py:1110: in _call_impl
    return forward_call(*input, **kwargs)
/vol/apps/python/3.7/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:148: in new_forward
    output = old_forward(*args, **kwargs)
src/diffusers/models/unet_blocks.py:274: in forward
    hidden_states = self.resnets[0](hidden_states, temb)
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/module.py:1110: in _call_impl
    return forward_call(*input, **kwargs)
/vol/apps/python/3.7/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:148: in new_forward
    output = old_forward(*args, **kwargs)
src/diffusers/models/resnet.py:375: in forward
    hidden_states = self.norm1(hidden_states)
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/module.py:1110: in _call_impl
    return forward_call(*input, **kwargs)
/vol/apps/python/3.7/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:148: in new_forward
    output = old_forward(*args, **kwargs)
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/normalization.py:269: in forward
    input, self.num_groups, self.weight, self.bias, self.eps)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = tensor([[[[ 1.3619e+01,  1.4831e+01,  4.4590e+00,  ..., -1.3508e+01,
           -7.5876e+00, -5.8245e+00],
          [...588e+01,  ...,  1.0605e+01,
            1.7311e+01,  1.9452e+01]]]], device='cuda:1',
       grad_fn=<ToCopyBackward0>)
num_groups = 32
weight = Parameter containing:
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., ...., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0',
       requires_grad=True)
bias = Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., ....,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       device='cuda:0', requires_grad=True)
eps = 1e-05

    def group_norm(
        input: Tensor, num_groups: int, weight: Optional[Tensor] = None, bias: Optional[Tensor] = None, eps: float = 1e-5
    ) -> Tensor:
        r"""Applies Group Normalization for last certain number of dimensions.
    
        See :class:`~torch.nn.GroupNorm` for details.
        """
        if has_torch_function_variadic(input, weight, bias):
            return handle_torch_function(group_norm, (input, weight, bias,), input, num_groups, weight=weight, bias=bias, eps=eps)
        _verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list(input.size()[2:]))
>       return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
E       RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__native_group_norm)

A similar error occurs for test_from_pretrained_accelerate. All other tests that ran pass.

My machine has two GPUs. I'm guessing accelerate is spreading the model across both of them (the trace shows the input on cuda:1 while the GroupNorm weight and bias are on cuda:0), but the code expects all tensors to be on the same device.
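One way to check this guess would be to list which device each parameter ends up on after loading. The following is a minimal sketch, not part of diffusers or accelerate: `devices_by_parameter` and `spans_multiple_devices` are hypothetical helper names, and the only thing they rely on is the standard `named_parameters()` method of `torch.nn.Module`, so the code is duck-typed and does not import torch itself.

```python
from collections import defaultdict


def devices_by_parameter(model):
    """Group parameter names by the device they live on.

    `model` is anything exposing `named_parameters()` that yields
    (name, parameter) pairs where each parameter has a `.device`
    attribute, for example a torch.nn.Module.
    """
    grouped = defaultdict(list)
    for name, param in model.named_parameters():
        # str() turns a torch.device such as device('cuda', 0) into "cuda:0".
        grouped[str(param.device)].append(name)
    return dict(grouped)


def spans_multiple_devices(model):
    """Return True if the model's parameters sit on more than one device."""
    return len(devices_by_parameter(model)) > 1
```

Calling `devices_by_parameter(model_accelerate)` right after `from_pretrained(..., device_map="auto")`, and again after `model_accelerate.to(torch_device)`, should show whether the weights were split between cuda:0 and cuda:1 at either point.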

Reproduction

Simply run the tests on a machine with more than 1 CUDA device.
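To narrow this down, it may help to rerun the same tests with only one GPU visible and compare. The sketch below assumes pytest is installed and that it is run from the repository root. `single_gpu_env` and `run_accelerate_tests` are hypothetical helper names, and the `-k accelerate` filter is an assumption chosen to match the two failing test names. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for restricting which devices a process can see.

```python
import os
import subprocess
import sys


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment with only one CUDA device visible."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_accelerate_tests(gpu_index=0):
    """Run the accelerate-related UNet tests with a single visible GPU.

    Returns pytest's exit code (0 means every selected test passed).
    """
    cmd = [
        sys.executable, "-m", "pytest",
        "tests/test_models_unet.py",
        "-k", "accelerate",  # assumed filter; selects tests with "accelerate" in the name
    ]
    return subprocess.run(cmd, env=single_gpu_env(gpu_index)).returncode
```

If `run_accelerate_tests(0)` returns 0 on the same machine where the unrestricted run fails, that would support the idea that the failure comes from the model being placed on more than one device.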

Logs

See above.

System Info

diffusers version: 0.6.0
Platform: Linux-4.14.240
Python version: 3.7.3
PyTorch version (GPU?): 1.11.0a0+gitbc2c6ed (True)
Huggingface_hub version: 0.10.1
Transformers version: 4.21.2
Using GPU in script?: Yes
Using distributed or parallel set-up in script?: Single machine, two GPUs
Using accelerate-0.13.1

Oct 24 '22 21:10 antoche

Interesting! I'll try to find time soon to debug this

Oct 26 '22 13:10 patrickvonplaten

I think we haven't tested multi-GPU setups well yet. I'm afraid I won't have time to look into it anytime soon, though. Gently pinging @patil-suraj, @pcuenca and @williamberman here instead in case they have time.

Dec 20 '22 00:12 patrickvonplaten

Looks interesting! Don't have immediate time but will add to my backlog

Jan 04 '23 23:01 williamberman

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Jan 29 '23 15:01 github-actions[bot]

cc @williamberman is it still standing?

Jan 30 '23 08:01 patil-suraj

@patil-suraj yep still standing, haven't had time to look!

Jan 30 '23 18:01 williamberman