Accelerate tests fail on 0.6.0 with more than one CUDA device
Describe the bug
Running the tests from the 0.6.0 tag, the accelerate tests fail with the following stack trace:
self = <tests.test_models_unet.UNetLDMModelTests testMethod=test_from_pretrained_accelerate_wont_change_results>
    @require_accelerate
    @unittest.skipIf(torch_device != "cuda", "This test is supposed to run on GPU")
    def test_from_pretrained_accelerate_wont_change_results(self):
        model_accelerate, _ = UNet2DModel.from_pretrained(
            "fusing/unet-ldm-dummy-update", output_loading_info=True, device_map="auto"
        )
        model_accelerate.to(torch_device)
        model_accelerate.eval()
        noise = torch.randn(
            1,
            model_accelerate.config.in_channels,
            model_accelerate.config.sample_size,
            model_accelerate.config.sample_size,
            generator=torch.manual_seed(0),
        )
        noise = noise.to(torch_device)
        time_step = torch.tensor([10] * noise.shape[0]).to(torch_device)
>       arr_accelerate = model_accelerate(noise, time_step)["sample"]
tests/test_models_unet.py:169:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/module.py:1110: in _call_impl
return forward_call(*input, **kwargs)
/vol/apps/python/3.7/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:148: in new_forward
output = old_forward(*args, **kwargs)
src/diffusers/models/unet_2d.py:231: in forward
sample = self.mid_block(sample, emb)
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/module.py:1110: in _call_impl
return forward_call(*input, **kwargs)
/vol/apps/python/3.7/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:148: in new_forward
output = old_forward(*args, **kwargs)
src/diffusers/models/unet_blocks.py:274: in forward
hidden_states = self.resnets[0](hidden_states, temb)
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/module.py:1110: in _call_impl
return forward_call(*input, **kwargs)
/vol/apps/python/3.7/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:148: in new_forward
output = old_forward(*args, **kwargs)
src/diffusers/models/resnet.py:375: in forward
hidden_states = self.norm1(hidden_states)
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/module.py:1110: in _call_impl
return forward_call(*input, **kwargs)
/vol/apps/python/3.7/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:148: in new_forward
output = old_forward(*args, **kwargs)
/vol/apps/python/3.7/ext_modules/pytorch/1.11.0/cuda/11.2/torch/nn/modules/normalization.py:269: in forward
input, self.num_groups, self.weight, self.bias, self.eps)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
input = tensor([[[[ 1.3619e+01, 1.4831e+01, 4.4590e+00, ..., -1.3508e+01,
-7.5876e+00, -5.8245e+00],
[...588e+01, ..., 1.0605e+01,
1.7311e+01, 1.9452e+01]]]], device='cuda:1',
grad_fn=<ToCopyBackward0>)
num_groups = 32
weight = Parameter containing:
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., ...., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0',
requires_grad=True)
bias = Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., ....,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
device='cuda:0', requires_grad=True)
eps = 1e-05
    def group_norm(
        input: Tensor, num_groups: int, weight: Optional[Tensor] = None, bias: Optional[Tensor] = None, eps: float = 1e-5
    ) -> Tensor:
        r"""Applies Group Normalization for last certain number of dimensions.
        See :class:`~torch.nn.GroupNorm` for details.
        """
        if has_torch_function_variadic(input, weight, bias):
            return handle_torch_function(group_norm, (input, weight, bias,), input, num_groups, weight=weight, bias=bias, eps=eps)
        _verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list(input.size()[2:]))
>       return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
E       RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__native_group_norm)
test_from_pretrained_accelerate fails with a similar error. All other tests that ran passed.
My machine has two GPUs. I'm guessing accelerate is trying to spread the model across both, but the code expects all tensors to be on the same device.
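In the trace above, `input` is on cuda:1 while `weight` and `bias` are on cuda:0. One way to see how the model's parameters ended up distributed is to group their names by device. The following is a minimal sketch, not part of diffusers or accelerate: the helper name `group_parameters_by_device` is made up for illustration, and it only assumes that each parameter exposes a `.device` attribute, as PyTorch tensors do.

```python
from collections import defaultdict


def group_parameters_by_device(named_parameters):
    """Map each device (as a string) to the parameter names stored on it.

    `named_parameters` is any iterable of (name, tensor-like) pairs whose
    second element has a `.device` attribute, for example the result of
    `torch.nn.Module.named_parameters()`.
    """
    groups = defaultdict(list)
    for name, param in named_parameters:
        groups[str(param.device)].append(name)
    return dict(groups)
```

Calling `group_parameters_by_device(model_accelerate.named_parameters())` right before the forward pass would show whether the weights sit on one device or several. Note that it says nothing about where accelerate's hooks send the inputs, which the trace suggests may be a different device from the weights.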
Reproduction
Simply run the tests on a machine with more than one CUDA device.
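To check whether the failure really depends on the number of visible GPUs, the same test module can be run with all but one device hidden. This is a rough sketch under that assumption: it relies on the standard `CUDA_VISIBLE_DEVICES` environment variable, the helper names `single_gpu_env` and `run_unet_tests` are hypothetical, and it has not been verified to make the tests pass.

```python
import os
import subprocess
import sys


def single_gpu_env(device_index=0, base_env=None):
    """Return a copy of the environment that exposes only one CUDA device."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA reads this variable at initialization; only listed devices are visible.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


def run_unet_tests(device_index=0):
    """Run the failing test module in a subprocess restricted to one GPU."""
    cmd = [sys.executable, "-m", "pytest", "tests/test_models_unet.py"]
    return subprocess.run(cmd, env=single_gpu_env(device_index)).returncode
```

If `run_unet_tests()` returns 0 while the unrestricted run fails, that would support the multi-GPU explanation. The variable is set for a subprocess because it has to be in place before CUDA is initialized.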
Logs
See above.
System Info
- diffusers version: 0.6.0
- Platform: Linux-4.14.240
- Python version: 3.7.3
- PyTorch version (GPU?): 1.11.0a0+gitbc2c6ed (True)
- Huggingface_hub version: 0.10.1
- Transformers version: 4.21.2
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Single machine, two GPUs
- Accelerate version: 0.13.1
Interesting! I'll try to find time soon to debug this
I think we haven't tested multi-GPU setups well yet. I'm afraid I won't have time to look into it anytime soon, though. Gently pinging @patil-suraj @pcuenca and @williamberman here instead in case they have time.
Looks interesting! I don't have time right now, but I'll add it to my backlog.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
cc @williamberman is it still standing?
@patil-suraj yep still standing, haven't had time to look!