Issue When Trying to Load My Fine-Tuned Model
I get this error when using a DreamBooth fine-tuned model.
```
RuntimeError                              Traceback (most recent call last)
7 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    452                             _pair(0), self.dilation, self.groups)
    453         return F.conv2d(input, weight, bias, self.stride,
--> 454                         self.padding, self.dilation, self.groups)
    455
    456     def forward(self, input: Tensor) -> Tensor:

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
```
Hi Adam,
The error indicates that the model was loaded in half precision (torch.cuda.HalfTensor), so you also need to pass the input in half precision.
Presumably, you can just change the lines that call the unet:
`unet(model_input.half(), ...)`
Post some details of how you are using it if you need further help.
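For reference, here is a minimal sketch of the two usual ways to resolve this kind of dtype mismatch in diffusers. The checkpoint path, tensor shapes, and timestep value are placeholders, assuming a Stable Diffusion v1-style unet:

```python
import torch
from diffusers import UNet2DConditionModel

device = "cuda"

# Option 1: load the unet in full precision so float32 inputs match the weights
# ("path/to/dreambooth-model" is a placeholder for the converted checkpoint).
unet = UNet2DConditionModel.from_pretrained(
    "path/to/dreambooth-model", subfolder="unet", torch_dtype=torch.float32
).to(device)

# Option 2: keep the fp16 weights and cast *every* tensor that reaches the unet.
unet_fp16 = UNet2DConditionModel.from_pretrained(
    "path/to/dreambooth-model", subfolder="unet", torch_dtype=torch.float16
).to(device)
latents = torch.randn(2, 4, 64, 64, device=device).half()        # batch of 2 for CFG
text_embeddings = torch.randn(2, 77, 768, device=device).half()  # CLIP hidden states
noise_pred = unet_fp16(latents, 10, encoder_hidden_states=text_embeddings).sample
```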
Hi @mpaepper, my model was set to half precision by the ckpt-to-diffusers conversion script available in the diffusers repo. The only change I made to the notebook was the model name and auth token.
```python
def magic_mix(image_path, prompt, nu=0.75, total_steps=50, guidance_scale=7.5):
    with torch.no_grad():
        input_image = Image.open(image_path).resize((512, 512))
        scheduler.set_timesteps(total_steps)

        # Define the details of the two phases. The first phase generates the
        # rough layout, the second phase fine-tunes towards the prompt.
        t_min = round(0.3 * total_steps)
        t_max = round(0.6 * total_steps)
        layout_steps = list(range(total_steps - t_max, total_steps - t_min))
        fine_tune_steps = list(range(total_steps - t_min, total_steps))

        # Get embeddings for the text prompt
        text_input = tokenizer(
            prompt,
            padding="max_length",
            max_length=tokenizer.model_max_length,
            truncation=True,
            return_tensors="pt",
        )
        text_embeddings = text_encoder(text_input.input_ids.to(torch_device))[0]
        max_length = text_input.input_ids.shape[-1]
        uncond_input = tokenizer(
            [""], padding="max_length", max_length=max_length, return_tensors="pt"
        )
        uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0]
        text_embeddings = torch.cat([uncond_embeddings, text_embeddings])

        encoded = pil_to_latent(input_image)
        noise = torch.randn_like(encoded)
        fine_tuned = None

        # First phase: generate the rough layout by interpolating the original
        # image with denoising from the prompt
        for i in layout_steps:
            t = scheduler.timesteps[i]
            noisy_latents = scheduler.add_noise(encoded, noise, timesteps=torch.tensor([t]))
            if fine_tuned is not None:
                noisy_latents = nu * fine_tuned + (1 - nu) * noisy_latents
            model_input = torch.cat([noisy_latents] * 2)
            model_input = scheduler.scale_model_input(model_input, t)
            noise_pred = unet(model_input.half(), t, encoder_hidden_states=text_embeddings).sample  # changed to .half() as suggested
            noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
            noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
            fine_tuned = scheduler.step(noise_pred, t, noisy_latents).prev_sample
        after_layout = fine_tuned

        # Second phase: fine-tune towards the prompt
        for i in fine_tune_steps:
            t = scheduler.timesteps[i]
            model_input = torch.cat([fine_tuned] * 2)
            model_input = scheduler.scale_model_input(model_input, t)
            noise_pred = unet(model_input.half(), t, encoder_hidden_states=text_embeddings).sample  # changed to .half() as suggested
            noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
            noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
            fine_tuned = scheduler.step(noise_pred, t, fine_tuned).prev_sample

        return latents_to_pil(fine_tuned)[0]
```
I set it to half like you proposed, but it still doesn't work, @mpaepper.
Did the error change after you cast the input to `.half()`?
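One more thing worth checking, sketched below under the assumption that the whole pipeline (vae, text_encoder, unet) was saved in fp16: `model_input.half()` only converts the unet's latent input, while `text_embeddings` and the latents returned by `pil_to_latent` may still be float32. Since the traceback points at a conv2d, the mismatch could even fire inside `pil_to_latent` itself (the vae's first conv seeing a float32 image tensor), which would leave the error unchanged no matter what you cast at the unet call. The variable names below reuse the ones from `magic_mix`; the dtype audit itself is hypothetical, not part of the original notebook:

```python
import torch

# Hypothetical dtype audit for the magic_mix snippet above: derive the unet's
# weight dtype once and cast everything that reaches it, rather than casting
# only model_input at the call site.
model_dtype = next(unet.parameters()).dtype             # torch.float16 for this checkpoint

text_embeddings = text_embeddings.to(model_dtype)       # cross-attention input
encoded = pil_to_latent(input_image).to(model_dtype)    # latents from the vae
noise = torch.randn_like(encoded)                       # inherits fp16 from encoded
```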