diffusers train_inpainting_dreambooth.py RuntimeError: Input type (c10::Half) and bias type (float) should be the same

Describe the bug

I am trying to get train_inpainting_dreambooth.py working in Google Colab. To do this, I adjusted the provided DreamBooth_Stable_Diffusion.ipynb notebook so that it installs and runs train_inpainting_dreambooth.py instead of train_dreambooth.py.

All of the requirements install properly, logging in to Hugging Face works, and my sample images upload properly.

When I run:  

!python3 train_inpainting_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --seed=1337 \
  --resolution=512 \
  --train_batch_size=2 \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=50 \
  --sample_batch_size=4 \
  --max_train_steps=150 \
  --save_interval=150 \
  --save_min_steps=150 \
  --save_infer_steps=3 \
  --concepts_list="concepts_list.json" \
  --not_cache_latents \
  --hflip

it tells me that the model has been trained. However, when it tries to generate the sample images, I get this error (the full traceback is under Logs below):

RuntimeError: Input type (c10::Half) and bias type (float) should be the same

I think this is a bug in the train_inpainting_dreambooth.py code, although it is possible that I am running the code incorrectly. Any guidance would be appreciated. Thanks!

Reproduction

!python3 train_inpainting_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --seed=1337 \
  --resolution=512 \
  --train_batch_size=2 \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=50 \
  --sample_batch_size=4 \
  --max_train_steps=150 \
  --save_interval=150 \
  --save_min_steps=150 \
  --save_infer_steps=3 \
  --concepts_list="concepts_list.json" \
  --not_cache_latents \
  --hflip

Logs

   Generating samples:   0% 0/4 [00:00<?, ?it/s]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/train_inpainting_dreambooth.py:876 in <module>                      │
│                                                                              │
│   873                                                                        │
│   874 if __name__ == "__main__":                                             │
│   875 │   args = parse_args()                                                │
│ ❱ 876 │   main(args)                                                         │
│   877                                                                        │
│                                                                              │
│ /content/train_inpainting_dreambooth.py:859 in main                          │
│                                                                              │
│   856 │   │   │   │   accelerator.log(logs, step=global_step)                │
│   857 │   │   │                                                              │
│   858 │   │   │   if global_step > 0 and not global_step % args.save_interva │
│ ❱ 859 │   │   │   │   save_weights(global_step)                              │
│   860 │   │   │                                                              │
│   861 │   │   │   progress_bar.update(1)                                     │
│   862 │   │   │   global_step += 1                                           │
│                                                                              │
│ /content/train_inpainting_dreambooth.py:758 in save_weights                  │
│                                                                              │
│   755 │   │   │   │   inp_mask = Image.new("L", (512, 512), color=255)       │
│   756 │   │   │   │   with torch.inference_mode():                           │
│   757 │   │   │   │   │   for i in tqdm(range(args.n_save_sample), desc="Gen │
│ ❱ 758 │   │   │   │   │   │   images = pipeline(                             │
│   759 │   │   │   │   │   │   │   prompt=concept["instance_prompt"],         │
│   760 │   │   │   │   │   │   │   image=inp_img,                             │
│   761 │   │   │   │   │   │   │   mask_image=inp_mask,                       │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py:115 in     │
│ decorate_context                                                             │
│                                                                              │
│   112 │   @functools.wraps(func)                                             │
│   113 │   def decorate_context(*args, **kwargs):                             │
│   114 │   │   with ctx_factory():                                            │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                               │
│   116 │                                                                      │
│   117 │   return decorate_context                                            │
│   118                                                                        │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/diffusers/pipelines/stable_diffusion/ │
│ pipeline_stable_diffusion_inpaint.py:818 in __call__                         │
│                                                                              │
│   815 │   │   )                                                              │
│   816 │   │                                                                  │
│   817 │   │   # 7. Prepare mask latent variables                             │
│ ❱ 818 │   │   mask, masked_image_latents = self.prepare_mask_latents(        │
│   819 │   │   │   mask,                                                      │
│   820 │   │   │   masked_image,                                              │
│   821 │   │   │   batch_size * num_images_per_prompt,                        │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/diffusers/pipelines/stable_diffusion/ │
│ pipeline_stable_diffusion_inpaint.py:597 in prepare_mask_latents             │
│                                                                              │
│   594 │   │   │   ]                                                          │
│   595 │   │   │   masked_image_latents = torch.cat(masked_image_latents, dim │
│   596 │   │   else:                                                          │
│ ❱ 597 │   │   │   masked_image_latents = self.vae.encode(masked_image).laten │
│   598 │   │   masked_image_latents = self.vae.config.scaling_factor * masked │
│   599 │   │                                                                  │
│   600 │   │   # duplicate mask and masked_image_latents for each generation  │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/diffusers/utils/accelerate_utils.py:4 │
│ 6 in wrapper                                                                 │
│                                                                              │
│   43 │   def wrapper(self, *args, **kwargs):                                 │
│   44 │   │   if hasattr(self, "_hf_hook") and hasattr(self._hf_hook, "pre_fo │
│   45 │   │   │   self._hf_hook.pre_forward(self)                             │
│ ❱ 46 │   │   return method(self, *args, **kwargs)                            │
│   47 │                                                                       │
│   48 │   return wrapper                                                      │
│   49                                                                         │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/diffusers/models/autoencoder_kl.py:16 │
│ 4 in encode                                                                  │
│                                                                              │
│   161 │   │   if self.use_tiling and (x.shape[-1] > self.tile_sample_min_siz │
│   162 │   │   │   return self.tiled_encode(x, return_dict=return_dict)       │
│   163 │   │                                                                  │
│ ❱ 164 │   │   h = self.encoder(x)                                            │
│   165 │   │   moments = self.quant_conv(h)                                   │
│   166 │   │   posterior = DiagonalGaussianDistribution(moments)              │
│   167                                                                        │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1501 in    │
│ _call_impl                                                                   │
│                                                                              │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                      │
│   1502 │   │   # Do not call functions when jit is used                      │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []         │
│   1504 │   │   backward_pre_hooks = []                                       │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/diffusers/models/vae.py:109 in        │
│ forward                                                                      │
│                                                                              │
│   106 │                                                                      │
│   107 │   def forward(self, x):                                              │
│   108 │   │   sample = x                                                     │
│ ❱ 109 │   │   sample = self.conv_in(sample)                                  │
│   110 │   │                                                                  │
│   111 │   │   if self.training and self.gradient_checkpointing:              │
│   112                                                                        │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1501 in    │
│ _call_impl                                                                   │
│                                                                              │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                      │
│   1502 │   │   # Do not call functions when jit is used                      │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []         │
│   1504 │   │   backward_pre_hooks = []                                       │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py:463 in       │
│ forward                                                                      │
│                                                                              │
│    460 │   │   │   │   │   │   self.padding, self.dilation, self.groups)     │
│    461 │                                                                     │
│    462 │   def forward(self, input: Tensor) -> Tensor:                       │
│ ❱  463 │   │   return self._conv_forward(input, self.weight, self.bias)      │
│    464                                                                       │
│    465 class Conv3d(_ConvNd):                                                │
│    466 │   __doc__ = r"""Applies a 3D convolution over an input signal compo │
│                                                                              │
│ /usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py:459 in       │
│ _conv_forward                                                                │
│                                                                              │
│    456 │   │   │   return F.conv2d(F.pad(input, self._reversed_padding_repea │
│    457 │   │   │   │   │   │   │   weight, bias, self.stride,                │
│    458 │   │   │   │   │   │   │   _pair(0), self.dilation, self.groups)     │
│ ❱  459 │   │   return F.conv2d(input, weight, bias, self.stride,             │
│    460 │   │   │   │   │   │   self.padding, self.dilation, self.groups)     │
│    461 │                                                                     │
│    462 │   def forward(self, input: Tensor) -> Tensor:                       │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
Steps:  40% 60/150 [00:55<01:22,  1.09it/s, loss=0.162, lr=2e-6]

System Info

2023-04-04 10:36:38.116239: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

diffusers version: 0.15.0.dev0
Platform: Linux-5.10.147+-x86_64-with-glibc2.31
Python version: 3.9.16
PyTorch version (GPU?): 2.0.0+cu118 (True)
Huggingface_hub version: 0.13.3
Transformers version: 4.27.4
Accelerate version: 0.18.0
xFormers version: 0.0.18
Using GPU in script?:
Using distributed or parallel set-up in script?:

I'm not exactly sure what the last two bullet points are, sorry. However, when I run:

!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader

This is the output:

NVIDIA A100-SXM4-40GB, 40960 MiB, 40513 MiB

I am running everything in Google Colab with Notebook settings:

Hardware accelerator: GPU
CPU class: premium
Runtime shape: High RAM

Apr 04 '23 10:04 DurransEdward

same question

Apr 04 '23 13:04 laiyingxin2

I am not a pro, but I have seen similar errors when there is a dtype conflict, e.g. between float16 and float32. Since you are using mixed precision, it might be conflicting with line 44 in train_dreambooth_inpaint.py. Try changing the dtype on that line to float16, then run a few quick steps to see whether this lets you generate the samples.
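
To illustrate the kind of dtype conflict described above, here is a minimal sketch in plain PyTorch. It is not taken from the training script: `conv_in`, `masked_image`, and `dtype_mismatch_error` are hypothetical stand-ins, and the exact error text depends on the PyTorch version. It only shows that a float32 layer rejects a float16 input, and that casting the input to the layer's parameter dtype is one way to make the call succeed.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the first conv layer of an encoder.
# nn.Conv2d parameters (weight and bias) default to float32.
conv_in = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Hypothetical stand-in for an image tensor that was cast to half precision.
masked_image = torch.randn(1, 3, 16, 16).to(torch.float16)


def dtype_mismatch_error(module, x):
    """Return the RuntimeError message raised by module(x), or None if it runs."""
    try:
        module(x)
    except RuntimeError as err:
        return str(err)
    return None


# float16 input into a float32 layer: PyTorch refuses to mix the two dtypes.
mismatch_message = dtype_mismatch_error(conv_in, masked_image)
print("mismatch:", mismatch_message)

# One possible fix: cast the input to the dtype of the layer's parameters.
param_dtype = next(conv_in.parameters()).dtype
aligned_output = conv_in(masked_image.to(param_dtype))
print("aligned output dtype:", aligned_output.dtype)
```

The opposite direction (casting the module to the input's dtype with `module.to(torch.float16)`) is the same idea, but float16 convolutions are mainly meant for the GPU, so the sketch casts the input instead. Whether the script needs the VAE or the pipeline inputs changed is something I have not verified.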

Apr 04 '23 21:04 jmaccall316

laiyingxin2, could you please link me to the question you are talking about?

jmaccall316, I will try that now, thank you.

Apr 05 '23 10:04 DurransEdward