Question about training a diffusion inpainting model
Hi, everyone! I'm struggling with inpainting/outpainting and am quite confused about the model's input during training. I hope I can get some help 😢
In diffusers/examples/research_projects/multi_subject_dreambooth_inpainting/train_multi_subject_dreambooth_inpainting.py (@gzguevara) and diffusers/examples/research_projects/dreambooth_inpaint/train_dreambooth_inpaint.py, the input to the 9-channel inpainting model during training is the combination of the GT image (with noise added), the masked image, and the mask.
| gt imgs | masked imgs | mask |
|---|---|---|
I am curious why the GT image can be fed into the UNet directly. Even though noise has been added to it, it is still visible to the UNet.
latents = vae.encode(batch["pixel_values"].to(dtype=weight_dtype)).latent_dist.sample()
latents = latents * vae.config.scaling_factor
masked_latents = vae.encode(batch["masked_images"].reshape(batch["pixel_values"].shape).to(dtype=weight_dtype)).latent_dist.sample()
masked_latents = masked_latents * vae.config.scaling_factor
masks = batch["masks"]
mask = torch.stack([torch.nn.functional.interpolate(mask, size=(args.resolution // 8, args.resolution // 8)) for mask in masks])
mask = mask.reshape(-1, 1, args.resolution // 8, args.resolution // 8)
noise = torch.randn_like(latents)
bsz = latents.shape[0]
timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device)
timesteps = timesteps.long()
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
latent_model_input = torch.cat([noisy_latents, mask, masked_latents], dim=1)
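To make the `add_noise` line above concrete: as far as I understand, it mixes the GT latent with Gaussian noise using timestep-dependent weights, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. Below is a minimal plain-Python sketch of that arithmetic. The scaled-linear schedule with `beta_start=0.00085` and `beta_end=0.012` is an assumption on my side (the scripts read the real values from the scheduler config), and the helper names are hypothetical.

```python
import math


def scaled_linear_alphas_cumprod(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative product of (1 - beta_t) for an ASSUMED scaled-linear beta schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas_cumprod, running = [], 1.0
    for i in range(num_train_timesteps):
        # Betas are linear in sqrt-space, then squared.
        beta = (lo + (hi - lo) * i / (num_train_timesteps - 1)) ** 2
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def add_noise(x0, eps, alpha_bar_t):
    """Forward diffusion for one scalar: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps


alphas_cumprod = scaled_linear_alphas_cumprod()
for t in (0, 250, 500, 750, 999):
    gt_weight = math.sqrt(alphas_cumprod[t])
    noise_weight = math.sqrt(1.0 - alphas_cumprod[t])
    print(f"t={t:4d}  weight on GT latent={gt_weight:.3f}  weight on noise={noise_weight:.3f}")
```

Under this assumed schedule, the GT latent dominates at small t and is almost entirely replaced by noise near the last timestep, which is the regime sampling starts from at inference.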
Using the images above as an example: during training, the input is a car image and the expected output is a car image. At inference, however, users can add objects to an image (e.g. the input is an image unrelated to cars, such as an arbitrary object or just background, and the expected output is a car image). So there is a gap between training and testing.
I think it may benefit from text guidance, but I still have doubts. On the one hand, the model needs the GT to be optimized, and in other generative models the GT is usually used as a target rather than as a direct input. On the other hand, a diffusion model predicts Gaussian noise, so there seems to be no other way for it to be constrained by the GT.
For the outpainting task, the gap between training and testing is even bigger: if I train with the combination of the GT (with noise added), the masked image, and the mask, what should I pad the region around the unmasked area with at inference? I don't understand how the model avoids learning a trivial mapping. I'd be grateful for any advice.
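On the padding question, here is a rough sketch of how outpainting inputs could be built, under the assumption that the masked image is computed as `image * (mask < 0.5)` (which, as far as I can tell, is how the inpainting examples prepare it). If that holds, the padded pixels are zeroed before encoding, so the fill value should not matter. The function name and the nested-list image format are hypothetical and only for illustration.

```python
def make_outpaint_inputs(image, pad, fill=0.0):
    """Pad a 2D image (nested lists, values in [-1, 1]) by `pad` pixels on each side.

    Returns (canvas, mask, masked_image); mask is 1.0 where the model must generate.
    """
    h, w = len(image), len(image[0])
    H, W = h + 2 * pad, w + 2 * pad
    canvas = [[fill] * W for _ in range(H)]
    mask = [[1.0] * W for _ in range(H)]  # everything is "to generate" by default
    for y in range(h):
        for x in range(w):
            canvas[y + pad][x + pad] = image[y][x]
            mask[y + pad][x + pad] = 0.0  # known pixels are kept
    # Zero out every pixel the model is asked to generate (assumed convention).
    masked_image = [
        [canvas[y][x] * (mask[y][x] < 0.5) for x in range(W)] for y in range(H)
    ]
    return canvas, mask, masked_image


canvas, mask, masked_image = make_outpaint_inputs([[0.5, -0.5]], pad=1, fill=0.9)
```

In this sketch the `fill=0.9` border shows up in `canvas` but not in `masked_image`, so the model would only see the known pixels plus the mask.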
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, you can use https://github.com/huggingface/diffusers/discussions for questions!
YiYi
Hi, I'm trying to train a LoRA with diffusers/examples/research_projects/multi_subject_dreambooth_inpainting/train_multi_subject_dreambooth_inpainting.py, following diffusers/examples/research_projects/dreambooth_inpaint/README.md, but I encountered some problems, such as:
File "/ai/data/diffusers/examples/research_projects/dreambooth_inpaint/train_dreambooth_inpaint_lora.py", line 834, in <module>
main()
File "/ai/data/diffusers/examples/research_projects/dreambooth_inpaint/train_dreambooth_inpaint_lora.py", line 716, in main
for step, batch in enumerate(train_dataloader):
File "/ai/data/anaconda3/envs/lora/lib/python3.10/site-packages/accelerate/data_loader.py", line 384, in __iter__
current_batch = next(dataloader_iter)
File "/ai/data/anaconda3/envs/lora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/ai/data/anaconda3/envs/lora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/ai/data/anaconda3/envs/lora/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/ai/data/anaconda3/envs/lora/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/ai/data/diffusers/examples/research_projects/dreambooth_inpaint/train_dreambooth_inpaint_lora.py", line 365, in __getitem__
example["instance_prompt_ids"] = self.tokenizer(
File "/ai/data/anaconda3/envs/lora/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2823, in __call__
raise ValueError("You need to specify either `text` or `text_target`.")
ValueError: You need to specify either `text` or `text_target`.
I'm using diffusers==0.27.0.dev0.
I don't know how to deal with it. I'd be grateful if anyone could give me advice.
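For context, the tokenizer raises this `ValueError` when it is called with `text=None`. One possible cause (an assumption, since the launch command is not shown) is that no instance prompt reached the dataset, for example because `--instance_prompt` was not passed. A small guard like the hypothetical helper below, called before tokenization, would surface that earlier with a clearer message:

```python
def require_prompt(prompt, flag_name="--instance_prompt"):
    """Return a non-empty prompt string, or fail early with a readable error.

    `flag_name` is only used in the message; it is assumed to be the CLI flag
    that supplies the prompt.
    """
    if prompt is None or not str(prompt).strip():
        raise ValueError(
            f"{flag_name} is missing or empty; the tokenizer needs a text prompt."
        )
    return str(prompt)


# Example: a valid prompt passes through unchanged.
checked = require_prompt("a photo of sks dog")
```

If the guard fires, the fix would be on the command line or in the dataset config rather than in the tokenizer call itself.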