[stable-diffusion-x4-upscaler] Encoding a 512x512 image to latent space with the pretrained VAE gives NaN, even though the image has been normalized to [-1,1]

Open leeruibin opened this issue 2 years ago • 6 comments

I have downloaded the pretrained stable-diffusion-x4-upscaler model from https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler

I am trying to fine-tune the upscaler model on my own data. However, when I encode a 512x512 image into the 128x128 latent space with the pretrained VAE weights, I get NaN in the output, which has size [b,4,128,128].

I traced the VAE forward function and found that, as the computation proceeds, the values quickly become very large and overflow.
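
For readers unfamiliar with how an overflow turns into NaN, here is a minimal NumPy sketch. It assumes the computation runs in float16, which this post does not state, and the activation values are hypothetical, chosen only to illustrate the effect. It is not taken from the VAE code.

```python
import numpy as np

# float16 can only represent magnitudes up to 65504.
fp16_max = np.finfo(np.float16).max

# Hypothetical activation values (an assumption for illustration only).
activations = np.array([300.0, -300.0], dtype=np.float16)

with np.errstate(over="ignore", invalid="ignore"):
    # 300 * 300 = 90000 exceeds the float16 range, so both entries become inf.
    squared = activations * activations
    # Subtracting inf from inf has no defined value, so the result is nan.
    difference = squared[0] - squared[1]

# The same arithmetic in float32 stays finite.
squared32 = activations.astype(np.float32) ** 2

print(fp16_max, squared, difference, squared32)
```

Once a single inf appears, later operations such as subtraction or normalization can turn it into NaN, which then spreads to the rest of the tensor.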

Since there is no fine-tuning script for the x4-upscaler model, I am using the Stable Diffusion fine-tuning script at the following link, modified for my own dataset: https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py

Is there any solution for this error?

leeruibin, Feb 26 '23 14:02

Hi @leeruibin, same here. Did you solve this problem?

sczhou, May 01 '23 01:05

@leeruibin, I ran into the same problem. Do you know how to solve it?

vipzhe, May 24 '23 12:05

It works without half precision.
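
Reading the comment above as "it works without half precision", a minimal sketch of encoding in full precision could look like this. The helper name `encode_in_fp32` is hypothetical and does not come from the thread or from diffusers. It assumes the VAE exposes `encode(...).latent_dist.sample()`, as `AutoencoderKL` does in the training script linked in the issue. Whether this removes the NaN in a given setup is not confirmed here.

```python
import torch


def encode_in_fp32(vae, images):
    """Encode `images` with `vae` after casting both to float32.

    Hypothetical helper: `vae` is assumed to behave like diffusers'
    AutoencoderKL, i.e. `vae.encode(x).latent_dist.sample()` returns latents.
    Note that `vae.to(...)` casts the module's weights in place.
    """
    vae = vae.to(dtype=torch.float32)
    images = images.to(dtype=torch.float32)
    with torch.no_grad():
        latents = vae.encode(images).latent_dist.sample()
    # Fail early instead of letting NaN or inf reach the training loss.
    if not torch.isfinite(latents).all():
        raise FloatingPointError("VAE latents contain NaN or inf")
    return latents


# Usage sketch (commented out because it downloads the weights; the
# subfolder name "vae" is an assumption about the repository layout):
# from diffusers import AutoencoderKL
# vae = AutoencoderKL.from_pretrained(
#     "stabilityai/stable-diffusion-x4-upscaler", subfolder="vae"
# )
# latents = encode_in_fp32(vae, images)  # images: [b, 3, 512, 512] in [-1, 1]
```

The explicit finiteness check is a design choice: it surfaces the problem at the encoding step, which is where the issue reports it, and not later in the loss.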

vipzhe, May 25 '23 06:05

Hi, has anyone fine-tuned the upscaler model successfully?

Harperrrr111, May 27 '23 08:05

Same issue. Looking for some guidance on fine-tuning the x4 upscaler model.

oubotong, Jul 07 '23 06:07