Diffusion
Compressing images with a VQVAE and then modeling the latent variables with a diffusion model does not generate good images for me.
I encode x with the VAE and then train the diffusion model on the latents:

```python
import torch
from diffusers import AutoencoderKL

device = "cuda"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device)
x = vae.encode(x).latent_dist.sample().mul_(0.18215)  # SD latent scaling factor
```
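For context, here is a minimal sketch of the latent-diffusion training step being described, assuming a diffusers `UNet2DModel` and `DDPMScheduler` as hypothetical stand-ins for the poster's own diffusion model and noise schedule (neither appears in the post):

```python
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel, DDPMScheduler

# Hypothetical stand-ins; the latents come from vae.encode(...) above.
unet = UNet2DModel(sample_size=32, in_channels=4, out_channels=4).to(device)
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

latents = x  # scaled latents, e.g. shape (B, 4, 32, 32) for 256x256 images
noise = torch.randn_like(latents)
timesteps = torch.randint(
    0, scheduler.config.num_train_timesteps, (latents.shape[0],), device=device
)

# Forward-diffuse the latents, then train the UNet to predict the added noise
noisy_latents = scheduler.add_noise(latents, noise, timesteps)
loss = F.mse_loss(unet(noisy_latents, timesteps).sample, noise)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```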
Sampling:

```python
with torch.no_grad():
    z = ema_sample_method(opt.n_sample, z_shape, guide_w=opt.w)
    # Undo the latent scaling factor before decoding
    x_gen = vae.decode(z / 0.18215).sample
```
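As a point of comparison (not from the post), a plain encode/decode round trip with the same 0.18215 scaling can show whether the VAE or the diffusion model is the weak link; a minimal sketch assuming `x` is a batch of images normalized to [-1, 1], as the SD VAE expects:

```python
with torch.no_grad():
    # VAE-only round trip (no diffusion): if x_rec already looks bad,
    # the problem is upstream of the diffusion model.
    z = vae.encode(x).latent_dist.sample().mul_(0.18215)
    x_rec = vae.decode(z / 0.18215).sample  # should closely match x
```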
The generated images are of poor quality.
I hope someone has a solution.