Diffusion
Compressing images with a VQVAE and then modeling the latent variables with a diffusion model does not generate good images for me.
I encode x with the VAE and then train the diffusion model on the latents:

```python
import torch
from diffusers import AutoencoderKL

device = "cuda"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device)
x = vae.encode(x).latent_dist.sample().mul_(0.18215)  # SD latent scaling factor
```
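For context, here is a minimal sketch of the latent-diffusion training step being described, assuming a diffusers `UNet2DModel` and `DDPMScheduler` as hypothetical stand-ins for the poster's own diffusion model and noise schedule (neither appears in the post):

```python
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel, DDPMScheduler

# Hypothetical stand-ins; the latents come from vae.encode(...) above.
unet = UNet2DModel(sample_size=32, in_channels=4, out_channels=4).to(device)
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

latents = x  # scaled latents, e.g. shape (B, 4, 32, 32) for 256x256 images
noise = torch.randn_like(latents)
timesteps = torch.randint(
    0, scheduler.config.num_train_timesteps, (latents.shape[0],), device=device
)

# Forward-diffuse the latents, then train the UNet to predict the added noise
noisy_latents = scheduler.add_noise(latents, noise, timesteps)
loss = F.mse_loss(unet(noisy_latents, timesteps).sample, noise)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```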
Sampling:

```python
with torch.no_grad():
    z = ema_sample_method(opt.n_sample, z_shape, guide_w=opt.w)
    # Undo the latent scaling factor before decoding
    x_gen = vae.decode(z / 0.18215).sample
```
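As a point of comparison (not from the post), a plain encode/decode round trip with the same 0.18215 scaling can show whether the VAE or the diffusion model is the weak link; a minimal sketch assuming `x` is a batch of images normalized to [-1, 1], as the SD VAE expects:

```python
with torch.no_grad():
    # VAE-only round trip (no diffusion): if x_rec already looks bad,
    # the problem is upstream of the diffusion model.
    z = vae.encode(x).latent_dist.sample().mul_(0.18215)
    x_rec = vae.decode(z / 0.18215).sample  # should closely match x
```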
The generated images are of poor quality.
I hope someone has a solution.