Why predict x_start from noise?
https://github.com/CompVis/latent-diffusion/blob/5a6571e384f9a9b492bbfaca594a2b00cad55279/ldm/models/diffusion/ddpm.py#L1060
model_out = self.apply_model(x, t_in, c, return_ids=return_codebook_ids)
x_recon = self.predict_start_from_noise(x, t=t, noise=model_out)
model_mean, posterior_variance, posterior_log_variance = self.q_posterior(x_start=x_recon, x_t=x, t=t)
I am confused by x_start, which as I understand it should be x_0. Why not use the original img = torch.randn(shape, device=device) as x_0, instead of predicting x_start from the noise at every timestep?
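
For context, here is a minimal sketch of what the three lines above appear to compute in the standard DDPM formulation. It is a scalar toy version in plain Python, not the repository's code: the linear beta schedule, the helper q_sample, and the name q_posterior_mean are assumptions made for illustration. In that formulation the torch.randn sample is x_T (pure noise), while x_0 is the clean sample, so x_start has to be re-estimated from the current x_t and the model's noise prediction at each step.

```python
import math

# Assumed linear beta schedule (illustrative values, not read from the repo config).
T = 1000
betas = [1e-4 + (2e-2 - 1e-4) * i / (T - 1) for i in range(T)]

# Cumulative product of alpha_t = 1 - beta_t.
alphas_cumprod = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alphas_cumprod.append(prod)


def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(acp_t) * x0 + sqrt(1 - acp_t) * noise."""
    a = alphas_cumprod[t]
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * noise


def predict_start_from_noise(x_t, t, noise):
    """Invert q_sample: estimate x0 from x_t and an estimate of the noise."""
    a = alphas_cumprod[t]
    return (x_t - math.sqrt(1.0 - a) * noise) / math.sqrt(a)


def q_posterior_mean(x_start, x_t, t):
    """Mean of q(x_{t-1} | x_t, x_0), valid here for t >= 1."""
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1]
    coef_x0 = betas[t] * math.sqrt(a_prev) / (1.0 - a_t)
    coef_xt = (1.0 - a_prev) * math.sqrt(1.0 - betas[t]) / (1.0 - a_t)
    return coef_x0 * x_start + coef_xt * x_t


# Toy check with a single "pixel": if the noise estimate is exact,
# predict_start_from_noise recovers the clean value x0.
x0, eps, t = 0.5, -1.3, 500
x_t = q_sample(x0, t, eps)
x_recon = predict_start_from_noise(x_t, t, eps)
model_mean = q_posterior_mean(x_recon, x_t, t)
print(x_recon, model_mean)
```

In a real sampler the noise comes from the network rather than from the known eps, so x_recon is only an estimate of x_0, and it is refined as t decreases.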