latent-diffusion
Train SR model with Text-to-Image latents
I noticed that the SR (super-resolution) model takes a [1,3,H,W] input (the original pixels?).
The txt2img output before first_stage_model, however, is a [1,4,H>>3,W>>3] latent.
Is it possible to get the SR model to work with a [1,4,H>>3,W>>3] latent?
If that requires retraining, which one needs it: first_stage_model, diffusion_model, or both?
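
To make the mismatch concrete, here is a minimal sketch of the two shapes involved. The helper names `pixel_shape` and `latent_shape` are hypothetical (they are not part of the repository), and the 4 latent channels and the factor-of-8 downsampling are taken only from the shapes quoted above.

```python
def pixel_shape(height, width, batch=1, rgb_channels=3):
    """Shape of the pixel-space input the SR model appears to expect."""
    return (batch, rgb_channels, height, width)


def latent_shape(height, width, batch=1, latent_channels=4, shift=3):
    """Shape of the txt2img latent before first_stage_model decoding.

    Assumes each spatial dimension is reduced by a right shift of `shift`
    bits (H>>3 means integer division by 8).
    """
    return (batch, latent_channels, height >> shift, width >> shift)


if __name__ == "__main__":
    h, w = 512, 512
    print(pixel_shape(h, w))   # (1, 3, 512, 512)
    print(latent_shape(h, w))  # (1, 4, 64, 64)
```

So for a 512x512 image the two tensors differ both in channel count (3 vs. 4) and in spatial size (512 vs. 64), which is why the latent cannot be passed to the SR model as-is.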