Train SR model with Text-to-Image latents

Open DavidHiggis opened this issue 3 years ago • 0 comments

I noticed that the SR (super-resolution) model takes a [1,3,H,W] input (original pixels?).

The txt2img output, before it goes through first_stage_model, is [1,4,H>>3,W>>3].
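To make the shape mismatch concrete, here is a minimal sketch in plain Python of the two shapes involved. The helper name `latent_shape` and its parameters are hypothetical, used only for illustration; the 4 channels and the shift by 3 are taken from the shapes quoted above, not from the model code.

```python
def latent_shape(image_shape, latent_channels=4, shift=3):
    """Map a pixel-space shape [N, 3, H, W] to [N, C, H >> shift, W >> shift].

    latent_channels=4 and shift=3 are assumptions copied from the shapes
    reported in this issue, not values read from any config.
    """
    n, _, h, w = image_shape
    return (n, latent_channels, h >> shift, w >> shift)


pixel_shape = (1, 3, 512, 512)      # the kind of input the SR model appears to take
print(latent_shape(pixel_shape))    # (1, 4, 64, 64)
```

So for a 512x512 image, the SR model would see 3 channels at 512x512, while the txt2img latent has 4 channels at 64x64.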

Is it possible to get the SR model to work with a [1,4,H>>3,W>>3] latent? If that requires retraining, which part needs it: first_stage_model, diffusion_model, or both?

Aug 27 '22 05:08 DavidHiggis