[SD2.1] - Input shapes for unet model

lalith-mcw opened this issue

I'm trying to run SD2.1 through OpenVINO IR models, but inference currently produces a pixelated image.

Input nodes for SD2.1: sample [2, 4, 64, 64], timestep [-1], and encoder_hidden_states [2, 77, 1024]
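The leading dimension of 2 in sample and encoder_hidden_states is, as far as I can tell, the classifier-free guidance batch: an unconditional (empty-prompt) copy and a text-conditioned copy are run through the UNet together. Below is a minimal stdlib-only sketch of how the two halves of the UNet output are usually recombined. It assumes the unconditional half comes first (the common diffusers ordering) and uses flat lists of floats in place of real tensors; apply_cfg is a hypothetical helper name, and 7.5 is only an illustrative guidance scale.

```python
from typing import List, Sequence


def apply_cfg(noise_pred: Sequence[Sequence[float]], guidance_scale: float) -> List[float]:
    """Combine a batch-of-2 UNet output into one guided prediction.

    noise_pred[0] is assumed to be the unconditional half and noise_pred[1]
    the text-conditioned half, both flattened to 1-D lists for this sketch.
    """
    if len(noise_pred) != 2:
        raise ValueError("expected a batch of 2: [unconditional, conditional]")
    uncond, cond = noise_pred
    if len(uncond) != len(cond):
        raise ValueError("both halves must have the same number of elements")
    # guided = uncond + scale * (cond - uncond)
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    guided = apply_cfg([[0.0, 1.0], [1.0, 1.0]], guidance_scale=7.5)
    print(guided)  # [7.5, 1.0]
```

With guidance_scale = 1.0 this reduces to the conditional half alone, which is a quick way to check that the two halves are not swapped.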

I still get a 512x512 output image, since the vae_decoder upsamples the 64x64 latents by a factor of 8, but the result is pixelated. What shapes should the three nodes above use for correct inference?
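For reference, the VAE decoder input is a 4-channel latent rather than a 512x512 tensor: Stable Diffusion's VAE works at 1/8 of the pixel resolution, so a [*, 4, 64, 64] latent decodes to 512x512 pixels. The stdlib sketch below (function names are hypothetical) derives the UNet sample shape from a target resolution. The 768x768 case is included on the assumption that the exported model is the 768-pixel stabilityai/stable-diffusion-2-1 checkpoint rather than the 512-pixel stable-diffusion-2-1-base variant; that is worth confirming for the IR in question.

```python
from typing import Tuple

VAE_SCALE_FACTOR = 8  # the SD VAE downsamples each spatial side by 8
LATENT_CHANNELS = 4   # SD latents have 4 channels


def unet_sample_shape(height: int, width: int, batch: int = 2) -> Tuple[int, int, int, int]:
    """Return the NCHW shape expected for the UNet 'sample' input."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (batch, LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def decoded_image_size(latent_h: int, latent_w: int) -> Tuple[int, int]:
    """Return the (height, width) in pixels that the VAE decoder produces."""
    return (latent_h * VAE_SCALE_FACTOR, latent_w * VAE_SCALE_FACTOR)


if __name__ == "__main__":
    print(unet_sample_shape(512, 512))  # (2, 4, 64, 64)
    print(unet_sample_shape(768, 768))  # (2, 4, 96, 96)
    print(decoded_image_size(64, 64))   # (512, 512)
```

If the checkpoint is the 768-pixel one, a 64x64 latent asks the model to work below its training resolution, which is one plausible source of degraded output.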

Input nodes for SD1.4: sample [2, 4, 64, 64], timestep [-1], and encoder_hidden_states [2, 77, 768]

With these inputs, the output was correct for SD1.4 models. I also tried the DPMSolverMultistepScheduler for SD2.1, but the output is still the same.
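One assumption worth checking when swapping schedulers changes nothing: the 768-pixel SD2.1 checkpoint is trained with v-prediction (its scheduler config sets prediction_type to "v_prediction"), whereas SD1.4 predicts the noise epsilon directly. If the OpenVINO loop treats the UNet output as epsilon, the denoising will likely be off regardless of which scheduler is used. Below is a scalar sketch of the standard conversion, where alpha_bar_t is the cumulative product of the alphas at timestep t; the function name and the numbers in the usage block are illustrative only.

```python
import math
from typing import Tuple


def v_to_eps_and_x0(v: float, x_t: float, alpha_bar_t: float) -> Tuple[float, float]:
    """Convert a v-prediction into (epsilon, x0) estimates for one element.

    Uses x_t = sqrt(a) * x0 + sqrt(1 - a) * eps and
         v   = sqrt(a) * eps - sqrt(1 - a) * x0, with a = alpha_bar_t.
    """
    if not 0.0 < alpha_bar_t < 1.0:
        raise ValueError("alpha_bar_t must be in (0, 1)")
    sa, s1a = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    eps = sa * v + s1a * x_t  # recovered noise estimate
    x0 = sa * x_t - s1a * v   # recovered clean-latent estimate
    return eps, x0


if __name__ == "__main__":
    # Round trip: build x_t and v from known x0/eps, then recover them.
    a, x0_true, eps_true = 0.64, 0.3, -1.2
    x_t = math.sqrt(a) * x0_true + math.sqrt(1 - a) * eps_true
    v = math.sqrt(a) * eps_true - math.sqrt(1 - a) * x0_true
    eps, x0 = v_to_eps_and_x0(v, x_t, a)
    print(round(eps, 6), round(x0, 6))  # -1.2 0.3
```

In a diffusers-based loop this conversion is normally handled inside the scheduler when prediction_type is set correctly, so the sketch is mainly useful for checking a hand-written OpenVINO loop.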

I saw somewhere that the encoder_hidden_states blob shape was updated. What are the right dimensions to use?
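On the encoder_hidden_states question: the last dimension is the text encoder's hidden size, which differs between model families. SD1.x uses the CLIP ViT-L/14 text encoder (768), while SD2.x uses OpenCLIP ViT-H/14 (1024); the 77 is the tokenizer's maximum sequence length in both. The stdlib sketch below builds the expected shape per family and checks a declared IR input shape against it. TEXT_HIDDEN_SIZE, encoder_hidden_states_shape, and shape_matches are hypothetical names, not part of OpenVINO or diffusers.

```python
from typing import Dict, Sequence, Tuple

# Text-encoder hidden size per model family:
# SD1.x = CLIP ViT-L/14, SD2.x = OpenCLIP ViT-H/14.
TEXT_HIDDEN_SIZE: Dict[str, int] = {"sd1": 768, "sd2": 1024}
MAX_TOKENS = 77  # CLIP tokenizer context length


def encoder_hidden_states_shape(family: str, batch: int = 2) -> Tuple[int, int, int]:
    """Expected [batch, tokens, hidden] shape for the UNet text-conditioning input."""
    try:
        hidden = TEXT_HIDDEN_SIZE[family]
    except KeyError:
        raise ValueError(f"unknown model family: {family!r}") from None
    return (batch, MAX_TOKENS, hidden)


def shape_matches(declared: Sequence[int], family: str, batch: int = 2) -> bool:
    """True if a declared static shape matches the expected one for `family`."""
    return tuple(declared) == encoder_hidden_states_shape(family, batch)


if __name__ == "__main__":
    print(encoder_hidden_states_shape("sd2"))  # (2, 77, 1024)
    print(shape_matches([2, 77, 768], "sd2"))  # False
    print(shape_matches([2, 77, 768], "sd1"))  # True
```

Under these assumptions, the [2, 77, 1024] shape listed for SD2.1 already matches what the SD2.x text encoder produces.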

Feb 21 '23 05:02 lalith-mcw