stable-diffusion.cpp
VAE decode takes a long time with free_params_immediately set to false
For some reason, VAE decode on a generated image (txt2img) takes about 4-5 minutes when free_params_immediately is set to false, whereas with it set to true it takes 3-5 seconds (on GPU).
This occurs with keep_vae_on_cpu set to either true or false, and doesn't occur when using taesd.
Is this expected behaviour, or is there something wrong? For reference, I tried an SDXL Turbo model, with and without the VAE fix, no LoRAs, every rng and scheduler option, on CUDA.