stable_diffusion.openvino: Need OpenVINO's Model Optimizer command line to generate IRs from the original model

Hello, can you please share the steps and command line you used to convert the original model to OpenVINO IRs? I need this to help optimize these models further.

Nov 02 '22 16:11 arisha07

This would also help with adding local models from DreamBooth and SD 1.5.

Nov 04 '22 00:11 iwoolf

This looked like a hint, but I couldn't get all the requirements for TensorFlow_OpenVINO\get_frozen_graph.py:
https://opencv.org/running-tensorflow-model-inference-in-openvino-2/
https://opencv.org/how-to-speed-up-deep-learning-inference-using-openvino-toolkit-2/

Nov 04 '22 03:11 iwoolf

@arisha07 @iwoolf Try asking the author of https://huggingface.co/ShadowPower/waifu-diffusion.openvino

Edit: here's the reply

https://huggingface.co/ShadowPower/waifu-diffusion.openvino/discussions/1#6370f26f3d1bd47a4ebf19a4

Hello there! Can you please share the steps and command line you used to convert the original model to OpenVINO IRs? I need this to help optimize these models further.

ShadowPower, 3 days ago:

Since I did this a long time ago, it was necessary to use an older version of the diffusers library.

I merged the code I used into one file and put it here: https://gist.github.com/ShadowPower/1632b77626f863c860130ec4cddf20d5

At that time, the diffusers library could not export to ONNX without some modifications. A modified version is available here: https://github.com/harishanand95/diffusers

In fact, the ONNX export in newer versions of diffusers comes from this fork. You can also try modifying the export script to make it compatible with newer versions of the diffusers library.

Nov 10 '22 18:11 ClashSAN

There is a tutorial on how to convert the model to the ONNX format and then to the IRs: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb It works fine, except that it lacks the VAE encoder, which is pretty easy to add. The format is also a little different from what is used here and needs some tweaking. I have used it to convert models other than SD1.4 as well, for example SD1.5, SD2.1, and openjourney, with no major problems so far, although some models couldn't be converted because of half precision.
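The ONNX-to-IR step mentioned above can be sketched roughly as follows. This is only an illustration, assuming the `mo` command from the OpenVINO development tools is on the PATH; the flags are the ones that appear later in this thread, while the helper names and the ONNX file names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_mo_command(onnx_path: Path, fp16: bool = True) -> List[str]:
    """Build the Model Optimizer command line for one ONNX file."""
    cmd = ["mo", "--input_model", str(onnx_path)]
    if fp16:
        # Store the IR weights in half precision
        cmd.append("--compress_to_fp16")
    return cmd


def convert_to_ir(onnx_path: Path, fp16: bool = True) -> None:
    """Run Model Optimizer; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_mo_command(onnx_path, fp16), check=True)


if __name__ == "__main__":
    # Hypothetical file names: use whatever your ONNX export step produced.
    for name in ("text_encoder.onnx", "unet.onnx", "vae_decoder.onnx"):
        print(shlex.join(build_mo_command(Path(name))))
```

Printing the commands first makes it easy to check them before calling `convert_to_ir` on each file.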

Jan 06 '23 22:01 RedAndr

@RedAndr ty

Jan 12 '23 05:01 ClashSAN

Actually, I was wrong about the half precision. These models can be converted too: just add torch_dtype=torch.float32 to the pipeline options.
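A minimal sketch of that option, assuming the pipeline is loaded with the diffusers `StableDiffusionPipeline.from_pretrained` method. The helper name `load_pipeline_fp32` and the model id in the usage comment are illustrative and not taken from the thread.

```python
def load_pipeline_fp32(model_id: str):
    """Load a Stable Diffusion pipeline with float32 weights, ready for ONNX export."""
    # Imported inside the function so this file can be imported without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionPipeline

    # torch_dtype=torch.float32 asks diffusers to load the weights in full precision,
    # which is the setting suggested above for checkpoints stored in half precision.
    return StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)


# Example usage (downloads the model, so it is left commented out):
# pipe = load_pipeline_fp32("CompVis/stable-diffusion-v1-4")
```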

Jan 27 '23 20:01 RedAndr

@RedAndr Would you be able to let us know the tweaks you had to make so that it works with this implementation?

Feb 01 '23 23:02 arisha07

Frankly, I modified my version too much to find what I did at the beginning. However, it is quite simple, just run the code and you will see where the problem is. Or let me know what your error message is.

Feb 02 '23 06:02 RedAndr

Okay, I was able to make the required changes in stable_diffusion_engine.py to get the IRs generated by the "225-stable-diffusion-text-to-image" notebook working with this demo. Thanks @RedAndr for the guidance.

Feb 02 '23 23:02 arisha07

@arisha07 do you want to share the changes you made to get it working?

Feb 03 '23 06:02 brmarkus

When you use the IRs generated by the "225-stable-diffusion-text-to-image" notebook with this demo.py, you will get KeyError exceptions, for example KeyError: 'encoder_hidden_states'. Go to stable_diffusion_engine.py and find where that key is used. If you then look into unet.xml, you will see that 'encoder_hidden_states' has become 'encoder_hidden_state', so change the keys in the code accordingly. The other key changes are:
"latent_model_input" -> "sample"
"t" -> "timestep"
"token" -> "input_ids"
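The renames above can be captured in a small lookup table, sketched below. The helper names are hypothetical, and `ir_input_names` assumes that model inputs appear in the IR .xml as layers with type="Parameter" whose name attribute matches the input key, which may not hold for every model.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# Old key used in stable_diffusion_engine.py -> key expected by the notebook IRs
KEY_RENAMES = {
    "encoder_hidden_states": "encoder_hidden_state",
    "latent_model_input": "sample",
    "t": "timestep",
    "token": "input_ids",
}


def rename_inputs(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an inference input dict with the old keys renamed."""
    return {KEY_RENAMES.get(key, key): value for key, value in inputs.items()}


def ir_input_names(xml_text: str) -> List[str]:
    """List the names of the Parameter (input) layers found in an IR .xml document."""
    root = ET.fromstring(xml_text)
    return [layer.get("name") for layer in root.iter("layer") if layer.get("type") == "Parameter"]


# Example usage on a real file:
# from pathlib import Path
# print(ir_input_names(Path("unet.xml").read_text()))
```

Listing the input names of each IR first shows which keys the engine code has to use, instead of waiting for each KeyError in turn.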

Feb 03 '23 20:02 arisha07

@RedAndr it would be great if you could share the VAE encoder IR conversion part.

Feb 03 '23 21:02 arisha07

Sure:

import subprocess
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline

# Assumes `pipe` is an already loaded StableDiffusionPipeline, as in the notebook.
opset = 16               # ONNX opset version for export
res_v, res_h = 512, 512  # example values (assumption): image height and width, set to your resolution


@torch.no_grad()
def convert_vae_encoder_onnx(pipe: StableDiffusionPipeline, onnx_path: Path):
    """
    Convert the VAE encoder to ONNX.
    Wraps the VAE so that only the encoding part needed for inference is exported,
    prepares an example input, and exports the wrapper with torch.onnx.export.
    Parameters:
        pipe (StableDiffusionPipeline): Stable Diffusion pipeline
        onnx_path (Path): File for storing the ONNX model
    Returns:
        None
    """

    class VAEEncoderWrapper(torch.nn.Module):
        def __init__(self, vae):
            super().__init__()
            self.vae = vae

        def forward(self, sample):
            # encode(...)[0] is the latent distribution; draw one latent from it
            latent = self.vae.encode(sample)[0].sample()
            return latent

    if not onnx_path.exists():
        vae_encoder = VAEEncoderWrapper(pipe.vae)
        vae_encoder.eval()
        # Example input: a single RGB image at the target resolution
        image = torch.randn((1, 3, res_v, res_h))
        torch.onnx.export(
            vae_encoder, (image,), onnx_path, input_names=['init_image'], output_names=['sample'],
            #dynamic_axes={"init_image": {0: "batch", 1: "channels", 2: "height", 3: "width"}},
            opset_version=opset  # onnx opset version for export
        )
        print('VAE encoder successfully converted to ONNX')


VAEE_ONNX_PATH = Path('vae_encoder.onnx')
VAEE_OV_PATH = VAEE_ONNX_PATH.with_suffix('.xml')

if not VAEE_OV_PATH.exists():
    convert_vae_encoder_onnx(pipe, VAEE_ONNX_PATH)
    # Run Model Optimizer to produce the IR (.xml/.bin) in the current directory
    subprocess.run(["mo", "--input_model", str(VAEE_ONNX_PATH), "--compress_to_fp16"], check=True)
    print('VAE encoder successfully converted to IR')

Uncomment the dynamic_axes line if you need a variable resolution. opset was 16 in my case; res_v and res_h are the vertical and horizontal image resolution (height and width in pixels).

Feb 03 '23 22:02 RedAndr

We updated the notebooks so that they and this demo now work together, i.e., the converted IRs work directly with the demo.

https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb

I also made the new FP16 IRs the default, so the download is smaller and inference is much faster on GPUs. https://huggingface.co/bes-dev/stable-diffusion-v1-4-openvino

Feb 13 '23 21:02 raymondlo84