Huge performance gap between TVM and TRT on Stable Diffusion v1.5

Open felixslu opened this issue 2 years ago • 1 comment

GPU: NVIDIA RTX 3090 Ti.

  1. First, I used the log db in the repo, which takes 3.7s to produce the result.
  2. Then I tried tuning it myself with meta-schedule (trial count set to 50,000), which takes 2.5s.

But on TensorRT v8.6, one iteration of the UNet takes only 25ms, versus 96ms with TVM (USE_CUBLAS=ON; USE_CUDNN=ON; CUDA version 12.1).
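The size of the gap implied by these numbers can be worked out directly. This is only a minimal sketch of the arithmetic: the variable and function names are illustrative, and the only inputs are the four timings reported in this issue.

```python
# Timings reported in this issue (RTX 3090 Ti). Names are illustrative only.
tvm_repo_db_s = 3.7     # end-to-end, using the log db shipped in the repo
tvm_self_tuned_s = 2.5  # end-to-end, after 50,000 meta-schedule trials
tvm_unet_ms = 96.0      # one UNet iteration with TVM
trt_unet_ms = 25.0      # one UNet iteration with TensorRT 8.6


def speedup(slow: float, fast: float) -> float:
    """Return how many times faster `fast` is than `slow`."""
    return slow / fast


tuning_gain = speedup(tvm_repo_db_s, tvm_self_tuned_s)
unet_gap = speedup(tvm_unet_ms, trt_unet_ms)

print(f"gain from self-tuning: {tuning_gain:.2f}x")
print(f"TensorRT vs TVM per UNet iteration: {unet_gap:.2f}x")
```

So re-tuning gives roughly a 1.5x improvement end to end, while the per-iteration UNet gap to TensorRT is still close to 4x.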

I wonder why the latency gap between TVM and TensorRT is so large for the Stable Diffusion model. BTW, a few weeks ago I got a different result when comparing TVM and TRT: my in-house model, auto-tuned by TVM, reached an inference latency very close to that of TensorRT 8.5.

Do you have any ideas about this? Thanks in advance.

felixslu, Jul 18 '23 06:07

Same for me.

Civitasv, Jul 18 '23 12:07