Ismayil Ismayilov comments

Results 10 comments of


                                            Ismayil Ismayilov

RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.

I am facing the same error on `torch 2.5.0+cu124`. The error is preceded by the following warning: ``` cuDNN SDPA backward got grad_output.strides() != output.strides() ``` I'm on an H100,...

RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.

Seems to be NVIDIA/cudnn-frontend#75 and https://github.com/NVIDIA/cudnn-frontend/issues/78

🐛 [Bug] Can't load UNet on H100 after compiling ExportedProgram with torch_tensorrt.dynamo.compile and saving

@lanluo-nvidia Thank you for the reply. That is very strange. I will try with today's nightly and report back. Also, I am running this on an H100, could that possibly...

🐛 [Bug] Can't load UNet on H100 after compiling ExportedProgram with torch_tensorrt.dynamo.compile and saving

I tried again with today's nightly (`torch_tensorrt==2.5.0.dev20240918+cu124`, `torch==dev20240912+cu124`) and I am encountering the same runtime error. Additionally, the results for the compiled UNet match the original UNet. At this point,...

🐛 [Bug] Can't load UNet on H100 after compiling ExportedProgram with torch_tensorrt.dynamo.compile and saving

I also tried with release 2.4. There, I can successfully save and load the model but the compiled model outputs are full of nans. In general, Stable Diffusion with Torch-TensorRT...

🐛 [Bug] Can't load UNet on H100 after compiling ExportedProgram with torch_tensorrt.dynamo.compile and saving

@lanluo-nvidia After loading the UNet, I first check if the results match (`expected_outputs_unet` is defined in the previous code block) ```py with torch.inference_mode(): tensorrt_outputs_unet = loaded_unet(*arg_inputs_unet) for expected_output, tensorrt_output in...