Vedaanta Agarwalla

Results: 7 comments of Vedaanta Agarwalla

https://github.com/NVIDIA/cudnn-frontend/releases/tag/v1.2.1
1.2.1 was done to fix the way the FE (frontend) finds the cudnn backend (BE), as compared to torch. On a larger note, there is a push going on right now to publish the FE on...

The cudnn support surface is documented in the [Developer Guide](https://docs.nvidia.com/deeplearning/cudnn/developer/graph-api.html#supported-graph-patterns). DOUBLE is not supported by any of the operations, so `check_support()` works as expected.
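For illustration, here is a minimal sketch of how this shows up through the cudnn-frontend Python bindings; the method and enum names follow the v1.x samples and should be treated as assumptions, not a verified snippet. Any graph declared entirely in DOUBLE should be rejected at the support-checking stage:

```python
import cudnn

# Sketch only: API names follow the cudnn-frontend v1.x Python samples.
handle = cudnn.create_handle()

# A simple matmul graph declared entirely in DOUBLE precision.
graph = cudnn.pygraph(
    handle=handle,
    io_data_type=cudnn.data_type.DOUBLE,
    compute_data_type=cudnn.data_type.DOUBLE,
)

a = graph.tensor(name="A", dim=[1, 64, 32], stride=[64 * 32, 32, 1],
                 data_type=cudnn.data_type.DOUBLE)
b = graph.tensor(name="B", dim=[1, 32, 16], stride=[32 * 16, 16, 1],
                 data_type=cudnn.data_type.DOUBLE)
c = graph.matmul(name="matmul", A=a, B=b)
c.set_output(True).set_data_type(cudnn.data_type.DOUBLE)

graph.validate()
graph.build_operation_graph()

try:
    # No cudnn engine supports DOUBLE, so plan creation / support checking
    # is expected to fail here.
    graph.create_execution_plans([cudnn.heur_mode.A, cudnn.heur_mode.FALLBACK])
    graph.check_support()
except Exception as e:
    print(f"check_support rejected the DOUBLE graph as expected: {e}")
```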

Engines that involve runtime compilation might add latency. `check_support()` evaluates execution plans from many different engines. You can look at the various graph engines used by cudnn in the [Developer Guide](https://docs.nvidia.com/deeplearning/cudnn/developer/graph-api.html#supported-graph-patterns). Refer...
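As a rough way to see where that latency lands, one can time plan building separately from execution. This is only a sketch: it assumes a supported graph (e.g. the matmul above rebuilt in HALF) with device buffers `a_gpu`, `b_gpu`, `c_gpu` already allocated via torch, and the method names again follow the v1.x Python samples:

```python
import time

import torch
import cudnn

# `graph`, `a`, `b`, `c` are assumed from the previous sketch, rebuilt in a
# supported precision such as HALF. Plan building is where runtime-compiled
# engines pay their compilation cost, so timing build_plans() separately from
# execute() helps attribute the latency.
t0 = time.perf_counter()
graph.create_execution_plans([cudnn.heur_mode.A, cudnn.heur_mode.FALLBACK])
graph.check_support()
graph.build_plans()
t1 = time.perf_counter()
print(f"plan build (incl. any runtime compilation): {(t1 - t0) * 1e3:.2f} ms")

# Hypothetical device buffers matching the graph's I/O tensors.
variant_pack = {a: a_gpu, b: b_gpu, c: c_gpu}
workspace = torch.empty(graph.get_workspace_size(), device="cuda", dtype=torch.uint8)

t0 = time.perf_counter()
graph.execute(variant_pack, workspace)
torch.cuda.synchronize()
t1 = time.perf_counter()
print(f"execute: {(t1 - t0) * 1e3:.2f} ms")
```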

Before:

```
---------------------------------------------------- benchmark: 5 tests ----------------------------------------------------
Name (time in ms)                           Min              Max              Mean             StdDev     Median    IQR    Outliers    OPS    Rounds    Iterations
-----------------------------------------------------------------------------------------------------------------------------
test_llama2_7b_sdpa_grad[thunder+cudnn]     16.1154 (1.0)    17.0821 (1.0)    16.4272 (1.0)    0.2377...
```

@t-vi this is ready for your final review and merge. :)

Currently the cudnn executor is overly optimistic when claiming sdpa. [Link](https://github.com/Lightning-AI/lightning-thunder/blob/main/thunder/executors/cudnnex.py#L370) This change was made in #57. Before including the cudnn executor in the default list, the checker should be made stricter. ...
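To make "stricter" concrete, a gate along the following lines is the kind of thing meant; the function name, the dtype restriction, and the head-dim bound are illustrative assumptions, not what cudnnex.py currently implements:

```python
import thunder.core.dtypes as dtypes  # assumed dtype module; adjust to the actual import


def _strict_cudnn_sdpa_checker(query, key, value, attn_mask=None,
                               dropout_p: float = 0.0, is_causal: bool = False) -> bool:
    """Hypothetical stricter gate for claiming sdpa; thresholds are illustrative."""
    # Only claim the reduced-precision dtypes that cudnn's fused attention targets.
    if query.dtype not in (dtypes.float16, dtypes.bfloat16):
        return False
    if not (query.dtype == key.dtype == value.dtype):
        return False
    # Expect 4-D (batch, heads, seq, head_dim) inputs.
    if query.ndim != 4 or key.ndim != 4 or value.ndim != 4:
        return False
    # Fused attention engines typically constrain the head dimension.
    head_dim = query.shape[-1]
    if head_dim % 8 != 0 or head_dim > 128:
        return False
    return True
```

Even a checker like this only approximates the real support surface; ideally the executor would defer to cudnn's own `check_support()` on the concrete graph before claiming the op.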

Thanks for the bumps, @mpatel31415 and @tfogal. tl;dr: it does not seem to be a cudnn issue? Can we get more datapoints that would point to cudnn being the culprit? ----- There...