Results: 7 issues of Jingyue Wu

The description of the added compile option explains what this optimization does. The optimization is disabled by default for now. I'll try to enable it by default or even always...

#57's description and https://github.com/Lightning-AI/lit-thunder-LEGACY/pull/2480#issuecomment-2013537240 provide the context. The proposed improvement is to properly propagate graph-not-supported errors from the cudnn backend to the frontend as distinguishable exceptions. This way, we can...
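A minimal sketch of what "distinguishable exceptions" could look like, assuming a dedicated exception type (the class name `CudnnGraphNotSupportedError` and the helper `run_with_fallback` are hypothetical, not Thunder's actual API):

```python
class CudnnGraphNotSupportedError(RuntimeError):
    """Hypothetical error raised when the cudnn backend cannot build a graph."""


def run_with_fallback(run_cudnn, run_fallback):
    try:
        return run_cudnn()
    except CudnnGraphNotSupportedError:
        # Only the distinguishable "graph not supported" case falls back;
        # genuine cudnn bugs still propagate as ordinary errors.
        return run_fallback()
```

The point of a distinct type is that the frontend can catch exactly the unsupported-graph case instead of matching on error-message strings.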

enhancement
cudnn

[Benchmark results](https://gist.github.com/wujingyue/ef92da74ba519987a4a4c764865dd481) don't look good enough at this moment to merge.

Highlights:
- test_nanogpt_layer_norm[forward-thunder]
- test_litgpt_qkv_split_rope for phi-2

Lowlights:
- test_nanogpt_gpt2[inference-thunder]
- test_llama_2_7b_hf[inference-thunder]
- test_llama_2_7b_hf[forward-thunder]
- test_llama2_causal_self_attention_7b[inference-thunder]
- test_llama2_causal_self_attention_7b[forward-thunder]
- ...

nvfuser

Instead, check whether the script is running under nsys via the `NSYS_PROFILING_SESSION_ID` environment variable. Note that it's still possible to profile warmup iterations -- just don't specify `--capture-range cudaProfilerStart` in the `nsys` command. This...
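The environment-variable check described above can be sketched as follows (the helper name `running_under_nsys` is an assumption for illustration; nsys sets `NSYS_PROFILING_SESSION_ID` in the environment of the process it launches):

```python
import os


def running_under_nsys() -> bool:
    # nsys exports NSYS_PROFILING_SESSION_ID into the profiled process's
    # environment, so its presence indicates an active nsys session.
    return "NSYS_PROFILING_SESSION_ID" in os.environ
```

A benchmark script can then gate profiler-only work (e.g. NVTX range markers or `torch.cuda.cudart().cudaProfilerStart()`) behind this check instead of requiring an explicit flag.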

This gives a fair comparison between eager mode and the other modes. The constraints mentioned in the comment appear to have been fixed by https://github.com/pytorch/pytorch/pull/161407. `python thunder/benchmarks/benchmark_inference.py` at head runs fine on...