Riccardo Felluga
Quick update: we probably have multiple memory issues, but for this issue I think the focus should be on peak memory usage. For comparison, here is the memory...
This issue seems to be another facet of #256 and #446. After further investigation, it seems that this extra memory usage also comes from splitting `torch.nn.functional.gelu` between the TorchCompile and...
Update on the CSE issue: unfortunately it didn't help with memory usage :(
Update on this issue: as of today Thunder runs `stablecode-completion-alpha-3b` successfully with the compile option `thunder_inductor`. However, with `thunder_inductor_cat` it OOMs. Stats from `python thunder/benchmarks/benchmark_litgpt.py --model_name stablecode-completion-alpha-3b --compile thunder_inductor`:

```shell
Model...
```
Picking this up since a month has passed without updates #644
Thanks for the comment! To see what mechanism I came up with before you had a chance to comment, please check out the linked PR #388. With this added context,...
After further inspection @IvanYashchuk, I still think these are two slightly different things. In the PR, the output is ready-to-run Python code for the fusion, and the...
Sounds good! How do we want to deal with the variants for each fp8 type present in torch? At the moment torch implements the following fp8 types:

```python
#...
```
Regarding debugging a specific fusion, I've created #387. If that goes through, I will show in this notebook how to use that information to dump it for specific...
Sure! I've updated the description with more info.