Riccardo Felluga
Quick update: we probably have multiple memory issues, but for this issue I think the focus should be on peak memory usage. For comparison, here is the memory...
This issue seems to be another facet of #256 and #446. After further investigation, it seems that this extra memory usage also comes from splitting `torch.nn.functional.gelu` between the TorchCompile and...
Update on the CSE issue: unfortunately it didn't help with memory usage :(
Update on this issue: as of today Thunder runs `stablecode-completion-alpha-3b` successfully with the compile option `thunder_inductor`. However, with `thunder_inductor_cat` it OOMs. Stats from `python thunder/benchmarks/benchmark_litgpt.py --model_name stablecode-completion-alpha-3b --compile thunder_inductor`:

```shell
Model...
```
Picking this up since a month has passed without updates #644
Thanks for the comment! To see what mechanism I came up with before you had a chance to comment, please check out the linked PR #388. With this added context,...
After further inspection @IvanYashchuk, I still think these are two slightly different things. In the PR, the output is ready-to-run Python code for the fusion, and the...
Sounds good! How do we want to deal with the variants for each fp8 type present in torch? At the moment torch implements the following fp8 types:

```python
#...
```
Regarding debugging a specific fusion, I've created #387. If that goes through, I will show in this notebook how to use that information to dump it for specific...
Sure! I've updated the description with more info.