Jingyue Wu
Great to see the upgrade, Vedaanta! Let me know if you wrote this down somewhere: What does 1.2.1 bring to Thunder? For example, does it fix some disabled tests [here](https://github.com/Lightning-AI/lightning-thunder/blob/483c352839c16042a891625144d3ec0232d54d5a/thunder/tests/test_cudnn_executor.py#L203-L206)?
For the first action item, https://github.com/Lightning-AI/lightning-thunder/pull/206 triggers the following CI errors. ``` FAILED thunder/tests/test_examine_memory.py::test_nanogpt_block_nvfuser_cuda_float32 - AssertionError: assert 235985920 == 242277376 + where 242277376 = sum(dict_values([6291456, 3072, 9216, 3072, 3072, 12288,...
getitem_nvfuser tests failed for a similar reason to https://github.com/Lightning-AI/lightning-thunder/blob/54bb6146ff757905925f8d9ea2197870c4971011/thunder/tests/opinfos.py#L3113-L3115. I can again create a wrapper so `slice` objects don't get passed to FusionDefinitionWrapper. But I'd love to hear thoughts from...
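For illustration, here is a minimal sketch of what such a wrapper could look like. It assumes that encoding each `slice` as a plain `(start, stop, step)` tuple is acceptable downstream. The names `replace_slices` and `without_slices` are hypothetical, and a plain Python callable stands in for `FusionDefinitionWrapper`.

```python
from typing import Any, Callable


def replace_slices(x: Any) -> Any:
    """Recursively replace slice objects with (start, stop, step) tuples.

    Hypothetical helper: the tuple encoding is an assumption, not Thunder's API.
    """
    if isinstance(x, slice):
        return (x.start, x.stop, x.step)
    if isinstance(x, tuple):
        return tuple(replace_slices(e) for e in x)
    if isinstance(x, list):
        return [replace_slices(e) for e in x]
    if isinstance(x, dict):
        return {k: replace_slices(v) for k, v in x.items()}
    # Non-container, non-slice values pass through unchanged.
    return x


def without_slices(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: `fn` never sees slice objects in its arguments."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return fn(*replace_slices(args), **replace_slices(kwargs))

    return wrapper


# Usage with a stand-in for the fusion callable (not the real FusionDefinitionWrapper).
@without_slices
def fake_fusion(*args: Any) -> Any:
    return args


print(fake_fusion(1, (slice(0, 4), 2)))
```

Since the wrapper rewrites only the arguments, the wrapped callable itself stays unchanged. The real fix might instead need to reconstruct the `slice` on the other side.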
`FAILED thunder/tests/test_examine_memory.py::test_view_ops_nvfuser_cuda_float32 - assert 128 == 144` is a golden-value mismatch: the test compares against a hard-coded expected value. The measured 128 is less memory than the expected 144, so this is actually an improvement.
`FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-falcon-40b-like] - RuntimeError: inp->definition() && inp->definition()->isA() INTERNAL ASSERT FAILED at "/Fuser/csrc/preseg_passes/remove_empty.cpp":256, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Inputs to CatOp must be outputs of...
Filed another blocker: https://github.com/Lightning-AI/lightning-thunder/issues/549
Yet-another blocker: https://github.com/NVIDIA/Fuser/issues/2362
These are all the blockers I can identify from [the recent CI run](https://dev.azure.com/Lightning-AI/lightning/_build/results?buildId=204827&view=logs&j=2840892e-91ab-5245-da62-77ec9923516a&t=444f4171-6797-5730-4229-41427ed3bdc9). 🤞
The previous blockers have all been fixed. However, the [most recent CI run](https://dev.azure.com/Lightning-AI/lightning/_build/results?buildId=204975&view=logs&j=2840892e-91ab-5245-da62-77ec9923516a&t=444f4171-6797-5730-4229-41427ed3bdc9&l=11846) failed with new errors, numerical mismatches this time... ``` =========================== short test summary info ============================ FAILED...
Good news: these numerical mismatches no longer show up after I resynced. Bad news: distributed tests [started to fail](https://dev.azure.com/Lightning-AI/lightning/_build/results?buildId=205296&view=logs&j=b97dbf6d-98bd-5b68-7c01-878b39c3da28&t=3c72ede2-92c1-5cd2-2bac-ad2411af2aea). One error is https://github.com/NVIDIA/Fuser/issues/2395. The other error seems to be that...