Luka Govedič

97 comments by Luka Govedič

> Besides, between #15734 and #12591, the Triton FA code path in ROCmBackend is broken (as spotted in https://github.com/vllm-project/vllm/pull/17235)

Yes, these were supposed to be merged in the opposite order,...

Getting this error on Python 3.10 after this PR:

```
Traceback (most recent call last):
  File "/home/luka/neuralmagic-vllm/examples/offline_inference/basic/generate.py", line 3, in <module>
    from vllm import LLM, EngineArgs
  File "/home/luka/neuralmagic-vllm/vllm/__init__.py", line 12, in...
```

> * For some reason I had to put [this line](https://github.com/vllm-project/vllm/blob/be22bb6f3dd7aaf8559a4a0a1beb98a37a5a8138/vllm/model_executor/layers/fused_moe/fused_marlin_moe.py#L204) within a `with torch.no_grad():`, otherwise I got an error like `RuntimeError: sum(): functions with out=... arguments don't support automatic...
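The `RuntimeError` quoted above is standard PyTorch behavior: `out=` variants of reductions don't support autograd, so they raise when any input requires grad. A minimal sketch of the workaround (the function and buffer names here are illustrative, not vLLM's actual `fused_marlin_moe` code):

```python
import torch


def sum_into_buffer(x: torch.Tensor, out: torch.Tensor) -> torch.Tensor:
    # If `x` requires grad, `torch.sum(..., out=...)` raises
    # "RuntimeError: sum(): functions with out=... arguments don't
    # support automatic differentiation". Disabling grad tracking
    # around the write sidesteps the error.
    with torch.no_grad():
        return torch.sum(x, dim=0, out=out)


x = torch.randn(4, 3, requires_grad=True)
buf = torch.empty(3)
sum_into_buffer(x, buf)  # fills `buf` with the column sums of `x`
```

Without the `with torch.no_grad():` guard, the same call on a grad-requiring tensor fails at runtime, which matches the error in the quote.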

@youkaichao I updated the import statement and error message. But we could also just get rid of the optional import, as vllm-flash-attn should always get included/built - let me know...
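For context, the "optional import" pattern being discussed looks roughly like the sketch below: try to import the compiled attention module, and defer a clear error to first use if it is missing. This is an illustrative sketch, not vLLM's exact code; the module and symbol names are assumptions.

```python
# Hedged sketch of an optional-import guard for a compiled extension.
# `vllm_flash_attn` / `flash_attn_varlen_func` are assumed names here.
try:
    from vllm_flash_attn import flash_attn_varlen_func
    HAS_VLLM_FLASH_ATTN = True
except ImportError:
    flash_attn_varlen_func = None
    HAS_VLLM_FLASH_ATTN = False


def get_flash_attn():
    """Return the flash-attn kernel, raising a descriptive error if absent."""
    if flash_attn_varlen_func is None:
        raise ImportError(
            "vllm-flash-attn is not available; it should always be "
            "included/built with vLLM, so this indicates a broken install."
        )
    return flash_attn_varlen_func
```

The alternative raised in the comment is to drop the `try/except` entirely and import unconditionally, since the package should always ship with the build; the guard only buys a friendlier error message.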

@sarckk sorry to leave this hanging for so long - but could you fix the merge conflicts so we can merge the PR?

More conflicts; could you rebase again? Sorry for the delayed review

I'm currently overhauling custom op matching in #24604. We also recently added a torch implementation of group quant, could you compare its performance with AITER? Also could you compare the...

(I asked @russellb to disable auto-merge until we get to the bottom of the performance numbers here)