Lucas Wilkinson

Results 67 comments of Lucas Wilkinson

@HarryWu99 thanks for putting this PR together. I'm interested in some of the metrics here too. It looks like it was approved with auto-merge enabled, which means it will merge...

@HarryWu99 thanks for updating the PR. A lot of the tests can be flaky, so I re-ran some of them to see if it's just flakiness, although I'm not familiar with the...

Still broken on:
```
nvidia-cutlass==3.5.1.0
```

> I have one little question about scale layouts - why scaleA has a stride like (1, M)? Will this layout improve the copying of scaleA in cutlass?

Yes, just...
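For readers unfamiliar with stride notation, here is a minimal sketch of what a `(1, M)` stride means for a 2D array with `M` rows: stepping along the first index moves one element in memory, and stepping along the second index moves `M` elements, i.e. a column-major layout. The helper `offset` and the sizes `M` and `K_BLOCKS` are hypothetical and chosen only for illustration; they are not taken from the PR.

```python
def offset(i, j, stride):
    """Linear memory offset (in elements) of entry (i, j) for a given stride pair."""
    return i * stride[0] + j * stride[1]


# Hypothetical sizes, for illustration only.
M, K_BLOCKS = 4, 3
stride = (1, M)  # the stride mentioned in the question above

# Offsets of every entry of an (M, K_BLOCKS) array with this stride.
offsets = [[offset(i, j, stride) for j in range(K_BLOCKS)] for i in range(M)]

# Entries in the same column (fixed j) sit next to each other in memory.
column0 = [offsets[i][0] for i in range(M)]
column1 = [offsets[i][1] for i in range(M)]
print(column0)
print(column1)
```

With this stride, the `M` entries of each column are contiguous, so a column can be read as one contiguous run of memory.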

Landing this to help Blackwell perf, but I would like to follow up on https://github.com/vllm-project/vllm/pull/16032#discussion_r2061603794 in a future PR, potentially.

> Do you know any command to test a model with num_heads = 128? And probably no TP.

Not that I'm aware of :/ this is the smallest MLA model...

Ah I don't think it's an MLA model :/
```
"kv_lora_rank": null,
...
"use_mla": false,
```
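A rough sketch of the check implied by the config snippet above: a model config whose `kv_lora_rank` is null (or missing) is treated as not using MLA. The helper name `looks_like_mla` is hypothetical, and this is only a heuristic on a plain dict, not vLLM's actual detection logic.

```python
def looks_like_mla(model_config: dict) -> bool:
    """Heuristic (assumption): treat a non-null kv_lora_rank as a sign of MLA."""
    return model_config.get("kv_lora_rank") is not None


# Mirrors the field shown in the comment above: kv_lora_rank is null.
print(looks_like_mla({"kv_lora_rank": None}))
```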

> > other than I do think we should turn it on by default for Blackwell, Any reason not to?
> >
> > My main concern is that the CUTLASS MLA...

> > Edit: oh and ideally I'd still like to see accuracy numbers...
> >
> > @LucasWilkinson this `DeepSeek-V2-Lite-Chat` only has attention head number == 16 and --tp=2 is not ok....