Austin Veselka

7 comments by Austin Veselka

Good point, it is done. Also updated the fully sharded layers to match.

We battle it out. We can take the best of both or just select one if they are close. I do have a good portion of the integration with vLLM...

@jeejeelee thanks for the detailed response. I think you understand the paged format correctly. So my kernels run on a paged LoRA format, like in S-LoRA. This of course allows for...

No worries, thanks for the code. I'll try to get back to you soon.

@jeejeelee So it looks like our kernels accomplish two partly different goals. Yours can function as a drop-in replacement for the current Punica kernels. I have done some benchmarking...