CrisRodriguez

Results: 7 comments by CrisRodriguez

Hello @makseq, what's the state of this issue? Do you plan on merging the proposed solution? Thanks!

Hello @makseq, do you have any updates on this? Thanks,

> Thanks to the very smart MoE align strategy introduced in #2453, each block only uses a single expert, making it much easier to be adapted to quantized methods. This...

> @CrisRodriguez The speed difference is not limited to MoE models. Current GPTQ kernel in vLLM is mostly a GEMV kernel optimized for low batch size while the AWQ kernel...

Hi @zheng5yu9, I'm posting this so that anyone with the same question can easily find an answer :)

# WizardCoder-Python-34B-V1.0 is based on Code Llama 34B Python

- It is a non-instruct...

Hi @bmartel, thanks for your message. I got it. Meanwhile, I am using the 1.4.x version, which works pretty well for my audio files of less than 20 minutes. Thanks, Cristian

Hello @makseq @bmartel, can you please confirm that this issue has been solved in 1.7.3 and, if so, close the issue? Thanks, Cristian