CrisRodriguez
Hello @makseq, what's the state of this issue? Do you plan on merging the proposed solution? Thanks!
Hello @makseq, do you have any updates on this? Thanks,
> Thanks to the very smart MoE align strategy introduced in #2453, each block only uses a single expert, making it much easier to adapt to quantized methods. This...
> @CrisRodriguez The speed difference is not limited to MoE models. Current GPTQ kernel in vLLM is mostly a GEMV kernel optimized for low batch size while the AWQ kernel...
Hi @zheng5yu9, I'm posting this so anyone having the same doubt can easily find an answer :) # WizardCoder-Python-34B-V1.0 is based on CodeLlama-34B-Python - It is a non-instruct...
Hi @bmartel, thanks for your message. I got it. Meanwhile, I am using the 1.4.x version, which works pretty well for my audios shorter than 20 minutes. Thanks, Cristian
Hello @makseq @bmartel, can you please confirm that this issue has been solved in 1.7.3, and if so, close the issue? Thanks, Cristian