Sekri0
> We are developing a complete pipeline from pseudo-quantized models to real packed weights that directly executes WxAy quantized inference in Torch; it is expected to be released within a...
Thanks for the reply. I have one more question: in the end-to-end experiment, which kernel is used in the prefill phase of the w2a8 model?
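For context on what "real packed weights" means above, here is a minimal sketch, assuming a 2-bit (W2) layout where four weight codes are packed into each byte. The function names and packing order are illustrative assumptions, not the actual format the authors plan to release.

```python
import torch

def pack_w2(qweight: torch.Tensor) -> torch.Tensor:
    """Pack 2-bit integer codes (values 0..3) into a uint8 tensor,
    four codes per byte. Hypothetical layout: the last dimension
    of the input must be a multiple of 4."""
    assert qweight.shape[-1] % 4 == 0, "last dim must be divisible by 4"
    q = qweight.to(torch.uint8) & 0x3          # keep only the low 2 bits
    q = q.reshape(*qweight.shape[:-1], -1, 4)  # group codes in fours
    packed = (q[..., 0]
              | (q[..., 1] << 2)
              | (q[..., 2] << 4)
              | (q[..., 3] << 6))
    return packed.to(torch.uint8)

def unpack_w2(packed: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_w2: expand each byte back into four 2-bit codes."""
    codes = torch.stack([(packed >> (2 * i)) & 0x3 for i in range(4)], dim=-1)
    return codes.reshape(*packed.shape[:-1], -1)

# Example: a pseudo-quantized weight whose values are already 2-bit codes.
w_codes = torch.randint(0, 4, (8, 16), dtype=torch.int8)
packed = pack_w2(w_codes)  # shape (8, 4), uint8, 4x smaller along the last dim
assert torch.equal(unpack_w2(packed).to(torch.int8), w_codes)
```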
I have added detailed environment information. The FunASR 1.2.0 I currently have installed already seems to be the latest version, and the error still occurs.
After installing the latest funasr 1.2.1 from source, the error is gone. Thank you very much.
@zachzzc @raywanb Sorry to bother you guys, could you please take a look at this problem?
> Can you provide a minimum script to reproduce your problem? @Sekri0

Sorry for the late reply; this issue occurs midway through the inference service, so I'm not...
> > Can you provide a minimum script to reproduce your problem? @Sekri0
>
> Sorry for the late reply; this issue occurs...
> [@liweiqing1997](https://github.com/liweiqing1997) Totally understand. We will try to quantize and fix this by next week. The bug is most likely that vLLM changes model parameter names, based on your stack...
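As an illustration of the parameter-name issue mentioned above (not vLLM's actual loading code), here is a hypothetical sketch of the kind of remapping a serving engine can apply when it fuses per-projection weights; quantization metadata keyed by the original names has to be remapped the same way or loading breaks. All parameter names below are made up for the example.

```python
import torch

def fuse_qkv(state_dict: dict) -> dict:
    """Hypothetical remap: concatenate q_proj / k_proj / v_proj weights
    into a single fused qkv_proj parameter, as some serving engines do."""
    fused = {}
    for name, tensor in state_dict.items():
        if ".q_proj." in name:
            base = name.replace(".q_proj.", ".qkv_proj.")
            k = state_dict[name.replace(".q_proj.", ".k_proj.")]
            v = state_dict[name.replace(".q_proj.", ".v_proj.")]
            fused[base] = torch.cat([tensor, k, v], dim=0)
        elif ".k_proj." in name or ".v_proj." in name:
            continue  # already folded into qkv_proj above
        else:
            fused[name] = tensor
    return fused

# Example with dummy shapes and made-up parameter names.
sd = {
    "layers.0.self_attn.q_proj.weight": torch.randn(8, 8),
    "layers.0.self_attn.k_proj.weight": torch.randn(8, 8),
    "layers.0.self_attn.v_proj.weight": torch.randn(8, 8),
    "layers.0.mlp.down_proj.weight": torch.randn(8, 8),
}
fused = fuse_qkv(sd)
assert fused["layers.0.self_attn.qkv_proj.weight"].shape == (24, 8)
```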