@CStanKonrad Is there a practical example that uses external memory?
Got it, thanks!
[Badcase]: Model inference with Qwen2.5-32B-Instruct-GPTQ-Int4 produces garbled text !!!!!!!!!!!!!!!!!!
I ran into the same problem with vllm==0.6.1.post2 on 2x V100 GPUs. In the same environment, deploying qwen2.5-72b-gptq-int4 and qwen2.5-14b-gptq-int4 works fine; only the 32b model fails, and it outputs nothing but exclamation marks.
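For context, a minimal reproduction sketch of this kind of setup, assuming a standard offline vLLM deployment. The prompt, sampling settings, and dtype are assumptions for illustration; the report only specifies vllm==0.6.1.post2, the GPTQ-Int4 model, and two V100 GPUs:

```python
from vllm import LLM, SamplingParams

# Assumed reproduction setup; only the model, vLLM version, and 2x V100 come
# from the report above.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4",
    quantization="gptq",
    tensor_parallel_size=2,   # two V100 GPUs, as in the report
    dtype="float16",          # V100 does not support bfloat16
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, who are you?"], params)

# Reported symptom on the 32B model: the generated text is only "!!!!..."
print(outputs[0].outputs[0].text)
```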
> @noanti see this comment: [#945 (comment)](https://github.com/QwenLM/Qwen2.5/issues/945#issuecomment-2375942947)

I tried vllm 0.6.2 and 0.6.3, and the problem persists. As mentioned earlier, once the prompt is increased to more than 50 tokens the output becomes normal again, which is strange...
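A hedged sketch of the workaround described above: padding short prompts until they exceed roughly 50 tokens before sending them to the model. The helper name, filler text, and use of the Hugging Face tokenizer are illustrative assumptions, not an official fix from the thread:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4")

def pad_prompt(prompt: str, min_tokens: int = 50) -> str:
    """Append filler text until the tokenized prompt reaches min_tokens.

    min_tokens=50 reflects the threshold mentioned in the comment above;
    the filler sentence is an arbitrary placeholder.
    """
    filler = " Please answer the question above in as much detail as possible."
    while len(tokenizer(prompt)["input_ids"]) < min_tokens:
        prompt += filler
    return prompt
```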