@CStanKonrad Is there a practical example that uses external memory?
Got it, thanks!
[Badcase]: Model inference with Qwen2.5-32B-Instruct-GPTQ-Int4 produces garbled text !!!!!!!!!!!!!!!!!!
I ran into the same problem with vllm==0.6.1.post2 on 2x V100 GPUs. In the same environment, deploying qwen2.5-72b-gptq-int4 and qwen2.5-14b-gptq-int4 works fine; only the 32b model fails, and it outputs nothing but exclamation marks.
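For context, a minimal reproduction sketch of this kind of setup, assuming a standard offline vLLM deployment. The prompt, sampling settings, and dtype are assumptions for illustration; the report only specifies vllm==0.6.1.post2, the GPTQ-Int4 model, and two V100 GPUs:

```python
from vllm import LLM, SamplingParams

# Assumed reproduction setup; only the model, vLLM version, and 2x V100 come
# from the report above.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4",
    quantization="gptq",
    tensor_parallel_size=2,   # two V100 GPUs, as in the report
    dtype="float16",          # V100 does not support bfloat16
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, who are you?"], params)

# Reported symptom on the 32B model: the generated text is only "!!!!..."
print(outputs[0].outputs[0].text)
```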
> @noanti see this comment: [#945 (comment)](https://github.com/QwenLM/Qwen2.5/issues/945#issuecomment-2375942947)

I tried vllm 0.6.2 and 0.6.3, and the problem persists. As mentioned earlier, once the prompt is increased to more than 50 tokens the output becomes normal again, which is strange...
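A hedged sketch of the workaround described above: padding short prompts until they exceed roughly 50 tokens before sending them to the model. The helper name, filler text, and use of the Hugging Face tokenizer are illustrative assumptions, not an official fix from the thread:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4")

def pad_prompt(prompt: str, min_tokens: int = 50) -> str:
    """Append filler text until the tokenized prompt reaches min_tokens.

    min_tokens=50 reflects the threshold mentioned in the comment above;
    the filler sentence is an arbitrary placeholder.
    """
    filler = " Please answer the question above in as much detail as possible."
    while len(tokenizer(prompt)["input_ids"]) < min_tokens:
        prompt += filler
    return prompt
```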