[Feature] Please provide a GPTQ-4bit quantized model of internlm2-chat-7b and support deploying it with lmdeploy
Motivation
This would let lmdeploy deploy GPTQ-quantized models on V100 GPUs.
Related resources
Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 and TheBloke/Llama-2-7B-Chat-GPTQ; both the Llama and Qwen families already provide GPTQ models.
Additional context
No response
We'll discuss this internally first. Given our current roadmap, we can't fit development of this feature into April.
Awesome project. What's the challenge in implementing this feature? Since the project already supports the Turing architecture, and the Turing and Volta architectures seem similar in terms of code implementation.
First of all, we haven't supported GPTQ yet. Secondly, there are some higher-priority features and we are short-handed.
Please try https://github.com/InternLM/lmdeploy/releases/tag/v0.6.0a0. It supports GPTQ.
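For reference, here is a minimal sketch of running a GPTQ checkpoint with lmdeploy's `pipeline` API, assuming v0.6.0a0 or later. The `model_format='gptq'` value is an assumption modeled on the documented `'awq'` option, and the Hugging Face repo id is just an illustrative GPTQ model from this thread; check the release notes for the exact supported values.

```python
# Minimal sketch: serve a GPTQ-quantized model with lmdeploy (>= v0.6.0a0).
# Assumptions: model_format='gptq' is accepted analogously to the documented
# 'awq' option, and the HF repo id below is illustrative only.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    'Qwen/Qwen1.5-72B-Chat-GPTQ-Int4',   # any GPTQ checkpoint from the HF Hub
    backend_config=TurbomindEngineConfig(
        model_format='gptq',             # assumption: GPTQ analogue of 'awq'
        session_len=4096,                # max context length for the session
    ),
)

responses = pipe(['Hello! Briefly introduce yourself.'])
print(responses[0].text)
```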