[Feature] Please provide a GPTQ-4bit quantized model of internlm2-chat-7b and support deploying it with lmdeploy
Motivation
This would let lmdeploy deploy GPTQ-quantized models on V100 GPUs.
Related resources
Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 and TheBloke/Llama-2-7B-Chat-GPTQ; both the Llama and Qwen families already provide GPTQ models.
Additional context
No response
We'll discuss this internally first. Given our current roadmap, we can't fit development of this feature into April.
Awesome project. What's the challenge in implementing this feature? Since the project already supports the Turing architecture, and the Turing and Volta architectures seem similar in terms of code implementation.
First of all, we haven't supported GPTQ yet. Secondly, there are some higher-priority features and we are short-handed.
Please try https://github.com/InternLM/lmdeploy/releases/tag/v0.6.0a0. It supports GPTQ.
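For reference, here is a minimal sketch of running a GPTQ checkpoint with lmdeploy's `pipeline` API, assuming v0.6.0a0 or later. The `model_format='gptq'` value is an assumption modeled on the documented `'awq'` option, and the Hugging Face repo id is just an illustrative GPTQ model from this thread; check the release notes for the exact supported values.

```python
# Minimal sketch: serve a GPTQ-quantized model with lmdeploy (>= v0.6.0a0).
# Assumptions: model_format='gptq' is accepted analogously to the documented
# 'awq' option, and the HF repo id below is illustrative only.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    'Qwen/Qwen1.5-72B-Chat-GPTQ-Int4',   # any GPTQ checkpoint from the HF Hub
    backend_config=TurbomindEngineConfig(
        model_format='gptq',             # assumption: GPTQ analogue of 'awq'
        session_len=4096,                # max context length for the session
    ),
)

responses = pipe(['Hello! Briefly introduce yourself.'])
print(responses[0].text)
```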