
[Feature] Please release a GPTQ-4bit quantized model of internlm2-chat-7b and support deploying it with lmdeploy

Open wwewwt opened this issue 1 year ago • 3 comments

Motivation

Enable lmdeploy to deploy GPTQ quantized models on V100 GPUs.

Related resources

Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 and TheBloke/Llama-2-7B-Chat-GPTQ, since both Llama and Qwen already provide GPTQ models.

Additional context

No response

wwewwt · Mar 20 '24 08:03

We'll discuss it internally first. Based on our current plan, development of this feature cannot be scheduled for April.

lvhan028 · Mar 20 '24 09:03

Awesome project. What's the challenge in implementing this feature? Since the project already supports the Turing architecture, and the Turing and Volta architectures seem similar in terms of code implementation.

> We'll discuss it internally first. Based on our current plan, development of this feature cannot be scheduled for April.

ZZBoom · Apr 28 '24 01:04

First of all, we don't support GPTQ yet. Secondly, there are some higher-priority features, and we are short of hands.

lvhan028 · Apr 28 '24 02:04

Please try https://github.com/InternLM/lmdeploy/releases/tag/v0.6.0a0. It supports GPTQ.

zhyncs · Aug 30 '24 08:08
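
For anyone landing here, a minimal sketch of what trying a GPTQ checkpoint with the lmdeploy `pipeline` API could look like. The model path reuses TheBloke/Llama-2-7B-Chat-GPTQ from the thread above; relying on the default engine selection to handle GPTQ weights is an assumption, not something confirmed in this thread.

```python
# Minimal sketch: running a GPTQ-quantized checkpoint through lmdeploy >= 0.6.0a0.
# Assumptions: the model path below (mentioned earlier in the thread) points to a
# valid GPTQ checkpoint on the Hugging Face Hub, and the default backend selection
# can load GPTQ weights; adjust the backend configuration if your setup differs.
from lmdeploy import pipeline

pipe = pipeline("TheBloke/Llama-2-7B-Chat-GPTQ")

# Generate a response for a single prompt; pipe() accepts a list of prompts
# and returns one response object per prompt.
responses = pipe(["Hello, who are you?"])
print(responses[0].text)
```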