[FEATURE] Quantization of the language model backbone for LLaVA multimodal models
Hi author, thank you for your outstanding contribution to model quantization. However, after quantizing the llama2-13B-Chinese language backbone used in llava-v1.5-13B and then loading the quantized model back into LLaVA, I found that the model's vision layers had disappeared, so images can no longer be passed to the language model. This is giving me a headache!
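For reference, the standard AutoGPTQ flow I started from looks roughly like this. It is only a sketch: the model paths, output directory, and calibration text are placeholders, not my real setup.

```python
# Minimal sketch of the standard AutoGPTQ flow applied to the language backbone only.
# Paths and the calibration example are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model_dir = "llama2-13B-Chinese"      # language backbone used in llava-v1.5-13B
quantized_dir = "llama2-13B-Chinese-GPTQ"  # where the quantized weights are written

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # per-group quantization granularity
    desc_act=False,  # faster inference, slightly lower accuracy
)

tokenizer = AutoTokenizer.from_pretrained(base_model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(base_model_dir, quantize_config)

# A real run needs a proper calibration set; one sentence is only for illustration.
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library.",
        return_tensors="pt",
    )
]

model.quantize(examples)
model.save_quantized(quantized_dir, use_safetensors=True)
```

Nothing in this flow is aware of LLaVA's vision tower or multimodal projector, so the checkpoint it produces only covers the language side.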
Maybe AutoGPTQ already has a way to quantize LLaVA's language backbone that I haven't found, but if not, I hope a feature can be developed for quantizing the language backbone of LLaVA, so that more multimodal large models are supported and compatible when quantizing their language backbones.
I previously tried adding a custom model class for LLaVA to AutoGPTQ so it could be quantized directly, but found that it didn't work.
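For what it's worth, the custom class I experimented with looked roughly like the sketch below, mirroring the built-in `LlamaGPTQForCausalLM`. The `model.vision_tower` and `model.mm_projector` module names are my assumptions based on the LLaVA repository and may not match every checkpoint.

```python
# Sketch of a LLaVA model definition for AutoGPTQ, mirroring LlamaGPTQForCausalLM.
# The vision-related module names (model.vision_tower, model.mm_projector) are
# assumptions based on the LLaVA repo and may need adjusting for a given checkpoint.
from auto_gptq.modeling import BaseGPTQForCausalLM


class LlavaLlamaGPTQForCausalLM(BaseGPTQForCausalLM):
    # Decoder layers to be quantized one by one.
    layer_type = "LlamaDecoderLayer"
    layers_block_name = "model.layers"

    # Modules kept in full precision outside the decoder layers; the vision tower
    # and projector are listed here so they are not dropped during quantization.
    outside_layer_modules = [
        "model.embed_tokens",
        "model.norm",
        "model.vision_tower",  # assumption: CLIP vision encoder lives here
        "model.mm_projector",  # assumption: multimodal projector lives here
    ]

    # Linear sub-modules inside each decoder layer, grouped by quantization order.
    inside_layer_modules = [
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.up_proj", "mlp.gate_proj"],
        ["mlp.down_proj"],
    ]
```

The idea was that listing the vision modules under `outside_layer_modules` would keep them intact, but as far as I can tell AutoGPTQ's `from_pretrained` loads the checkpoint through `AutoModelForCausalLM`, which doesn't recognize the `LlavaLlamaForCausalLM` architecture, and that may be why it still failed for me.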
This is the model structure after quantizing llama2-13B-Chinese:
This is the weight portion of the quantized llama2-13B-Chinese model; the dimension of self_attn.qkv_proj.weight is wrong, which prevents the model from being merged back into LLaVA:
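To pin down where the shapes diverge, I listed the layer-0 attention tensors stored in the quantized checkpoint with the small script below (the file name is a placeholder for my local AutoGPTQ output):

```python
# Print every tensor saved for the first decoder layer's attention block in the
# quantized checkpoint, to compare against what the LLaVA merge step expects.
# The file name is a placeholder for my local output.
from safetensors.torch import load_file

state_dict = load_file("llama2-13B-Chinese-GPTQ/gptq_model-4bit-128g.safetensors")

for name, tensor in state_dict.items():
    if name.startswith("model.layers.0.self_attn"):
        print(f"{name}: {tuple(tensor.shape)} {tensor.dtype}")
```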
Hi @a2382625920
Did you manage to fix this issue? I now also want to quantize the llava-v1.6-vicuna-7B model to W8A16 via GPTQ. If you have time, could you give me some pointers?
Thank you in advance.