[FEATURE] Quantization of the language model backbone for LLaVA multimodal models
Hi author, thank you for your outstanding contribution to model quantization. However, after quantizing the llama2-13B-Chinese language backbone used in llava-v1.5-13B and then loading the quantized model back into LLaVA, I found that the model's vision layers had disappeared, so images can no longer be passed to the language model. This is giving me a headache!
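For reference, the standard AutoGPTQ flow I started from looks roughly like this. It is only a sketch: the model paths, output directory, and calibration text are placeholders, not my real setup.

```python
# Minimal sketch of the standard AutoGPTQ flow applied to the language backbone only.
# Paths and the calibration example are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model_dir = "llama2-13B-Chinese"      # language backbone used in llava-v1.5-13B
quantized_dir = "llama2-13B-Chinese-GPTQ"  # where the quantized weights are written

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # per-group quantization granularity
    desc_act=False,  # faster inference, slightly lower accuracy
)

tokenizer = AutoTokenizer.from_pretrained(base_model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(base_model_dir, quantize_config)

# A real run needs a proper calibration set; one sentence is only for illustration.
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library.",
        return_tensors="pt",
    )
]

model.quantize(examples)
model.save_quantized(quantized_dir, use_safetensors=True)
```

Nothing in this flow is aware of LLaVA's vision tower or multimodal projector, so the checkpoint it produces only covers the language side.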
Maybe AutoGPTQ already has a way to quantize LLaVA's language backbone that I haven't found, but if not, I hope a feature can be developed for quantizing the language backbone of LLaVA, so that more multimodal large models are supported and compatible when quantizing their language backbones.
I previously tried adding a custom model class for LLaVA to AutoGPTQ so it could be quantized directly, but found that it didn't work.
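For what it's worth, the custom class I experimented with looked roughly like the sketch below, mirroring the built-in `LlamaGPTQForCausalLM`. The `model.vision_tower` and `model.mm_projector` module names are my assumptions based on the LLaVA repository and may not match every checkpoint.

```python
# Sketch of a LLaVA model definition for AutoGPTQ, mirroring LlamaGPTQForCausalLM.
# The vision-related module names (model.vision_tower, model.mm_projector) are
# assumptions based on the LLaVA repo and may need adjusting for a given checkpoint.
from auto_gptq.modeling import BaseGPTQForCausalLM


class LlavaLlamaGPTQForCausalLM(BaseGPTQForCausalLM):
    # Decoder layers to be quantized one by one.
    layer_type = "LlamaDecoderLayer"
    layers_block_name = "model.layers"

    # Modules kept in full precision outside the decoder layers; the vision tower
    # and projector are listed here so they are not dropped during quantization.
    outside_layer_modules = [
        "model.embed_tokens",
        "model.norm",
        "model.vision_tower",  # assumption: CLIP vision encoder lives here
        "model.mm_projector",  # assumption: multimodal projector lives here
    ]

    # Linear sub-modules inside each decoder layer, grouped by quantization order.
    inside_layer_modules = [
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.up_proj", "mlp.gate_proj"],
        ["mlp.down_proj"],
    ]
```

The idea was that listing the vision modules under `outside_layer_modules` would keep them intact, but as far as I can tell AutoGPTQ's `from_pretrained` loads the checkpoint through `AutoModelForCausalLM`, which doesn't recognize the `LlavaLlamaForCausalLM` architecture, and that may be why it still failed for me.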
This is the model structure after quantizing llama2-13B-Chinese:
This is the weight portion of the quantized llama2-13B-Chinese model; the dimension of self_attn.qkv_proj.weight is wrong, which prevents the model from being merged back into LLaVA:
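To pin down where the shapes diverge, I listed the layer-0 attention tensors stored in the quantized checkpoint with the small script below (the file name is a placeholder for my local AutoGPTQ output):

```python
# Print every tensor saved for the first decoder layer's attention block in the
# quantized checkpoint, to compare against what the LLaVA merge step expects.
# The file name is a placeholder for my local output.
from safetensors.torch import load_file

state_dict = load_file("llama2-13B-Chinese-GPTQ/gptq_model-4bit-128g.safetensors")

for name, tensor in state_dict.items():
    if name.startswith("model.layers.0.self_attn"):
        print(f"{name}: {tuple(tensor.shape)} {tensor.dtype}")
```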
Hi @a2382625920
Did you manage to fix this issue? I now also want to quantize the llava-v1.6-vicuna-7B model to W8A16 via GPTQ. If you have time, could you give me some pointers?
Thank you in advance.