caojinpei
Hi @zhyncs, I saw that you have successfully run inference on a model quantized with AWQ. Recently I have been doing similar work (quantizing a LLaVA model and running inference on it), so I have some questions for...
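For context, a typical AWQ quantization flow looks roughly like the sketch below, using the AutoAWQ library. The model paths are placeholders, and a multimodal model such as LLaVA may need extra handling for its vision tower; this only covers the language-model weights.

```python
# Minimal AWQ quantization sketch with AutoAWQ (paths are placeholders).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/base-model"       # placeholder: the fp16 checkpoint
quant_path = "path/to/base-model-awq"   # placeholder: output directory

# Standard 4-bit AWQ settings: zero-point quantization, group size 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run calibration and quantize the weights, then save the quantized checkpoint.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```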
@zhyncs Thanks a lot for your reply. What is the throughput before and after quantization for the Llama 3 8B Instruct model? I am wondering what bit widths the weights and activations...
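One rough way to compare throughput before and after quantization is to time batched generation with vLLM and count generated tokens. A minimal sketch (the model name, prompt count, and output length are just example values; run it once with the base checkpoint and once with the quantized one):

```python
# Rough generation-throughput measurement with vLLM.
import time
from vllm import LLM, SamplingParams

prompts = ["Explain weight quantization in one paragraph."] * 64
params = SamplingParams(temperature=0.0, max_tokens=128)

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # swap in the quantized checkpoint to compare

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/s")
```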
@zhyncs Hi zhyncs, I have downloaded the Meta-Llama-3.1-8B-Instruct-quantized.w8a16 model from the link below and want to compare inference performance on an A100 before and after quantization. https://huggingface.co/neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16/blob/main/README.md But when I run inference on the...
@caojinpei > @zhyncs Hi zhyncs, > > I have downloaded the Meta-Llama-3.1-8B-Instruct-quantized.w8a16 model from the link below and want to compare inference performance on an A100 before and after quantization. https://huggingface.co/neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16/blob/main/README.md >...
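A minimal sketch of running that w8a16 checkpoint, assuming a vLLM build recent enough to read the compressed-tensors quantization config shipped in the repo:

```python
# Load and run the quantized checkpoint with vLLM; the quantization scheme
# is picked up from the model's config files.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16")
out = llm.generate(["What does W8A16 mean?"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```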
Hi @a2382625920, did you fix this issue? I also want to quantize the llava-v1.6-vicuna-7B model to W8A16 via GPTQ. If you are available, could you give me some pointers? Thank...
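A weight-only (W8A16) GPTQ pass over the language-model part could look roughly like the sketch below, using the AutoGPTQ library. The paths and calibration text are placeholders, and the vision tower of llava-v1.6-vicuna-7B is not covered here:

```python
# Minimal W8A16 GPTQ sketch with AutoGPTQ: 8-bit weights, fp16 activations.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_path = "path/to/vicuna-7b"  # placeholder: the fp16 language model
tokenizer = AutoTokenizer.from_pretrained(model_path)

# A real run needs a proper calibration set; one sample is only for illustration.
examples = [tokenizer("Calibration sample text for GPTQ.", return_tensors="pt")]

quantize_config = BaseQuantizeConfig(bits=8, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)

model.quantize(examples)  # quantizes weights only; activations stay fp16
model.save_quantized("path/to/vicuna-7b-gptq-w8a16")
```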