caojinpei
Hi @zhyncs, I saw that you have successfully run inference on a model quantized with AWQ. Recently I have been doing similar work (quantizing a LLaVA model and running inference on it), so I have some questions for...
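For context, a typical AWQ quantization flow looks roughly like the sketch below, using the AutoAWQ library. The model paths are placeholders, and a multimodal model such as LLaVA may need extra handling for its vision tower; this only covers the language-model weights.

```python
# Minimal AWQ quantization sketch with AutoAWQ (paths are placeholders).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/base-model"       # placeholder: the fp16 checkpoint
quant_path = "path/to/base-model-awq"   # placeholder: output directory

# Standard 4-bit AWQ settings: zero-point quantization, group size 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run calibration and quantize the weights, then save the quantized checkpoint.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```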
@zhyncs Thanks a lot for your reply. What is the throughput before and after quantization for the Llama 3 8B Instruct model? I am wondering what bit widths the weights and activations...
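One rough way to compare throughput before and after quantization is to time batched generation with vLLM and count generated tokens. A minimal sketch (the model name, prompt count, and output length are just example values; run it once with the base checkpoint and once with the quantized one):

```python
# Rough generation-throughput measurement with vLLM.
import time
from vllm import LLM, SamplingParams

prompts = ["Explain weight quantization in one paragraph."] * 64
params = SamplingParams(temperature=0.0, max_tokens=128)

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # swap in the quantized checkpoint to compare

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/s")
```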
@zhyncs Hi zhyncs, I have downloaded the Meta-Llama-3.1-8B-Instruct-quantized.w8a16 model from the link below and want to compare inference performance on an A100 before and after quantization. https://huggingface.co/neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16/blob/main/README.md But when I run inference on the...
@caojinpei > @zhyncs Hi zhyncs, > > I have downloaded the Meta-Llama-3.1-8B-Instruct-quantized.w8a16 model from the link below and want to compare inference performance on an A100 before and after quantization. https://huggingface.co/neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16/blob/main/README.md >...
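A minimal sketch of running that w8a16 checkpoint, assuming a vLLM build recent enough to read the compressed-tensors quantization config shipped in the repo:

```python
# Load and run the quantized checkpoint with vLLM; the quantization scheme
# is picked up from the model's config files.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16")
out = llm.generate(["What does W8A16 mean?"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```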
Hi @a2382625920, did you fix this issue? I also want to quantize the llava-v1.6-vicuna-7B model to W8A16 via GPTQ. If you are available, could you give me some pointers? Thank...
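A weight-only (W8A16) GPTQ pass over the language-model part could look roughly like the sketch below, using the AutoGPTQ library. The paths and calibration text are placeholders, and the vision tower of llava-v1.6-vicuna-7B is not covered here:

```python
# Minimal W8A16 GPTQ sketch with AutoGPTQ: 8-bit weights, fp16 activations.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_path = "path/to/vicuna-7b"  # placeholder: the fp16 language model
tokenizer = AutoTokenizer.from_pretrained(model_path)

# A real run needs a proper calibration set; one sample is only for illustration.
examples = [tokenizer("Calibration sample text for GPTQ.", return_tensors="pt")]

quantize_config = BaseQuantizeConfig(bits=8, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)

model.quantize(examples)  # quantizes weights only; activations stay fp16
model.save_quantized("path/to/vicuna-7b-gptq-w8a16")
```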