
Can I use tensor_parallel for inference with a GPTQ-quantized model?

Open minlik opened this issue 2 years ago • 0 comments

What should I do if I want to use tensor_parallel to run inference with a GPTQ-quantized model (e.g. Llama-2-7b-Chat-GPTQ) on 2 or more GPUs?

Currently, I am using AutoGPTQ to load the quantized model, and then tp.tensor_parallel to distribute the tensors across different devices. But I am getting the following error: TypeError: cannot pickle 'module' object

Do you have any suggestions on this? Thanks.

minlik avatar Nov 15 '23 09:11 minlik