I tried to load a GPTQ version of Mixtral 8x7b and got an error, but a different one than posted here. I got: `config.py gptq quantization is not fully...`
@casper-hansen

> You need to use float16 or half for quantization.

I switched it to `torch.float16` in the config.json and my error changed to the one in https://github.com/vllm-project/vllm/issues/2251
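For anyone making the same change, a minimal sketch of editing config.json (the checkpoint directory name is a placeholder for wherever your quantized model lives):

```python
import json

# Hypothetical path to the quantized checkpoint's config.json
cfg_path = "Mixtral-8x7B-Instruct-v0.1-GPTQ/config.json"

with open(cfg_path) as f:
    cfg = json.load(f)

# Switch from bfloat16 to float16, which the quantized kernels expect
cfg["torch_dtype"] = "float16"

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```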
I'll try doing that now
Yep! It seems like the latest vLLM has fixed this bug. Both GPTQ and AWQ are working for me now. Thanks for the help :)
ExLlamaV2 has overtaken ExLlama in quantization performance for most cases. I hope we can get it implemented in vLLM because it is also an excellent quantization technique. Benchmarks between...
I'm also having this issue after a fresh quantization of Mixtral 8x7b instruct. There is no issue when running directly with AutoAWQ across multiple GPUs. Only when using vLLM across...
I was able to get both GPTQ and AWQ working with tp=4. It took a long time to load the model in my case, but eventually it loaded and then...
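For anyone trying the same setup, a minimal sketch of the vLLM load (the checkpoint path is a placeholder; switch `quantization` to `"gptq"` for the GPTQ build):

```python
from vllm import LLM, SamplingParams

# Hypothetical local path to an AWQ-quantized Mixtral checkpoint
llm = LLM(
    model="./mixtral-8x7b-instruct-awq",
    quantization="awq",       # use "gptq" for a GPTQ checkpoint
    dtype="float16",
    tensor_parallel_size=4,   # tp=4 across 4 GPUs
)

outputs = llm.generate(["Hello, Mixtral!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```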
I used my own AWQ quantization. Try quantizing it yourself and maybe that will fix the problem.
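A minimal sketch of the AutoAWQ quantization flow, assuming the base Instruct checkpoint and typical 4-bit quant settings (adjust paths and config as needed):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mixtral-8x7B-Instruct-v0.1"
quant_path = "mixtral-8x7b-instruct-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the unquantized model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize with the default calibration data, then save the AWQ checkpoint
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```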
The first 7 consolidated.pth files of 70B-chat downloaded perfectly. The 8th failed with a 403, and now I can't download any models. This was the first download with my URL. Requesting a second download URL. Let's...
This error seems to have happened because c4 was updated with some `datasets` configuration options which aren't supported in older versions of `datasets`. To fix, upgrade `datasets` with `pip install...
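As a quick check after upgrading, the c4 calibration data that GPTQ tooling typically pulls should load cleanly — a sketch, assuming the current `allenai/c4` layout on the Hub:

```python
from datasets import load_dataset

# One shard of the English c4 split, commonly used for GPTQ calibration
calib = load_dataset(
    "allenai/c4",
    data_files={"train": "en/c4-train.00000-of-01024.json.gz"},
    split="train",
)
print(calib[0]["text"][:200])
```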