GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
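As a point of reference for the reports below, here is a minimal sketch of an end-to-end quantization run with the GPTQModel Python API. It assumes the `GPTQModel.load` / `model.quantize` / `model.save` entry points, that plain text strings are accepted as calibration data, and uses a hypothetical model id and output directory; real runs typically use a few hundred calibration samples.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Hypothetical model id and output directory, for illustration only.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
out_dir = "Llama-3.1-8B-Instruct-gptq-4bit"

# 4-bit weights with group size 128 -- the config most reports below use.
quant_config = QuantizeConfig(bits=4, group_size=128)

# Tiny calibration set for the sketch; production runs use far more samples.
calibration = [
    "GPTQ quantizes weights layer by layer against a calibration set.",
    "The quick brown fox jumps over the lazy dog.",
]

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration)  # runs the layer-by-layer GPTQ loop
model.save(out_dir)          # writes the quantized safetensors and config
```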
**Describe the bug** 1. ValueError: There is no module or parameter named 'audio_tower.positional_embedding.positional_embedding' in Qwen3OmniMoeThinkerForConditionalGeneration (EngineCore_DP0 pid=18809) Process EngineCore_DP0: I can work around this by removing audio_tower.positional_embedding.positional_embedding from the safetensors. 2. ...
The model is around 70 GiB. I tried running GPTQModel on an RTX PRO 6000 with 96 GiB of VRAM but still ran out of memory. Config: `QuantizeConfig(bits=4, group_size=128)`.
I quantized Llama-2-70B into Int8 format using the [vLLM example](https://docs.vllm.com.cn/en/latest/features/quantization/gptqmodel.html). But I found that if I load the model with `device_map="auto"` on 2 GPUs, the output attention hidden states of the second...
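For context on the multi-GPU loading step described in the report above, a sketch of how a quantized checkpoint is typically loaded with Hugging Face Transformers so that `device_map="auto"` shards the layers across the visible GPUs; the checkpoint path is hypothetical.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical local path to the Int8 GPTQ checkpoint from the report above.
quantized_path = "llama2-70b-gptq-int8"

tokenizer = AutoTokenizer.from_pretrained(quantized_path)
model = AutoModelForCausalLM.from_pretrained(
    quantized_path,
    device_map="auto",   # shard layers across both visible GPUs
    torch_dtype="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```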
1. Looks like AWQ does not honor `layer_modules_strict=False` when certain modules are not present on every layer. Stacktrace: ``` File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1156, in loop return self._loop_impl(fail_safe=fail_safe, **kwargs) ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
For OMNI-2.5, I quantized the "thinker" and the "talker" separately (the entire model). But when I merge them (`thinkermodel.talker = talker`), inference fails, reporting that the "thinker" is affected and...
**Describe the bug** Quantizing Qwen3-Next-80B-A3B-Instruct takes a long time: more than 1 day, and I only used one GPU. 1. Should this 80B model use multi-GPU quantization? How...
Tried to quantize with gptqmodel commit d8f3c78988bb8f11982a5e52361537ffba05d145 and `mock_quantization=False`, and got an error on the first layer with experts (layer 1) of GLM-4.5-Air: ``` Quantizing mlp.experts.32.gate_proj in layer [1 of 45]...
`balanced` mode does not work with GLM-4.5-Air and the QQQ method; tried on 4 x 3090: ``` File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1156, in loop return self._loop_impl(fail_safe=fail_safe, **kwargs) ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120,...
Hi, I would like to try GPTQ on [OpenVLA](https://huggingface.co/openvla/openvla-7b). I found that most of the examples use purely language input (e.g., https://github.com/ModelCloud/GPTQModel/blob/main/examples/quantization/transformers_usage.py). Do we support models with visual and...