GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
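As a point of reference for the reports below, here is a minimal sketch of an end-to-end quantization run with the GPTQModel Python API. It assumes the `GPTQModel.load` / `model.quantize` / `model.save` entry points, that plain text strings are accepted as calibration data, and uses a hypothetical model id and output directory; real runs typically use a few hundred calibration samples.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Hypothetical model id and output directory, for illustration only.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
out_dir = "Llama-3.1-8B-Instruct-gptq-4bit"

# 4-bit weights with group size 128 -- the config most reports below use.
quant_config = QuantizeConfig(bits=4, group_size=128)

# Tiny calibration set for the sketch; production runs use far more samples.
calibration = [
    "GPTQ quantizes weights layer by layer against a calibration set.",
    "The quick brown fox jumps over the lazy dog.",
]

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration)  # runs the layer-by-layer GPTQ loop
model.save(out_dir)          # writes the quantized safetensors and config
```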
**Describe the bug** 1. ValueError: There is no module or parameter named 'audio_tower.positional_embedding.positional_embedding' in Qwen3OmniMoeThinkerForConditionalGeneration (EngineCore_DP0 pid=18809) Process EngineCore_DP0: I can work around this by removing audio_tower.positional_embedding.positional_embedding from the safetensors. 2. ...
The model is around 70 GiB. I tried running GPTQModel on an RTX PRO 6000 with 96 GiB of VRAM but still ran out of memory. Config: `QuantizeConfig(bits=4, group_size=128)`.
I quantized Llama-2-70B into Int8 format using the [vLLM example](https://docs.vllm.com.cn/en/latest/features/quantization/gptqmodel.html). But I found that if I load the model with `device_map="auto"` on 2 GPUs, the output attention hidden states of the second...
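For context on the multi-GPU loading step described in the report above, a sketch of how a quantized checkpoint is typically loaded with Hugging Face Transformers so that `device_map="auto"` shards the layers across the visible GPUs; the checkpoint path is hypothetical.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical local path to the Int8 GPTQ checkpoint from the report above.
quantized_path = "llama2-70b-gptq-int8"

tokenizer = AutoTokenizer.from_pretrained(quantized_path)
model = AutoModelForCausalLM.from_pretrained(
    quantized_path,
    device_map="auto",   # shard layers across both visible GPUs
    torch_dtype="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```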
1. Looks like AWQ does not honor `layer_modules_strict=False` when certain modules are not present on every layer. Stacktrace: ``` File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1156, in loop return self._loop_impl(fail_safe=fail_safe, **kwargs) ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
For OMNI-2.5, I quantized the "thinker" and the "talker" separately (the entire model). But when I merge them (`thinkermodel.talker = talker`), inference fails, reporting that the "thinker" is affected and...
**Describe the bug** Quantizing Qwen3-Next-80B-A3B-Instruct takes a long time: more than 1 day, and I only used one GPU. 1. Should this 80B model use multi-GPU quantization? How...
Tried to quantize with gptqmodel commit d8f3c78988bb8f11982a5e52361537ffba05d145 and `mock_quantization=False`, and got an error on the first layer with experts (layer 1) of GLM-4.5-Air: ``` Quantizing mlp.experts.32.gate_proj in layer [1 of 45]...
`balanced` mode does not work with GLM-4.5-Air and the QQQ method; tried on 4 x 3090: ``` File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1156, in loop return self._loop_impl(fail_safe=fail_safe, **kwargs) ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120,...
Hi, I would like to try GPTQ on [OpenVLA](https://huggingface.co/openvla/openvla-7b). I found that most of the examples use purely language input (e.g., https://github.com/ModelCloud/GPTQModel/blob/main/examples/quantization/transformers_usage.py). Do we support models with visual and...