ZX-ModelCloud comments

Results: 4 comments of ZX-ModelCloud

[QUESTION] Qwen3 Omni VRAM memory leak

The longest text in the calibration_dataset you're using has a length of **40123**, which takes up too much memory. You can limit the maximum length. ``` from datasets import load_dataset from...
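The snippet in the comment above is cut off, so the following is only a minimal sketch of the general idea of capping calibration sample length, not the commenter's actual code. It assumes the calibration data is a plain list of strings and that length is measured in characters; the function name `limit_calibration_texts` and the `max_chars` default of 2048 are hypothetical choices made for illustration.

```python
from typing import Iterable, List


def limit_calibration_texts(
    texts: Iterable[str],
    max_chars: int = 2048,   # hypothetical cap, tune for your GPU memory
    drop_long: bool = False,
) -> List[str]:
    """Cap the length of calibration samples.

    Samples longer than max_chars are truncated, or skipped entirely
    when drop_long is True. Shorter samples pass through unchanged.
    """
    limited: List[str] = []
    for text in texts:
        if len(text) <= max_chars:
            limited.append(text)
        elif not drop_long:
            limited.append(text[:max_chars])
    return limited


# One short sample and one as long as the sample mentioned in the comment.
samples = ["short sample", "x" * 40123]

truncated = limit_calibration_texts(samples, max_chars=2048)
filtered = limit_calibration_texts(samples, max_chars=2048, drop_long=True)
```

Truncating keeps every sample but shortens the long ones, while `drop_long=True` keeps only samples that already fit. If the limit should be expressed in tokens instead of characters, the same check can be applied to tokenized lengths.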

[BUG]Qwen3-Omni-30B-A3B saved quantized model can't be loaded by vllm

Regarding bug 1: **positional_embedding** is a non-persistent buffer, but it is still being written to safetensors after calling **gptqmodel.utils.model.get_state_dict_for_save()**. This error is probably related to **offload_disk**. I am checking the relevant...
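For context on why a non-persistent buffer appearing in the saved file is unexpected, here is a small PyTorch sketch. It does not reproduce the gptqmodel code path; the `TinyEncoder` module and its shapes are hypothetical. It only shows that a buffer registered with `persistent=False` is visible through `named_buffers()` but is left out of `state_dict()`, so a loader would not normally expect to find it in a checkpoint.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical module with one non-persistent buffer."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(4, 4)
        # persistent=False keeps the buffer out of state_dict().
        self.register_buffer(
            "positional_embedding", torch.zeros(8, 4), persistent=False
        )


model = TinyEncoder()

# Keys that a plain state_dict()-based save would write.
state_keys = set(model.state_dict().keys())

# Buffers attached to the module, persistent or not.
buffer_names = {name for name, _ in model.named_buffers()}
```

Here `state_keys` holds only the `proj` weight and bias, while `buffer_names` still contains `positional_embedding`. A save routine that collects tensors from the module's buffers directly, instead of from `state_dict()`, could therefore end up writing such a buffer.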

[BUG]Qwen3-Omni-30B-A3B saved quantized model can't be loaded by vllm

Regarding bugs 2 and 3: both have been fixed on the main branch of vllm. https://github.com/vllm-project/vllm/pull/29896/files#diff-a65936ff683c1b4c8d7f3cdd49c28022f38d5e7cfbee857e7dc8c4f6731af0f9R1141-R1152

[BUG]Qwen3-Omni-30B-A3B saved quantized model can't be loaded by vllm

> Regarding bug 1:
>
> **positional_embedding** is a non-persistent buffer, but it is being written to safetensors after calling **gptqmodel.utils.model.get_state_dict_for_save().** This error should be related to **offload_disk**. I am...