Feature Request: Qwen3-Omni-30B-A3B support
Prerequisites
- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Qwen has released three Qwen3-Omni-30B-A3B models:
- https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
- https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Thinking
- https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Captioner
Motivation
These are new SOTA omni models.
Possible Implementation
- https://github.com/huggingface/transformers/pull/41025
- https://github.com/huggingface/transformers/pull/41045
It seems to have been forgotten by the developers.
Not forgotten 🙂 Audio is just more complex. VL is prioritized first, so audio support will likely follow later.
I'm facing this error: `ValueError: Unrecognized configuration class <class 'transformers.models.qwen3_omni_moe.configuration_qwen3_omni_moe.Qwen3OmniMoeConfig'> for this kind of AutoModel: AutoModelForCausalLM.` Any help, please?
@HaithemH Qwen3-VL and Qwen3-Omni aren't supported by llama.cpp yet.
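For context on the `ValueError` quoted above: that message is raised by the `transformers` Auto-class lookup when the config class has no entry for the requested kind of model, so it is separate from llama.cpp support. Below is a minimal sketch of that kind of lookup. Everything in it (`ToyOmniConfig`, `CAUSAL_LM_MAPPING`, `auto_model_for_causal_lm`, `try_load`) is a hypothetical stand-in for illustration, not the real `transformers` internals.

```python
# Toy illustration only: a hypothetical registry, not the real transformers code.


class ToyConfig:
    """Stand-in for a model configuration class."""


class ToyCausalLMConfig(ToyConfig):
    """Stand-in for a config that has a causal-LM model registered."""


class ToyOmniConfig(ToyConfig):
    """Stand-in for a multimodal config with no causal-LM mapping."""


class ToyCausalLM:
    """Stand-in for a model class."""

    def __init__(self, config):
        self.config = config


# Hypothetical mapping from config class to model class.
CAUSAL_LM_MAPPING = {ToyCausalLMConfig: ToyCausalLM}


def auto_model_for_causal_lm(config):
    """Return a model for `config`, or raise if its class is not registered."""
    model_cls = CAUSAL_LM_MAPPING.get(type(config))
    if model_cls is None:
        raise ValueError(
            f"Unrecognized configuration class {type(config)} "
            "for this kind of AutoModel: AutoModelForCausalLM."
        )
    return model_cls(config)


def try_load(config):
    """Return the loaded model's class name, or the error message on failure."""
    try:
        return type(auto_model_for_causal_lm(config)).__name__
    except ValueError as err:
        return str(err)
```

In this sketch, `try_load(ToyCausalLMConfig())` succeeds, while `try_load(ToyOmniConfig())` returns the "Unrecognized configuration class" message, which mirrors the situation where a config is known to the library but not registered under the Auto class being used.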
Could you share the plan for supporting Qwen3-Omni?
same request
hope it
Yeah, it seems like more and more omni-modal models are being released. It would be amazing if we had support for those in llama.cpp, though I know that's very complicated /:
Qwen3-VL is supported now!
What about the plan for supporting Omni?
I'm curious about which one is more difficult to implement: Qwen3-Omni or Qwen3-Next?
hope this hasn’t been forgotten
We have an MLX implementation now. https://github.com/Blaizzy/mlx-vlm/pull/598 Audio generation speed is acceptable as long as the input does not include images.
demo repo: https://github.com/hellopahe/joi
looking forward to audio generation