exo
feat: Add Support for qwen3_moe Architecture, e.g. Qwen 3 Coder 30B
This commit adds support for the qwen3_moe architecture, exemplified by Qwen 3 Coder 30B, and includes a fix for handling bfloat16 data types in the MLX inference engine.
Changes made:
- Added Qwen 3 Coder 30B model configuration to models.py with proper repository mapping
- Added pretty name for Qwen 3 Coder 30B in the model naming system
- Fixed a RuntimeError related to bfloat16 data conversion in sharded_inference_engine.py by implementing a dedicated numpy conversion function that properly handles dtype casting
- Updated MLX dependencies to use version ranges instead of fixed versions for better compatibility
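The model-registration change can be sketched roughly as follows; the dictionary shape, key names, layer count, and repository path below are all assumptions for illustration, not the actual exo configuration:

```python
# Hypothetical sketch of the kind of entry added to models.py.
# The keys, layer count, and repo id are assumptions, not exo's real values.
model_cards = {
    "qwen3-coder-30b": {
        "layers": 48,  # assumed transformer layer count for sharding
        "repo": {
            # maps inference engine class name -> model repository (assumed path)
            "MLXDynamicShardInferenceEngine": "mlx-community/Qwen3-Coder-30B",
        },
    },
}

# A pretty-name table of the kind mentioned above (names assumed):
pretty_name = {
    "qwen3-coder-30b": "Qwen 3 Coder 30B",
}
```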
The bfloat16 fix addresses an issue where item-size mismatches between bfloat16 arrays and their numpy targets caused a RuntimeError during inference with Qwen3-Coder models. The fix ensures correct dtype handling while preserving zero-copy conversion for dtypes that numpy supports natively.
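The exact helper in sharded_inference_engine.py isn't shown here, but the underlying dtype problem can be sketched: numpy has no native bfloat16, so the 2-byte values must be widened to float32 before numpy can work with them. Since bfloat16 is just the upper 16 bits of an IEEE-754 float32, widening is a shift into the high half of a uint32 followed by a reinterpreting view (function and variable names below are illustrative, not the ones in the commit):

```python
import numpy as np

def bf16_bits_to_float32(bits: np.ndarray) -> np.ndarray:
    """Expand raw bfloat16 bit patterns (stored as uint16) to float32.

    bfloat16 keeps the sign, exponent, and top 7 mantissa bits of a
    float32, so placing the 16 bits in the high half of a uint32 and
    reinterpreting the buffer recovers the value exactly.
    """
    assert bits.dtype == np.uint16
    return (bits.astype(np.uint32) << 16).view(np.float32)

# Round-trip with values exactly representable in bfloat16:
# truncate float32 -> bf16 bit pattern -> widen back to float32.
x = np.array([1.0, -2.5, 3.140625], dtype=np.float32)
bf16_bits = (x.view(np.uint32) >> 16).astype(np.uint16)
y = bf16_bits_to_float32(bf16_bits)
```

For dtypes numpy supports directly (float16/32, int types), no such widening pass is needed and the conversion can stay zero-copy, which is the trade-off the fix preserves.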