Fast-LLM
[hybrid_dev] Hybrid dev branch
✨ Description
For tracking: Hybrid-SSM dev branch
Outstanding issues
- Preprocessing is missing for the vision encoder when flash-attn is disabled, which raises `KeyError: 'image_encoder_attention_mask'` (this blocks running the tests).
- When `vision_encoder.image_break_token` is not set, the Multimodal-embedding layer fails with a shape-mismatch error.
- debug-layer-outputs/gradients hangs.
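
For reference, the first issue can be illustrated with a minimal, self-contained sketch. It is not Fast-LLM code: `preprocess`, `encoder_attention`, and `reproduces_key_error` are hypothetical stand-ins, and the all-True mask is a placeholder assumption. Only the key name `image_encoder_attention_mask` comes from the reported error. The sketch shows how the non-flash path fails if the step that populates the mask is skipped.

```python
MASK_KEY = "image_encoder_attention_mask"  # key name taken from the reported KeyError


def preprocess(kwargs: dict, use_flash_attention: bool, num_patches: int) -> dict:
    # Hypothetical preprocessing step: the non-flash path needs an explicit mask.
    if not use_flash_attention:
        # Placeholder all-True mask; the real mask layout is an assumption here.
        kwargs[MASK_KEY] = [[True] * num_patches for _ in range(num_patches)]
    return kwargs


def encoder_attention(kwargs: dict, use_flash_attention: bool) -> str:
    # Hypothetical stand-in for the vision-encoder attention layer.
    if use_flash_attention:
        return "flash"
    mask = kwargs[MASK_KEY]  # raises KeyError if preprocessing was skipped
    return f"masked:{len(mask)}x{len(mask[0])}"


def reproduces_key_error() -> bool:
    # Calls the non-flash path without preprocessing and checks the missing key.
    try:
        encoder_attention({}, use_flash_attention=False)
    except KeyError as error:
        return error.args[0] == MASK_KEY
    return False
```

Under these assumptions, calling `encoder_attention` on kwargs that went through `preprocess` succeeds on both paths, while calling it on unprocessed kwargs with flash-attn disabled raises the same `KeyError` as the one reported.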