Fast-LLM
Changes for basic LLaDA-style diffusion masking support
✨ Description
Cleaned up the code a bit:
- Added a `Diffusion` config object, as we discussed
- Removed noise schedules for v1
- Moved the loss calculation to `head.py`, since the language-modeling loss is computed there
- Moved bidirectional attention to `preprocessing.py`, since that is where the attention mask is computed
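For reviewers unfamiliar with LLaDA, here is a minimal sketch of the masking forward process and the masked-token loss this PR is about. All names here (`llada_mask`, `masked_ce_loss`) are hypothetical and do not match Fast-LLM's actual config or `head.py` code; the `1/t` reweighting is the objective from the LLaDA paper, assumed to be what the loss in `head.py` implements:

```python
import torch
import torch.nn.functional as F


def llada_mask(tokens: torch.Tensor, mask_token_id: int, generator=None):
    """LLaDA-style forward process: sample a masking ratio t ~ U(0, 1)
    per sequence, then mask each token independently with probability t.
    Returns (corrupted tokens, boolean mask, t)."""
    batch, seq_len = tokens.shape
    t = torch.rand(batch, 1, generator=generator)               # per-sequence ratio
    mask = torch.rand(batch, seq_len, generator=generator) < t  # True = masked
    corrupted = torch.where(mask, torch.full_like(tokens, mask_token_id), tokens)
    return corrupted, mask, t


def masked_ce_loss(logits, targets, mask, t):
    """Cross-entropy on masked positions only, reweighted by 1/t as in the
    LLaDA objective (an assumption about this PR's loss computation)."""
    # logits: (batch, seq, vocab) -> cross_entropy expects (batch, vocab, seq)
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    per_token = (per_token * mask) / t.clamp_min(1e-6)
    return per_token.sum() / mask.sum().clamp_min(1)
```

Unmasked positions contribute nothing to the loss, which is why the model only needs to predict the masked tokens at each diffusion step.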
This is of course still a WIP, but feel free to leave comments and suggestions.
These changes address this issue: https://github.com/ServiceNow/Fast-LLM/issues/208#issue-2950083282
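On the bidirectional-attention change: diffusion LMs like LLaDA attend over the whole sequence rather than causally, so the mask built in `preprocessing.py` drops the causal triangle. A hedged sketch (the helper name is hypothetical and not Fast-LLM's actual API):

```python
import torch


def attention_mask(seq_len: int, bidirectional: bool) -> torch.Tensor:
    """Boolean attention mask where True means 'may attend'.
    Causal: lower-triangular. Bidirectional (diffusion): all-True,
    since masked-token prediction conditions on both directions."""
    if bidirectional:
        return torch.ones(seq_len, seq_len, dtype=torch.bool)
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
```

For `seq_len = 4`, the causal mask has 10 attendable pairs while the bidirectional one has all 16, which is the only behavioral difference the preprocessing step needs to switch on.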