Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
# ✨ Description Fix: #126 Generalize the concept of a dynamic config class from the dataset mechanism to all config classes. I opted for a unique global registry of all config...
# ✨ Description This removes bloat and ad-hoc registries for the CLI, and instead uses a dynamic config class to get the exact same result in a much simpler way....
# 🐞 Describe the Bug The tokens generated by Fast-LLM occasionally differ completely from those of the Hugging Face (HF) counterpart. HF consistently generates the same output, so the issue likely lies...
# 🎯 **Goal (What & Why)** Knowledge distillation was added in #229, but it currently disables the standard LM loss. Enabling both knowledge distillation and the standard LM loss would allow...
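Combining the two objectives is typically done as a weighted sum of a distillation term and the usual cross-entropy term. The sketch below is a minimal, self-contained illustration of that idea, not Fast-LLM's actual implementation from #229; the function name `distill_loss` and the weighting parameter `alpha` are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a single logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(student_logits, teacher_logits, target, alpha=0.5):
    """Weighted sum of forward KL(teacher || student) and the standard
    LM cross-entropy against the hard label `target`.

    alpha=1.0 recovers pure distillation; alpha=0.0 recovers the
    standard LM loss alone.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    ce = -math.log(p_student[target])
    return alpha * kl + (1.0 - alpha) * ce
```

When the student matches the teacher exactly, the KL term vanishes and only the weighted cross-entropy remains, which makes the interpolation easy to sanity-check.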
# ✨ Description For visibility only. There is no intention to merge this soon. Specify different lr-scales per layer. ## 🔍 Type of change Select all that apply: - [...
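One common way to realize per-layer lr scales is a geometric decay over depth (layer-wise lr decay), where earlier layers get smaller learning rates. This is only a sketch of that general pattern, not the scheme proposed in the PR draft; the function name `per_layer_lrs` and the `decay` parameter are assumptions.

```python
def per_layer_lrs(base_lr, num_layers, decay=0.9):
    """Hypothetical layer-wise lr schedule: the last layer keeps
    base_lr, and each earlier layer is scaled down by `decay`.

    Returns one learning rate per layer, index 0 = first layer.
    """
    return [
        base_lr * decay ** (num_layers - 1 - i)
        for i in range(num_layers)
    ]
```

In a PyTorch training loop these values would typically be wired in via one optimizer parameter group per layer, each with its own `lr`.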
# ✨ Description Cleaned up the code a bit: 1) Added the Diffusion config object as we discussed 2) Removed noise schedules for v1 3) Moved loss calculation to head.py (as...
# ✨ Description This PR draft will be split into 3 PRs ## 🔍 Type of change Select all that apply: - [ ] 🐛 **Bug fix** (non-breaking change that...
# ✨ Description This PR updates the Dockerfile base image from `nvcr.io/nvidia/pytorch:24.11-py3` to `nvcr.io/nvidia/pytorch:25.03-py3`. The new base image brings updated versions of CUDA, PyTorch, cuDNN, NCCL, RAPIDS, and other key...
# ✨ Description Part of #112 Closes # ## 🔍 Type of change Select all that apply: - [ ] 🐛 **Bug fix** (non-breaking change that addresses a specific issue)...
# ✨ Description To better detect potential routing collapse and better understand the routing distribution, we can track the average entropy and mutual information of the routing probabilities....
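The two quantities are related: the mutual information between tokens and experts can be computed as the entropy of the mean routing distribution minus the average per-token entropy. Near-zero mutual information means every token is routed the same way, a symptom of collapse. The sketch below illustrates this computation on plain probability lists; it is not Fast-LLM's tracking code, and the names `entropy` and `routing_stats` are hypothetical.

```python
import math


def entropy(p):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(q * math.log(q) for q in p if q > 0.0)


def routing_stats(probs):
    """Average per-token routing entropy and mutual information.

    probs: one probability vector over experts per token.
    MI = H(mean distribution) - mean per-token entropy.
    """
    n = len(probs)
    num_experts = len(probs[0])
    mean_p = [sum(p[e] for p in probs) / n for e in range(num_experts)]
    avg_entropy = sum(entropy(p) for p in probs) / n
    mutual_info = entropy(mean_p) - avg_entropy
    return avg_entropy, mutual_info
```

For example, two tokens routed deterministically to two different experts give zero average entropy and maximal mutual information (ln 2 for two experts), while two tokens routed to the same expert give zero for both, flagging collapse.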