Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
# ✨ Description Fix: #126 Generalize the concept of a dynamic config class from the dataset mechanism to all config classes. I opted for a unique global registry of all config...
# ✨ Description This removes bloat and ad-hoc registries for the CLI, and instead uses a dynamic config class to get the exact same result in a much simpler way....
# 🐞 Describe the Bug The tokens generated by Fast-LLM occasionally differ completely from those of the Hugging Face (HF) counterpart. HF consistently generates the same output, so the issue likely lies...
# 🎯 **Goal (What & Why)** Knowledge distillation was added in #229, but it currently disables the standard LM loss. Enabling both knowledge distillation and the standard LM loss would allow...
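Combining the two objectives is typically done as a weighted sum of a distillation term and the usual cross-entropy term. The sketch below is a minimal, self-contained illustration of that idea, not Fast-LLM's actual implementation from #229; the function name `distill_loss` and the weighting parameter `alpha` are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a single logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(student_logits, teacher_logits, target, alpha=0.5):
    """Weighted sum of forward KL(teacher || student) and the standard
    LM cross-entropy against the hard label `target`.

    alpha=1.0 recovers pure distillation; alpha=0.0 recovers the
    standard LM loss alone.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    ce = -math.log(p_student[target])
    return alpha * kl + (1.0 - alpha) * ce
```

When the student matches the teacher exactly, the KL term vanishes and only the weighted cross-entropy remains, which makes the interpolation easy to sanity-check.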
# ✨ Description For visibility only. There is no intention to merge this soon. Specify different lr-scales per layer. ## 🔍 Type of change Select all that apply: - [...
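One common way to realize per-layer lr scales is a geometric decay over depth (layer-wise lr decay), where earlier layers get smaller learning rates. This is only a sketch of that general pattern, not the scheme proposed in the PR draft; the function name `per_layer_lrs` and the `decay` parameter are assumptions.

```python
def per_layer_lrs(base_lr, num_layers, decay=0.9):
    """Hypothetical layer-wise lr schedule: the last layer keeps
    base_lr, and each earlier layer is scaled down by `decay`.

    Returns one learning rate per layer, index 0 = first layer.
    """
    return [
        base_lr * decay ** (num_layers - 1 - i)
        for i in range(num_layers)
    ]
```

In a PyTorch training loop these values would typically be wired in via one optimizer parameter group per layer, each with its own `lr`.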
# ✨ Description Cleaned up the code a bit: 1) Added the Diffusion config object as we discussed 2) Removed noise schedules for v1 3) Moved loss calculation to head.py (as...
# ✨ Description This PR draft will be split into 3 PRs ## 🔍 Type of change Select all that apply: - [ ] 🐛 **Bug fix** (non-breaking change that...
# ✨ Description This PR updates the Dockerfile base image from `nvcr.io/nvidia/pytorch:24.11-py3` to `nvcr.io/nvidia/pytorch:25.03-py3`. The new base image brings updated versions of CUDA, PyTorch, cuDNN, NCCL, RAPIDS, and other key...
# ✨ Description Part of #112 Closes # ## 🔍 Type of change Select all that apply: - [ ] 🐛 **Bug fix** (non-breaking change that addresses a specific issue)...
# ✨ Description To better detect potential routing collapse and better understand the routing distribution, we can track the average entropy and mutual information of the routing probabilities....
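The two quantities are related: the mutual information between tokens and experts can be computed as the entropy of the mean routing distribution minus the average per-token entropy. Near-zero mutual information means every token is routed the same way, a symptom of collapse. The sketch below illustrates this computation on plain probability lists; it is not Fast-LLM's tracking code, and the names `entropy` and `routing_stats` are hypothetical.

```python
import math


def entropy(p):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(q * math.log(q) for q in p if q > 0.0)


def routing_stats(probs):
    """Average per-token routing entropy and mutual information.

    probs: one probability vector over experts per token.
    MI = H(mean distribution) - mean per-token entropy.
    """
    n = len(probs)
    num_experts = len(probs[0])
    mean_p = [sum(p[e] for p in probs) / n for e in range(num_experts)]
    avg_entropy = sum(entropy(p) for p in probs) / n
    mutual_info = entropy(mean_p) - avg_entropy
    return avg_entropy, mutual_info
```

For example, two tokens routed deterministically to two different experts give zero average entropy and maximal mutual information (ln 2 for two experts), while two tokens routed to the same expert give zero for both, flagging collapse.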