Fast-LLM
Support block-modular architecture
✨ Description
This draft PR addresses #242 by introducing a flexible, modular configuration system for hybrid model architectures.
TODOs:
- [ ] add more testing to make sure legacy behaviour is well supported
- [ ] implement weight sharing and support for layer-specific learning rates.
- [ ] make sure model serialization/conversion works as expected
- [ ] review and unify naming conventions (block, layer) across codebase.
- [ ] clean up & test
model:
  base_model:
    cross_entropy_impl: fused
    blocks:
      bob:
        type: transformer
        hidden_size: 512
      mamba:
        type: discrete_mamba2
        state_size: 16
        expansion_factor: 2
        hidden_size: 512
    hybrid_block_layout: ["bob", "mamba"]
    num_layers: 4
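To illustrate how a config like the one above could be resolved into a per-layer block sequence, here is a minimal Python sketch. It assumes that `hybrid_block_layout` is repeated cyclically until `num_layers` entries are produced. The PR does not state this, and `expand_layout` is a hypothetical helper, not a Fast-LLM function.

```python
from itertools import cycle, islice


def expand_layout(layout: list, num_layers: int, blocks: dict) -> list:
    """Return one block name per layer.

    Assumption (not confirmed by the PR): the layout pattern is repeated
    cyclically until `num_layers` names have been produced.
    """
    if not layout:
        raise ValueError("hybrid_block_layout must not be empty")
    # Every name used in the layout must refer to a block defined under `blocks`.
    unknown = [name for name in layout if name not in blocks]
    if unknown:
        raise ValueError(f"Layout references undefined blocks: {unknown}")
    return list(islice(cycle(layout), num_layers))


# Plain-dict mirror of the `blocks` section of the example config.
blocks = {
    "bob": {"type": "transformer", "hidden_size": 512},
    "mamba": {
        "type": "discrete_mamba2",
        "state_size": 16,
        "expansion_factor": 2,
        "hidden_size": 512,
    },
}

per_layer = expand_layout(["bob", "mamba"], 4, blocks)
print(per_layer)
```

Under the cyclic assumption, the four layers alternate between the `bob` and `mamba` blocks. A layout that names an undefined block fails early instead of at model construction time.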
🔍 Type of change
Class hierarchy in the config system:
- started moving functionality specific to `BaseBlock` into `BaseBlockConfig` in `layers/common`
- transformer and SSM layer configs inherit from `BaseBlockConfig`, both holding functionality specific to their dedicated blocks (`TransformerLayer`, `LlambaBlock`)
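The hierarchy described above could look roughly like the following sketch. Only the name `BaseBlockConfig` comes from this PR. The subclass names, fields, and validation rules are hypothetical and use plain dataclasses rather than the Fast-LLM config machinery.

```python
from dataclasses import dataclass


@dataclass
class BaseBlockConfig:
    # Settings shared by every block type (hypothetical field selection).
    hidden_size: int = 512

    def validate(self) -> None:
        if self.hidden_size <= 0:
            raise ValueError("hidden_size must be positive")


@dataclass
class TransformerBlockConfig(BaseBlockConfig):
    # Hypothetical transformer-specific setting.
    num_attention_heads: int = 8

    def validate(self) -> None:
        super().validate()
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")


@dataclass
class SSMBlockConfig(BaseBlockConfig):
    # SSM-specific settings, named after the keys in the example config.
    state_size: int = 16
    expansion_factor: int = 2

    @property
    def inner_size(self) -> int:
        # Assumed relationship: the inner projection width scales with hidden_size.
        return self.expansion_factor * self.hidden_size


ssm = SSMBlockConfig(hidden_size=512, state_size=16, expansion_factor=2)
ssm.validate()
print(ssm.inner_size)
```

Keeping shared fields and checks in the base class means each subclass only carries what is specific to its own block type.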
Block-specific hyperparameters & tensor space definition:
- `HybridBlockConfig`s implemented under `models/hybrid/config`, allowing block-specific hyperparameter definitions
- the names of the elements in the tensor space now include block suffixes; no suffixes are used in the case of non-hybrid GPT models
- still supports legacy behaviour with blocks defined using lists like [t,m2d,m] & non-hybrid GPT models
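The suffix rule and the legacy list format can be sketched as below. This is only an illustration: `dim_name`, `normalize_legacy_layout`, the `_` separator, and the mapping from legacy codes to block types are all assumptions, not the names or values used in the PR.

```python
from typing import Optional

# Assumed mapping from legacy single-token codes to block types.
# Only "transformer" and "discrete_mamba2" appear in the example config.
LEGACY_TYPES = {"t": "transformer", "m2d": "discrete_mamba2", "m": "mamba"}


def dim_name(base: str, block_name: Optional[str] = None) -> str:
    """Build a tensor-space element name.

    Hybrid models pass a block name and get a suffixed name; non-hybrid
    GPT models pass None and keep the plain name.
    """
    return base if block_name is None else f"{base}_{block_name}"


def normalize_legacy_layout(codes: list) -> tuple:
    """Turn a legacy list such as ["t", "m2d", "m"] into (blocks, layout).

    Each distinct code becomes a named block whose name is the code itself.
    """
    blocks = {}
    for code in codes:
        if code not in LEGACY_TYPES:
            raise ValueError(f"Unknown legacy block code: {code!r}")
        blocks.setdefault(code, {"type": LEGACY_TYPES[code]})
    return blocks, list(codes)


blocks, layout = normalize_legacy_layout(["t", "m2d", "m"])
print(dim_name("hidden"), dim_name("hidden", "bob"))
print(layout, sorted(blocks))
```

Converting the legacy list into the named-block form up front would let the rest of the code handle a single representation.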
Select all that apply:
- [ ] 🐛 Bug fix (non-breaking change that addresses a specific issue)
- [x] 🚀 New feature (non-breaking change that adds functionality)
- [ ] ⚠️ Breaking change (a change that could affect existing functionality)
- [ ] 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
- [x] 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
- [ ] 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
- [ ] 📝 Documentation change (updates documentation, including new content or typo fixes)
- [ ] 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)
📝 Changes
✅ Checklist
Make sure the following tasks are completed before submitting the PR:
General
- [x] 📜 I have read and followed the contributing guidelines.
- [x] 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
- [ ] 🎉 The functionality is complete, and I have tested the changes.
- [ ] 📝 I have updated the documentation if needed.
- [ ] ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
- [x] 🧩 I have commented my code, especially in hard-to-understand areas.
Dependencies and Configuration
- [ ] 🐋 I have updated the Docker configuration or dependencies, if applicable.
- [ ] 🔄 I have ensured compatibility with the existing setup after dependency changes.
Testing
- [ ] 🧪 I have added or updated tests to cover my changes.
- [ ] ✔️ New and existing tests pass locally with my changes.
- [ ] 🚦 I have tested these changes on GPUs and verified training stability.
- [ ] 🏋️ I have tested the changes on realistic training workloads, if applicable.
Performance Impact
- [ ] 📊 I have run benchmarks where applicable to evaluate the performance impact.
- [ ] ✅ The benchmarks show no performance regression.
- [ ] 🚀 The benchmarks indicate a potential performance improvement.
- [ ] ⚠️ The benchmarks indicate a potential performance degradation.
- [ ] 📈 I have provided benchmark results and detailed any performance impact below, if applicable.