✨ Description

This PR makes minor improvements to the SSM/Hybrid classes, adds support for loading and exporting Apriel SSM and hybrid SSM models (with the corresponding modeling.py classes), and adds an embeddings_lr_scale argument.

🔍 Type of change

Select all that apply:

[x] 🐛 Bug fix (non-breaking change that addresses a specific issue)
[x] 🚀 New feature (non-breaking change that adds functionality)
[ ] ⚠️ Breaking change (a change that could affect existing functionality)
[ ] 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
[ ] 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
[ ] 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
[ ] 📝 Documentation change (updates documentation, including new content or typo fixes)
[ ] 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

Add modeling.py classes for Apriel SSM and hybrid models
Support import and export of Apriel SSM and hybrid models
Add embeddings_lr_scale
Add output_lr_scale
Fix parsing of lr_schedule when it is provided as a string
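
For readers unfamiliar with per-layer learning-rate scales, here is a rough sketch of what options like embeddings_lr_scale and output_lr_scale typically do: parameters are split into groups, and each group's learning rate is the base rate multiplied by its scale. This is not the code in this PR. The helper name `build_param_groups`, the `embeddings.` / `output.` name prefixes, and the group layout are all assumptions made for illustration.

```python
def build_param_groups(param_names, base_lr, embeddings_lr_scale=1.0, output_lr_scale=1.0):
    """Group parameter names and give each group a scaled learning rate.

    Hypothetical helper: the "embeddings." / "output." prefixes are assumptions
    for illustration only, not the parameter naming used in this PR.
    """
    scales = {"embeddings": embeddings_lr_scale, "output": output_lr_scale, "other": 1.0}
    groups = {key: [] for key in scales}
    for name in param_names:
        if name.startswith("embeddings."):
            groups["embeddings"].append(name)
        elif name.startswith("output."):
            groups["output"].append(name)
        else:
            groups["other"].append(name)
    # Drop empty groups; each remaining group carries its own effective lr.
    return [
        {"name": key, "params": names, "lr": base_lr * scales[key]}
        for key, names in groups.items()
        if names
    ]


groups = build_param_groups(
    ["embeddings.weight", "layers.0.mixer.weight", "output.weight"],
    base_lr=1e-3,
    embeddings_lr_scale=0.1,
    output_lr_scale=0.5,
)
for group in groups:
    print(group["name"], group["lr"])
```

The list of dicts mirrors the shape of optimizer parameter groups, so a scale of 1.0 leaves a group at the base learning rate.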

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

[ ] 📜 I have read and followed the contributing guidelines.
[ ] 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
[ ] 🎉 The functionality is complete, and I have tested the changes.
[ ] 📝 I have updated the documentation if needed.
[ ] ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
[ ] 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

[ ] 🐋 I have updated the Docker configuration or dependencies, if applicable.
[ ] 🔄 I have ensured compatibility with the existing setup after dependency changes.

Testing

[x] 🧪 I have added or updated tests to cover my changes.
[ ] ✔️ New and existing tests pass locally with my changes.
[ ] 🚦 I have tested these changes on GPUs and verified training stability.
[ ] 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

[ ] 📊 I have run benchmarks where applicable to evaluate the performance impact.
[ ] ✅ The benchmarks show no performance regression.
[ ] 🚀 The benchmarks indicate a potential performance improvement.
[ ] ⚠️ The benchmarks indicate a potential performance degradation.
[ ] 📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

🗒️ Additional Notes

May 09 '25 13:05 oleksost

@jlamypoirier can I merge this one? It affects many files, but it's mostly SSM-related changes plus minor things related to the lr schedule being passed as a string.

May 20 '25 12:05 oleksost

@jlamypoirier apologies for the delayed reply. Yes, it should be ready. I just need to run the local tests and verify everything is OK, then I will merge.

Jun 11 '25 20:06 oleksost

Apriel SSM/Hybrid
