Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
# ✨ Description

Fix: #149 LoRA support

* [x] Basic LoRA wrapper
* [x] Basic LoRA Config
* [x] Add LoRA support in attention
* [x] Add LoRA support in...
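The core idea of such a LoRA wrapper can be sketched in plain NumPy (a minimal illustration with hypothetical class and parameter names, not Fast-LLM's actual implementation):

```python
import numpy as np

class LoRALinear:
    """Sketch of a LoRA-wrapped linear layer (hypothetical, for illustration).

    Computes y = x @ W.T + (alpha / r) * (x @ A.T) @ B.T, where the base
    weight W stays frozen and only the low-rank factors A (r x in) and
    B (out x r) would be trained.
    """

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0):
        out_features, in_features = weight.shape
        self.weight = weight  # frozen base weight
        self.lora_a = np.random.randn(r, in_features) * 0.01
        # B is zero-initialized, so the wrapper is a no-op at the start of training.
        self.lora_b = np.zeros((out_features, r))
        self.scaling = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T
        update = (x @ self.lora_a.T) @ self.lora_b.T
        return base + self.scaling * update
```

Because `lora_b` starts at zero, the wrapped layer initially reproduces the base layer exactly, which is the standard LoRA initialization.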
# ✨ Description

Fixes: #154, #155. This PR proposes a simple way to obtain layer-dependent configuration by leveraging Fast-LLM's existing config update mechanism. It works by providing a "default" layer...
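The default-plus-override scheme described above can be sketched as a plain dict update (the function and field names here are hypothetical illustrations, not the PR's actual code):

```python
def resolve_layer_configs(default: dict, overrides: dict[int, dict], num_layers: int) -> list[dict]:
    """Build one config per layer: start from the shared "default" layer
    config and apply any per-layer override on top of it."""
    configs = []
    for layer_index in range(num_layers):
        config = dict(default)                        # copy the default layer config
        config.update(overrides.get(layer_index, {})) # layer-specific fields win
        configs.append(config)
    return configs
```

With this shape, a user only writes the fields that differ per layer; everything else is inherited from the default.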
# 🐞 Describe the Bug

The following tests fail on the main branch:

```
FAILED tests/data/test_sampling.py::test_gpt_sample[full-0] - AssertionError: 3 != 6
FAILED tests/data/test_sampling.py::test_gpt_sample[full-32] - AssertionError: 1 != 6
FAILED tests/data/test_sampling.py::test_gpt_sample[full-88] -...
```
# ✨ Description

Migrated from #248. This PR allows a dataset with prompt and completion columns (and, in general, any pair of text columns, e.g. question and answer) to be...
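A common way to turn such a column pair into a training sample is to concatenate the tokenized texts and mask the prompt tokens out of the loss; a hedged sketch with hypothetical names, not the PR's actual code:

```python
def build_sample(prompt: str, completion: str, tokenize, eos_id: int):
    """Tokenize a (prompt, completion) pair into one training sequence.

    Prompt positions get label -100 (the usual ignore index) so the loss
    only covers the completion tokens.
    """
    prompt_ids = tokenize(prompt)
    completion_ids = tokenize(completion) + [eos_id]
    input_ids = prompt_ids + completion_ids
    labels = [-100] * len(prompt_ids) + completion_ids
    return input_ids, labels
```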
# ✨ Description

This PR creates a common interface for all `GPTHuggingfaceDatasetConfig` input columns via the new `source_schema` variable. Beyond the variable `field` we require additional keys to preprocess and...
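A schema of this kind might look like the following sketch, mapping logical roles to dataset column names (the key names and helper function are hypothetical illustrations, not the PR's actual interface):

```python
# Hypothetical source schema: which dataset columns play which role.
source_schema = {
    "prompt": "question",
    "completion": "answer",
}

def extract_text(row: dict, schema: dict) -> dict:
    """Pull the configured columns out of a raw dataset row,
    keyed by their logical role rather than their column name."""
    return {role: row[column] for role, column in schema.items()}
```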
# 🎯 **Goal (What & Why)**

Support chat templates during dataset preparation to make it easier to run SFT, DPO, and other instruction-finetuning methods. This takes away from the user...
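A chat template renders a structured message list into the model's expected prompt format; Hugging Face tokenizers expose this as `tokenizer.apply_chat_template`. A minimal hand-rolled ChatML-style renderer (illustrative only; real models ship their own template with the tokenizer) shows the idea:

```python
def apply_chat_template(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render a message list in a ChatML-like format (illustration only)."""
    text = ""
    for message in messages:
        # Each turn is wrapped in role-tagged special tokens.
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text
```

Doing this once during dataset preparation means the user no longer has to hand-format conversations into raw text.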
# 🧐 Problem Description

FP8 training can significantly improve training throughput by reducing memory requirements and improving computational efficiency. However, challenges remain in integrating FP8 across all components of the...
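For intuition on why FP8 is challenging: the E4M3 format keeps only 3 explicit mantissa bits and saturates at ±448, so values must be carefully scaled into range. Its rounding behavior can be simulated in a few lines (an illustration of the number format only, not of any actual FP8 kernel; subnormals are ignored):

```python
import math

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3

def quantize_e4m3(x: float) -> float:
    """Round x to a nearby FP8 E4M3 value: sign, 4 exponent bits,
    3 explicit mantissa bits. Saturates at +-448; subnormals ignored."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))   # abs(x) = m * 2**e with 0.5 <= m < 1
    m = round(m * 16) / 16      # keep 4 significant bits (1 implicit + 3 explicit)
    return sign * min(math.ldexp(m, e), E4M3_MAX)
```

With so few mantissa bits, nearby values collapse onto the same representable number, which is why FP8 training relies on per-tensor scaling and mixed-precision accumulation.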
# 🎯 **Goal (What & Why)**

Add support for training [Nemotron-H models](https://research.nvidia.com/labs/adlr/nemotronh/). Nemotron-H is a family of hybrid SSM-Transformer models (8B, 47B, 56B) trained by NVIDIA in FP8 on 20T...
# 🧐 Problem Description

#104 introduced a mechanism for selecting config classes dynamically. This can be made useful elsewhere, especially for user-made plugins.

# 💡 Proposed Solution

* Generalize the...
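A registry-based pattern of this kind, where the concrete config class is selected from a `type` key at load time, can be sketched as follows (hypothetical names, not Fast-LLM's actual API):

```python
class Config:
    """Base config with a registry keyed by a 'type' name, so the
    concrete subclass can be selected dynamically from serialized data."""

    _registry: dict[str, type] = {}

    def __init_subclass__(cls, type_name: str = None, **kwargs):
        super().__init_subclass__(**kwargs)
        if type_name is not None:
            # Subclasses (including plugin-defined ones) register themselves.
            Config._registry[type_name] = cls

    @classmethod
    def from_dict(cls, data: dict) -> "Config":
        config_cls = cls._registry[data.pop("type")]
        config = config_cls()
        for key, value in data.items():
            setattr(config, key, value)
        return config

class TransformerConfig(Config, type_name="transformer"):
    num_layers: int = 12

class SSMConfig(Config, type_name="ssm"):
    state_size: int = 16
```

Because registration happens in `__init_subclass__`, a plugin only needs to define a subclass; no central list has to be edited.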
# 🎯 **Goal (What & Why)**

See discussion in #211. The config validation scheme currently makes little distinction between validation, mutation, and derivation, which can make things difficult to follow....
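One way to make that distinction explicit is to split derivation (which mutates the config) from validation proper (read-only invariant checks) into separate phases; a minimal sketch with hypothetical names, not Fast-LLM's actual scheme:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AttentionConfig:
    hidden_size: int = 1024
    num_heads: int = 16
    head_size: Optional[int] = None  # derived if left unset

    def _derive(self) -> None:
        """Derivation phase: fill fields computed from other fields (mutates)."""
        if self.head_size is None:
            self.head_size = self.hidden_size // self.num_heads

    def _check(self) -> None:
        """Validation phase: read-only invariant checks, no mutation."""
        if self.hidden_size != self.num_heads * self.head_size:
            raise ValueError("hidden_size must equal num_heads * head_size")

    def validate(self) -> None:
        self._derive()
        self._check()
```

Keeping mutation confined to one phase makes it clear at which point a config's values become final.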