DreamGenX

Results: 13 issues by DreamGenX

### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.

### Expected Behavior
I have trained Yi 34B...

bug

### ⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous [Ideas in Discussions](https://github.com/OpenAccess-AI-Collective/axolotl/discussions/categories/ideas) and didn't find any similar feature requests.
- [X] I searched...

enhancement

Sample packing is a technique that can significantly speed up training by greatly reducing the time each batch wastes on padding tokens. It does so by merging several examples into...

feature request
help wanted

Sample packing with a correct attention mask (so the model can't attend to other examples in the batch), and ideally correct RoPE offsets, would be extremely beneficial. In SFT, examples tend...
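To make the request concrete, here is a minimal sketch of what is being asked for, using hypothetical helper names (`pack_examples`, `block_diagonal_mask`, `position_ids` are not from any library): examples are greedily packed into a single sequence, the attention mask is block-diagonal so tokens cannot attend across example boundaries, and position IDs restart at 0 per example so RoPE offsets are correct.

```python
import numpy as np

def pack_examples(examples, max_len):
    """Greedily pack tokenized examples into groups of at most max_len total tokens."""
    packs, current = [], []
    for ex in sorted(examples, key=len, reverse=True):
        if sum(len(e) for e in current) + len(ex) <= max_len:
            current.append(ex)
        else:
            if current:
                packs.append(current)
            current = [ex]
    if current:
        packs.append(current)
    return packs

def block_diagonal_mask(pack):
    """True where attention is allowed: only within each example's own block.
    (A causal mask would be intersected with this in an actual decoder.)"""
    total = sum(len(e) for e in pack)
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for ex in pack:
        end = start + len(ex)
        mask[start:end, start:end] = True
        start = end
    return mask

def position_ids(pack):
    """Restart positions (and hence RoPE offsets) at 0 for each packed example."""
    return np.concatenate([np.arange(len(ex)) for ex in pack])
```

For example, packing `[1, 2, 3]` and `[4, 5]` with `max_len=5` yields one pack of 5 tokens, a 5x5 mask whose two blocks do not see each other, and positions `[0, 1, 2, 0, 1]`.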

(Q)DoRA, an alternative to (Q)LoRA, is quickly proving to be a superior technique in terms of closing the gap between FFT and PEFT. Known existing implementations:
- https://github.com/huggingface/peft -- enable...

rfc
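For readers unfamiliar with the technique: per the DoRA paper, the adapted weight is decomposed into a learned per-column magnitude and a direction built from the frozen base weight plus a LoRA update. A minimal sketch (the function name and argument layout here are illustrative, not any library's API):

```python
import numpy as np

def dora_weight(w0, lora_a, lora_b, magnitude):
    """DoRA weight merge: W' = m * (W0 + B @ A) / ||W0 + B @ A||_col.

    w0:        frozen base weight, shape (out_dim, in_dim)
    lora_a:    low-rank factor, shape (r, in_dim)
    lora_b:    low-rank factor, shape (out_dim, r)
    magnitude: learned per-column scale, shape (1, in_dim),
               initialized to the column norms of w0 so training starts at W0.
    """
    v = w0 + lora_b @ lora_a                       # directional component
    col_norm = np.linalg.norm(v, axis=0, keepdims=True)
    return magnitude * (v / col_norm)              # rescale each column
```

With zero-initialized LoRA factors and `magnitude` set to the column norms of `w0`, the merged weight equals `w0` exactly, which is what makes DoRA a drop-in reparametrization of the frozen layer.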

### Your current environment
```text
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC...
```

bug

For whatever reason, the 60-second timeout does not seem to be enough in my case most of the time (4xH100 SXM, Llama 3 70B in fp8 + tp4). When...

### Model description
The request is for Mistral Tokenizer V2, similar to your repo for V1 and [V3](https://huggingface.co/Xenova/mistral-tokenizer-v3) [1], but based on the V2 tokenizer data: https://github.com/mistralai/mistral-common/blob/main/src/mistral_common/data/mistral_instruct_tokenizer_240216.model.v2

This is tokenizer...

new model

### System Info
- This was tested on a tp=4 4xH100 SXM setup
- I tested these 2 releases: https://github.com/NVIDIA/TensorRT-LLM/pull/1763 and https://github.com/NVIDIA/TensorRT-LLM/pull/1725

### Who can help?
_No response_

### Information...

bug
waiting for feedback

### System Info
Hello, in the [latest release](https://github.com/NVIDIA/TensorRT-LLM/pull/2110) you say:
> The C++ batch manager API is deprecated in favor of the C++ executor API, and it will be removed...

bug