ccdv-ai
Trying to run the [colab](https://colab.research.google.com/drive/19lwcRk_ZQ_ZtX-qzFP3qZBBHZNcMD1hh?usp=sharing#scrollTo=2eSvM9zX_2d3) using a small model: ```python from unsloth import FastLanguageModel import torch max_seq_length = 2048 # Gemma sadly only supports max 8192 for now dtype =...
Hi, I tried to generate some text using a [Mixtral Instruct GGUF model](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF), but the model only predicts nonsense. Something is wrong with either the tokenizer or the chat template....
### Your current environment ```text Collecting environment information... PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4...
### ⚠️ Please check that this feature request hasn't been suggested before. - [X] I searched previous [Ideas in Discussions](https://github.com/axolotl-ai-cloud/axolotl/discussions/categories/ideas) didn't find any similar feature requests. - [X] I searched...
### ⚠️ Please check that this feature request hasn't been suggested before. - [X] I searched previous [Ideas in Discussions](https://github.com/OpenAccess-AI-Collective/axolotl/discussions/categories/ideas) didn't find any similar feature requests. - [X] I searched...
### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) didn't find any similar reports. ### Expected Behavior Should be able to use...
Is Liger Kernel supported? [Liger Kernel](https://github.com/linkedin/Liger-Kernel) can increase training throughput (+20%) and significantly reduce memory usage (-60%).
What is the difference between the `encoder_only` script and the `decoder_only` script if we use `last_token` as the pooling method and `m3_kd_loss` as the loss? Can LoRA be used with the encoder-only script?...
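For reference, `last_token` pooling generally means taking, for each sequence, the hidden state at its final non-padding position. The following is a minimal sketch in plain Python. It assumes hidden states arrive as nested lists (one vector per token) together with a 0/1 attention mask. The function name `last_token_pool` is hypothetical and is not taken from either script, so this illustrates the technique rather than the repository's actual implementation.

```python
from typing import List, Sequence

Vector = List[float]


def last_token_pool(
    hidden_states: Sequence[Sequence[Vector]],
    attention_mask: Sequence[Sequence[int]],
) -> List[Vector]:
    """Hypothetical helper: return each sequence's last non-padding hidden state.

    Searching for the last position whose mask value is 1 makes this work for
    both left-padded and right-padded batches.
    """
    pooled: List[Vector] = []
    for states, mask in zip(hidden_states, attention_mask):
        if len(states) != len(mask):
            raise ValueError("hidden_states and attention_mask lengths differ")
        # Positions of real (non-padding) tokens in this sequence.
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence contains only padding")
        # Copy the vector at the last real token.
        pooled.append(list(states[real_positions[-1]]))
    return pooled


if __name__ == "__main__":
    # First sequence is right-padded, second is left-padded.
    batch = [
        [[1.0, 0.0], [2.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [5.0, 1.0], [6.0, 2.0]],
    ]
    mask = [[1, 1, 0], [0, 1, 1]]
    print(last_token_pool(batch, mask))  # [[2.0, 0.0], [6.0, 2.0]]
```

With a decoder-only (causal) model, the last real token is the only position that has attended to the whole input, which is why this pooling is commonly paired with such models.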