Matthias Uhlig

Results 8 comments of Matthias Uhlig

Could you please provide the code used for training, and some information about the GPUs you used?

Hm... I guess there is either a CUDA problem (if you are doing 4-bit or 8-bit training) or something wrong with your training data or script.

Maybe you could try wrapping the call: `with torch.autocast("cuda"): trainer.train()`. If this does not work, you could try the script from the Mixtral blog post on Hugging Face: https://huggingface.co/blog/mixtral#fine-tuning-with-%F0%9F%A4%97-trl
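A minimal sketch of that suggestion, assuming a `transformers`-style `trainer` object (the `trainer` name is not defined here and is only shown in the comment). To demonstrate what `torch.autocast` actually does without requiring a GPU, the runnable part below uses the `"cpu"` device instead of `"cuda"`:

```python
import torch

# Pattern suggested above (assumes `trainer` is a transformers Trainer):
#
#     with torch.autocast("cuda"):
#         trainer.train()
#
# Runnable demonstration of the same mechanism on CPU: inside the
# autocast context, eligible ops such as matmul are automatically
# executed in a lower-precision dtype (bfloat16 on CPU).
with torch.autocast("cpu", dtype=torch.bfloat16):
    a = torch.randn(8, 8)   # created as float32
    b = torch.randn(8, 8)   # created as float32
    c = a @ b               # matmul runs under autocast

print(c.dtype)  # the matmul result was autocast to bfloat16
```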

Try loading the SFT adapter first. Then merge the adapter into the base model and then load the DPO adapter. You can use the following code: model_name = "alignment-handbook/zephyr-7b-sft-lora" tokenizer...
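The loading order described above can be sketched with `peft` as follows. This is an untested outline, not the exact code from the comment (which is truncated): the base-model and DPO-adapter repo ids passed in by the caller are assumptions, and only `alignment-handbook/zephyr-7b-sft-lora` is named in the comment itself.

```python
def load_sft_then_dpo(base_id: str, sft_id: str, dpo_id: str):
    """Sketch of the suggested order: SFT adapter -> merge -> DPO adapter.

    Assumes `transformers` and `peft` are installed; the repo ids are
    supplied by the caller (only the SFT adapter id appears in the
    comment above). Imports are deferred so defining this function is
    cheap.
    """
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Base model (e.g. the model the SFT adapter was trained on).
    model = AutoModelForCausalLM.from_pretrained(base_id)
    # 1) Load the SFT adapter on top of the base model.
    model = PeftModel.from_pretrained(model, sft_id)
    # 2) Merge the SFT adapter weights into the base weights.
    model = model.merge_and_unload()
    # 3) Load the DPO adapter on top of the merged model.
    return PeftModel.from_pretrained(model, dpo_id)
```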

Please check your installed versions. fastt5 requires specific packages, e.g. onnxruntime==1.10.0. If you have installed a newer version, the code won't work.

> Hello! On 2 GPUs it would take approximately 6-10 hours; it depends on the hyperparameters.

Thanks for the fast response. I use the exact same hyperparameters as...

If you need more details or a test after code updates, I will be happy to help.

Can't you just load the quantized model from Hugging Face?