Matthias Uhlig

Results 8 comments of Matthias Uhlig

Could you please provide the code used for training, and some information about the GPUs you used?

Hm... I guess there is either a CUDA problem (if you are doing 4-bit or 8-bit training) or something wrong with your training data or script.

Maybe you could try wrapping the call: `with torch.autocast("cuda"): trainer.train()`. If this does not work, you could try the script from the Mixtral blog post on Hugging Face: https://huggingface.co/blog/mixtral#fine-tuning-with-%F0%9F%A4%97-trl
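A minimal sketch of that suggestion, assuming a `transformers`-style `trainer` object (the `trainer` name is not defined here and is only shown in the comment). To demonstrate what `torch.autocast` actually does without requiring a GPU, the runnable part below uses the `"cpu"` device instead of `"cuda"`:

```python
import torch

# Pattern suggested above (assumes `trainer` is a transformers Trainer):
#
#     with torch.autocast("cuda"):
#         trainer.train()
#
# Runnable demonstration of the same mechanism on CPU: inside the
# autocast context, eligible ops such as matmul are automatically
# executed in a lower-precision dtype (bfloat16 on CPU).
with torch.autocast("cpu", dtype=torch.bfloat16):
    a = torch.randn(8, 8)   # created as float32
    b = torch.randn(8, 8)   # created as float32
    c = a @ b               # matmul runs under autocast

print(c.dtype)  # the matmul result was autocast to bfloat16
```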

Try loading the SFT adapter first. Then merge the adapter into the base model and then load the DPO adapter. You can use the following code: model_name = "alignment-handbook/zephyr-7b-sft-lora" tokenizer...
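The loading order described above can be sketched with `peft` as follows. This is an untested outline, not the exact code from the comment (which is truncated): the base-model and DPO-adapter repo ids passed in by the caller are assumptions, and only `alignment-handbook/zephyr-7b-sft-lora` is named in the comment itself.

```python
def load_sft_then_dpo(base_id: str, sft_id: str, dpo_id: str):
    """Sketch of the suggested order: SFT adapter -> merge -> DPO adapter.

    Assumes `transformers` and `peft` are installed; the repo ids are
    supplied by the caller (only the SFT adapter id appears in the
    comment above). Imports are deferred so defining this function is
    cheap.
    """
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Base model (e.g. the model the SFT adapter was trained on).
    model = AutoModelForCausalLM.from_pretrained(base_id)
    # 1) Load the SFT adapter on top of the base model.
    model = PeftModel.from_pretrained(model, sft_id)
    # 2) Merge the SFT adapter weights into the base weights.
    model = model.merge_and_unload()
    # 3) Load the DPO adapter on top of the merged model.
    return PeftModel.from_pretrained(model, dpo_id)
```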

Please check your installed versions. fastt5 requires specific packages, e.g. onnxruntime==1.10.0. If you have installed a newer version, the code won't work.

> Hello! On 2 GPUs it would take approximately 6-10 hours; it depends on the hyperparameters.

Thanks for the fast response. I use the exact same hyperparameters as...

If you need more details or a test after code updates, I will be happy to help.

Can't you just load the quantized model from Hugging Face?