Maxim Bobrin
Same error. Did you manage to fix this?
Same problem here: on PEFT == 0.9 the loss is NaN, while on peft
> @skylooop Without the full code and data, we cannot start debugging this issue. If you cannot share, can you identify the exact version of PEFT at which training starts...
I observed the same problem with DPOTrainer. Setting `generate_during_eval=True` in DPOConfig produces reference outputs from the current model being trained.
Hi! Have you managed to solve this problem? With `generate_during_eval=True`, the `ref_model` outputs are almost identical to those of the policy model being trained, so it seems the `ref_model` is being trained along with it.
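A quick sanity check for this symptom is to verify that the reference model's parameters actually stay frozen while the policy is optimized. Below is a minimal PyTorch sketch (not TRL's actual implementation; the tiny `nn.Linear` models stand in for the policy and `ref_model`): snapshot a reference parameter, take one optimizer step on the policy, and confirm only the policy moved.

```python
import copy
import torch
import torch.nn as nn

# Stand-ins for the policy and reference model (hypothetical toy models).
policy = nn.Linear(4, 2)
ref_model = copy.deepcopy(policy)
ref_model.requires_grad_(False)  # the reference model must be frozen

# Snapshot a reference parameter before training.
before = ref_model.weight.detach().clone()

# One dummy optimization step on the policy only.
opt = torch.optim.SGD(policy.parameters(), lr=0.1)
loss = policy(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

# The policy should have moved; the reference must not have.
policy_changed = not torch.equal(policy.weight.detach(), before)
ref_unchanged = torch.equal(ref_model.weight.detach(), before)
print(policy_changed, ref_unchanged)
```

If the same comparison on the real trainer shows the reference weights drifting (or its outputs tracking the policy's, as reported above), the reference model is not actually detached from the optimization.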