Maxim Bobrin

Results: 5 comments by Maxim Bobrin

Same error. Did you manage to fix this?

Same problem here: on PEFT == 0.9 the loss is NaN, while on peft

> @skylooop Without the full code and data, we cannot start debugging this issue. If you cannot share, can you identify the exact version of PEFT at which training starts...

I observed the same problem with DPOTrainer. `generate_during_eval=True` in DPOConfig produces reference outputs from the current model being trained.

Hi! Have you managed to solve this problem? With `generate_during_eval=True`, the ref_model outputs are almost identical to those of the policy model currently being trained, so it seems the ref_model is being updated along with it.
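A minimal, framework-free sketch of the suspected failure mode (the class below is a hypothetical stand-in, not TRL or PEFT internals): if the reference model is merely an alias of the policy model rather than a frozen snapshot, every training update to the policy also changes the "reference" outputs, which would produce exactly the symptom described above.

```python
import copy

class TinyModel:
    """Hypothetical stand-in for a policy model: a single scalar weight."""
    def __init__(self, w=1.0):
        self.w = w

    def generate(self, x):
        return self.w * x

policy = TinyModel()

# Buggy setup: ref_model is the same object as the policy.
ref_aliased = policy
# Correct setup: ref_model is a frozen snapshot taken before training.
ref_frozen = copy.deepcopy(policy)

policy.w = 5.0  # a training step updates the policy weight

print(ref_aliased.generate(2.0))  # 10.0 -- "reference" drifted with the policy
print(ref_frozen.generate(2.0))   # 2.0  -- reference stayed fixed
```

If the aliased and frozen outputs diverge like this in a real run, it points to the reference model sharing (trainable) parameters with the policy rather than being a detached copy.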