Maxim Bobrin
Same error. Did you manage to fix this?
Same problem here: on PEFT == 0.9 the loss is NaN, while on peft
> @skylooop Without the full code and data, we cannot start debugging this issue. If you cannot share, can you identify the exact version of PEFT at which training starts...
I observed the same problem with DPOTrainer. Setting `generate_during_eval=True` in DPOConfig produces reference outputs from the current model being trained.
Hi! Have you managed to solve this problem? With `generate_during_eval=True`, the `ref_model` outputs are almost identical to those of the policy model being trained, so it seems the `ref_model` is being trained along with it.
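A quick sanity check for this symptom is to verify that the reference model's parameters actually stay frozen while the policy is optimized. Below is a minimal PyTorch sketch (not TRL's actual implementation; the tiny `nn.Linear` models stand in for the policy and `ref_model`): snapshot a reference parameter, take one optimizer step on the policy, and confirm only the policy moved.

```python
import copy
import torch
import torch.nn as nn

# Stand-ins for the policy and reference model (hypothetical toy models).
policy = nn.Linear(4, 2)
ref_model = copy.deepcopy(policy)
ref_model.requires_grad_(False)  # the reference model must be frozen

# Snapshot a reference parameter before training.
before = ref_model.weight.detach().clone()

# One dummy optimization step on the policy only.
opt = torch.optim.SGD(policy.parameters(), lr=0.1)
loss = policy(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

# The policy should have moved; the reference must not have.
policy_changed = not torch.equal(policy.weight.detach(), before)
ref_unchanged = torch.equal(ref_model.weight.detach(), before)
print(policy_changed, ref_unchanged)
```

If the same comparison on the real trainer shows the reference weights drifting (or its outputs tracking the policy's, as reported above), the reference model is not actually detached from the optimization.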