Qi Penghui
It's about what the PPO loss is optimizing, not about any feature or reward shaping. So it's actually an implementation bug relative to the mathematical derivation.
Is this config correct? I don't think it can fit on 32 GPUs.
I'm not sure whether recent changes broke the behavior you described. I observed this error: `world_size ({world_size}) is not divisible by expert_tensor_model_pipeline_parallel size (128)`. Adding `actor_rollout_ref.actor.megatron.expert_tensor_parallel_size=1` fixes it.
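For reference, a minimal sketch of the divisibility check behind that error. This is a hypothetical reconstruction, assuming the "expert_tensor_model_pipeline_parallel size" is the product of expert tensor parallel, expert model parallel, and pipeline parallel sizes (as in Megatron-style group initialization); the helper name and the example parallel sizes are illustrative, not taken from the actual code.

```python
# Hypothetical sketch of a Megatron-style parallel-group validation.
# Assumption: the reported size is etp * ep * pp.
def check_expert_parallel(world_size: int, etp: int, ep: int, pp: int) -> int:
    """Raise if world_size is not divisible by the expert parallel group size."""
    group = etp * ep * pp
    if world_size % group != 0:
        raise ValueError(
            f"world_size ({world_size}) is not divisible by "
            f"expert_tensor_model_pipeline_parallel size ({group})"
        )
    return group

# With 32 GPUs: etp=8, ep=8, pp=2 gives a group size of 128, which
# triggers the error; overriding etp=1 gives 16, which divides 32.
check_expert_parallel(32, 1, 8, 2)  # passes (group size 16)
```

This would explain why forcing `expert_tensor_parallel_size=1` resolves the error: it shrinks the group-size product below the world size.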