verl
GRPO Qwen3 Megatron training script
Could you provide a script for training a Qwen3 model with the GRPO algorithm, using Megatron as the training backend?
I don't know if you've already seen them, but these examples might be helpful: https://github.com/volcengine/verl/tree/main/examples/grpo_trainer
You'll find some Qwen3 examples at the end.
Thank you, I'll study them.