
Qwen3 MOE model GRPO configs inconsistencies

Open RitvikKapila opened this issue 3 months ago • 0 comments

System Info

Hi, I was looking at the GRPO scripts for the Qwen3 MOE models, in particular examples/grpo_trainer/run_qwen3moe-30b_megatron_96gb.sh and examples/grpo_trainer/run_qwen3-235b_megatron_96gb.sh. There seem to be some inconsistencies that I wanted to flag.

  1. The use_kl_loss flag is set to False in the 30B-A3B script, even though the README explicitly states that it should be set to True for GRPO, which is also the case for the 235B-A22B model. Similarly, the kl_loss_coef flag should be 0.001.
  2. The max_response_length in the 235B-A22B config is set to 1204 * 8 (9632), which is almost certainly wrong; it looks like a typo for 1024 * 8 (8192).
  3. In the bash script for the 30B-A3B model, line 38 should be: TEST_FILE="['$aime24_test_path']"
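
For reference, here is a rough sketch of what the three points above amount to. It is not a patch against the actual scripts. The parquet path is a made-up placeholder, reading 1204 as a typo for 1024 is my assumption, and the override keys (actor_rollout_ref.actor.use_kl_loss, actor_rollout_ref.actor.kl_loss_coef, data.val_files) are the names I believe the verl configs use, so please check them against the scripts before relying on this.

```shell
#!/bin/sh
# Point 2: the value as written vs. the value that was presumably intended.
current=$((1204 * 8))     # as written in the 235B-A22B script
presumed=$((1024 * 8))    # assumption: 1204 is a transposition of 1024
echo "max_response_length: current=$current presumed=$presumed"

# Point 3: the TEST_FILE line from the issue, with a hypothetical path.
aime24_test_path=/path/to/aime-2024.parquet   # placeholder, not the real path
TEST_FILE="['$aime24_test_path']"

# Point 1: the KL settings the README asks for with GRPO, written as
# Hydra-style overrides (key names assumed, see the note above).
echo "data.val_files=$TEST_FILE"
echo "actor_rollout_ref.actor.use_kl_loss=True"
echo "actor_rollout_ref.actor.kl_loss_coef=0.001"
```

Running it only prints the values; nothing is launched.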

I can open a PR to fix this if needed. Please let me know.

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Bug in config

Expected behavior

Bug in config

RitvikKapila, Nov 06 '25 02:11