feat: Support qwen3-next, mcore path
## What does this PR do?

Add qwen3-next support, recipes, and tests to the mcore path.
## Issues

https://github.com/NVIDIA-NeMo/RL/issues/1431
## Usage

SFT:

```sh
uv run --extra mcore --extra vllm --extra automodel python examples/run_sft.py \
  --config examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml
```

GRPO:

```sh
uv run --extra mcore --extra vllm python examples/run_grpo_math.py \
  --config examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml
```
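The recipe names encode the cluster topology (8n8g = 8 nodes × 8 GPUs). As a rough orientation, a fragment of such a recipe might look like the sketch below; the key names follow NeMo-RL's Megatron config style, and the parallelism values are illustrative assumptions, not the shipped settings:

```yaml
# Illustrative sketch only: the parallelism values are assumptions.
# See the actual recipe YAMLs under examples/configs/recipes/llm/.
policy:
  megatron_cfg:
    tensor_model_parallel_size: 4    # assumed split for the 80B MoE
    pipeline_model_parallel_size: 2  # assumed
    expert_model_parallel_size: 8    # assumed expert layout
cluster:
  num_nodes: 8      # matches the "8n" in the recipe name
  gpus_per_node: 8  # matches the "8g" in the recipe name
```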
## Before your PR is "Ready for review"

Pre-checks:

- [x] Make sure you read and followed the Contributor guidelines
- [x] Did you write any new necessary tests?
- [x] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build, and test the docs.
## Additional Information

- ...
## Summary by CodeRabbit

**New Features**

- Added training recipes for the Qwen3-Next-80B model with GRPO and SFT configurations.

**Bug Fixes**

- Fixed YAML syntax errors in GRPO configuration files.

**Configuration Enhancements**

- Updated distributed data parallel settings across training recipes.
- Updated dependency versions, including the transformers library, and added new runtime dependencies.

**Tests**

- Added automated test suites for Qwen3-Next-80B model training pipelines.
### ❌ Submodule Fast-Forward Check Failed

Check based on commit: ae5d7ac6bc84f0c522f94fbcbcb9eb09c7e0a3d7 (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

- Megatron-Bridge: ✅ PR branch is ahead of the main branch (fast-forward)
- Megatron-LM: ✅ PR branch is ahead of the main branch (fast-forward)

❌ Submodules that need attention:

- Automodel: ❌ Commits have DIVERGED from a common ancestor
  - TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
  - CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.
## 📝 Walkthrough

This PR updates NeMo-RL to support the Qwen3-Next-80B model with Megatron-LM: it adds new GRPO and SFT test recipes and configurations, refactors policy configuration type signatures, updates submodule pointers and dependencies, and removes some validation logic from MegatronPolicyWorker.
### Changes

| Cohort / File(s) | Summary |
|---|---|
| **Version control configuration**<br>`.gitignore`, `.gitmodules` | Added `.env` and `.envrc` to gitignore; updated the Megatron-LM submodule URL and branch from `terrykong/Megatron-LM` (`yuya/nemo-rl-use-dev`) to `ahmadki/Megatron-LM` (`ahmadki/rl-qwen3-from-nemo-rl-submodule`) |
| **Submodule pointers**<br>`3rdparty/Automodel-workspace/Automodel`, `3rdparty/Megatron-Bridge-workspace/Megatron-Bridge`, `3rdparty/Megatron-LM-workspace/Megatron-LM` | Updated submodule commit references to latest versions |
| **Dependency management**<br>`pyproject.toml`, `3rdparty/Megatron-Bridge-workspace/setup.py` | Pinned transformers to >=4.57.1; relaxed the mlflow constraint; updated megatron-core to <0.17.0; added rwkv-fla>=0.7.202508221413; adjusted ruff linting configuration |
| **GRPO and general policy configs**<br>`examples/configs/grpo_math_1B.yaml`, `examples/configs/sft.yaml`, `examples/configs/vlm_grpo_3B.yaml`, `examples/configs/vlm_grpo_3B_megatron.yaml` | Added `average_in_collective` flag; removed KL-divergence-related loss configuration; changed `defer_fp32_logits` to null; updated metric names from accuracy to reward |
| **GRPO recipe configs**<br>`examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml`, `examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml`, `examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml` | Fixed YAML list syntax; added a comprehensive Qwen3-Next-80B configuration with tensor/pipeline/expert parallelism settings, generation, logging, and cluster topology |
| **SFT recipe config**<br>`examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml` | Added a new supervised fine-tuning configuration for Qwen3-Next-80B with Megatron parallelism, batch sizing, and multi-node deployment settings |
| **Policy type definitions**<br>`nemo_rl/models/policy/__init__.py` | Refactored TypedDict signatures: changed `enabled` fields from `Literal[True]` to `bool`; added `average_in_collective` to `MegatronDDPConfig`; converted many fields to `NotRequired`; removed `Disabled` config variants; added `converter_type` to `MegatronConfig`; adjusted type constraints on optimizer/scheduler fields |
| **Megatron policy worker**<br>`nemo_rl/models/policy/megatron_policy_worker.py` | Removed the assertion coupling `logprob_chunk_size` to `defer_fp32_logits`; removed per-token loss forcing and the MoE sanity check; sourced `average_in_collective` from config; added `grad_norm` to training metrics; removed the public method `check_tensor_parallel_attributes` |
| **Policy utilities**<br>`nemo_rl/models/policy/utils.py` | Relaxed a runtime assertion in `sliding_window_overwrite`; commented out the precondition check for `hf_config.sliding_window` |
| **Test scripts and configuration**<br>`tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh`, `tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh`, `tests/test_suites/nightly.txt` | Added two new test shell scripts for GRPO and SFT Qwen3-Next experiments with logging, metrics validation, and TensorBoard conversion; registered the tests in the nightly suite |
### Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Runner
    participant Experiment as Training Experiment
    participant Logging as Log Handler
    participant Metrics as Metrics Validator
    Test->>Test: Initialize config (NUM_NODES, MAX_STEPS)
    Test->>Test: Check if max steps reached
    alt Max steps not reached
        Test->>Experiment: Run training (uv run)
        Experiment-->>Experiment: Training loop
        Experiment-->>Logging: Write tensorboard logs
        Test->>Logging: Convert TB logs to JSON
        Test->>Metrics: Parse JSON metrics
        Metrics->>Metrics: Validate conditions<br/>(loss, reward/val_loss, timing)
        alt All metrics pass
            Metrics-->>Test: Success
        else Metrics fail
            Metrics-->>Test: Failure
        end
    else Max steps reached
        Test->>Test: Exit early
    end
```
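The validation step in the flow above (parse converted metrics, check thresholds) can be sketched as below. The JSON layout (metric name → step → value) and the key names are assumptions about what a TB-to-JSON conversion might emit, not the actual test-script schema:

```python
import json


def validate_metrics(path: str, max_final_loss: float) -> bool:
    """Check converted TensorBoard metrics against a simple threshold.

    Assumes a layout like {"loss": {"0": 2.1, "10": 1.4}, ...}; the
    real test scripts define their own schema and conditions.
    """
    with open(path) as f:
        metrics = json.load(f)
    # Order steps numerically and take the final loss value.
    steps = sorted(metrics["loss"], key=int)
    final_loss = metrics["loss"][steps[-1]]
    return final_loss <= max_final_loss
```

A real harness would check several metrics (loss, reward or val_loss, step timing) against baselines recorded for the recipe, and fail the nightly job on any violation.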
### Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

- **Type signature refactoring** (`nemo_rl/models/policy/__init__.py`): Extensive changes to TypedDict definitions (removal of `Disabled` variants, `NotRequired` adjustments, type narrowing/widening) require careful validation of backward compatibility and config parsing logic.
- **MegatronPolicyWorker logic changes** (`nemo_rl/models/policy/megatron_policy_worker.py`): Removal of assertions (`defer_fp32_logits` coupling, per-token loss forcing) and the public method deletion need verification that downstream code handles these changes; the `grad_norm` metric collection logic should be reviewed for correctness.
- **Submodule pointer updates**: Verify that the new Megatron-LM and Megatron-Bridge commits are stable and compatible with the updated configurations.
- **Dependency constraint changes** (`pyproject.toml`): The transformers pin to >=4.57.1 and the mlflow relaxation should be tested against existing recipes to ensure no regressions.
### Possibly related PRs

- PR #1389: Modifies the `average_in_collective` field in `MegatronDDPConfig` and changes `MegatronPolicyWorker` method signatures, directly overlapping with the type and implementation changes in this PR.
- PR #1398: Updates Megatron integration code and config typing in the same policy modules, affecting how Megatron configurations are parsed and applied.
- PR #1334: Modifies `nemo_rl/models/policy/__init__.py` TypedDict definitions and `examples/configs/sft.yaml` recipes with overlapping configuration field changes.
### Suggested labels

`CI:L1`, `Run CICD`

### Suggested reviewers

- terrykong
- guyueh1
- parthchadha
### Pre-merge checks and finishing touches

#### ❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Test Results For Major Changes | ⚠️ Warning | PR claims tests were run locally but provides no test results, metrics, or evidence. Review comments document unresolved blocker issues including invalid submodule commit hashes, runtime errors, and syntax errors, indicating code was not actually tested. | Provide actual test execution logs with final metrics for GRPO and SFT recipes. Resolve all review comment blockers: fix invalid Megatron-Bridge commit, fix KeyError/IndexError in megatron_policy_worker.py, fix test script syntax error, and document baseline metrics for new test thresholds. |
#### ✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the main feature being added: support for qwen3-next model with mcore (Megatron-core) path integration, which aligns with the PR's primary objective of adding qwen3-next support and recipes. |
#### ✨ Finishing touches

- [ ] 📝 Generate docstrings

#### 🧪 Generate unit tests (beta)

- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch `ahmadki/qwen3-next`
> [!TIP]
> 📝 Customizable high-level summaries are now available in beta!
>
> You can now customize how CodeRabbit generates the high-level summary in your pull requests, including its content, structure, tone, and formatting.
>
> - Provide your own instructions using the `high_level_summary_instructions` setting.
> - Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
> - Use `high_level_summary_in_walkthrough` to move the summary from the description to the walkthrough section.
>
> Example instruction:
>
> "Divide the high-level summary into five sections:
> - 📝 Description — Summarize the main change in 50–60 words, explaining why this PR is needed, why this solution was chosen, and what was done.
> - 📓 References — List relevant issues, discussions, documentation, or related PRs.
> - 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
> - 📊 Contributor Summary — Include a Markdown table showing contributions: | Contributor | Lines Added | Lines Removed | Files Changed |
> - ✔️ Additional Notes — Add any extra reviewer context. Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."
>
> Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.
### ❌ Submodule Fast-Forward Check Failed

Check based on commit: 8a16bb52d62c8e5556edb15359d609f48d46673c (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

- Megatron-Bridge: ✅ PR branch is ahead of the main branch (fast-forward)
- Megatron-LM: ✅ PR branch is ahead of the main branch (fast-forward)

❌ Submodules that need attention:

- Automodel: ❌ Commits have DIVERGED from a common ancestor
  - TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
  - CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.
With https://github.com/terrykong/Megatron-LM/commit/0d401602bf48046683adfc2542a70613f6e772e6, and after https://github.com/NVIDIA-NeMo/RL/pull/1541 is merged, I'll rebase this PR, which should reduce it to just configs and tests.
### ❌ Submodule Fast-Forward Check Failed

Check based on commit: 76393cdbe1e49e59f0426ff2a97006b40b04bbcb (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

- Megatron-Bridge: ✅ PR branch is ahead of the main branch (fast-forward)

❌ Submodules that need attention:

- Automodel: ❌ Commits have DIVERGED from a common ancestor
  - TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
  - CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.
@terrykong, ready for review.

The "Submodule Fast-Forward" failure is probably because the currently used Automodel commit a2db048383cd54b3fafc928df4c30bf7bbf7c430 is not part of the nemo-rl-submodule branch specified in `.gitmodules`. We need the newer Automodel version for dependency compatibility.
Waiting for https://github.com/NVIDIA-NeMo/RL/pull/1568 before rebasing; this should truly reduce the PR to just configs and test scripts.