
feat: Support qwen3-next, mcore path

Open ahmadki opened this issue 5 months ago • 7 comments

What does this PR do ?

Add qwen3-next support, recipes, and tests to the mcore path.

Issues

https://github.com/NVIDIA-NeMo/RL/issues/1431

Usage

SFT:

uv run --extra mcore --extra vllm --extra automodel python examples/run_sft.py \
    --config examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml

GRPO:

uv run --extra mcore --extra vllm python examples/run_grpo_math.py \
    --config examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you read and followed the Contributor guidelines
  • [x] Did you write any new necessary tests?
  • [x] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build, and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added training recipes for Qwen3-Next-80B model with GRPO and SFT configurations.
  • Bug Fixes

    • Fixed YAML syntax errors in GRPO configuration files.
  • Configuration Enhancements

    • Updated distributed data parallel settings across training recipes.
    • Updated dependency versions including Transformer library and new runtime dependencies.
  • Tests

    • Added automated test suites for Qwen3-Next-80B model training pipelines.

ahmadki avatar Nov 17 '25 18:11 ahmadki

❌ Submodule Fast-Forward Check Failed

Check based on commit: ae5d7ac6bc84f0c522f94fbcbcb9eb09c7e0a3d7 (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
  • TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
  • CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.
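
The fast-forward check above boils down to a git ancestry test: the PR's submodule commit must have the target branch's commit as an ancestor. A minimal sketch of that test, using a throwaway demo repository rather than the actual submodules:

```shell
# Demonstrate the fast-forward test with a disposable repo;
# repo contents and commit messages are illustrative only.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo
echo 1 > f && git add f && git commit -qm base
base=$(git rev-parse HEAD)          # stands in for the main-branch commit
echo 2 >> f && git commit -qam ahead
head=$(git rev-parse HEAD)          # stands in for the PR-branch commit
# merge-base --is-ancestor exits 0 iff $base is an ancestor of $head,
# i.e. $head is a fast-forward of $base.
if git merge-base --is-ancestor "$base" "$head"; then
  result="fast-forward"
else
  result="diverged"
fi
echo "$result"
```

Running the same check with two commits that branched from a common ancestor (as with the Automodel pointer here) takes the `diverged` path instead.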

github-actions[bot] avatar Nov 17 '25 18:11 github-actions[bot]

📝 Walkthrough

Walkthrough

This PR adds Qwen3-Next-80B support to NeMo-RL on the Megatron-LM path: it introduces new GRPO and SFT test recipes and configurations, refactors policy configuration type signatures, updates submodule pointers and dependencies, and removes some validation logic from MegatronPolicyWorker.

Changes

Cohort / File(s) Summary
Version control configuration
.gitignore, .gitmodules
Added .env and .envrc to gitignore; updated Megatron-LM submodule URL and branch from terrykong/Megatron-LM (yuya/nemo-rl-use-dev) to ahmadki/Megatron-LM (ahmadki/rl-qwen3-from-nemo-rl-submodule)
Submodule pointers
3rdparty/Automodel-workspace/Automodel, 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge, 3rdparty/Megatron-LM-workspace/Megatron-LM
Updated submodule commit references to latest versions
Dependency management
pyproject.toml, 3rdparty/Megatron-Bridge-workspace/setup.py
Constrained transformers to >=4.57.1; relaxed the mlflow constraint; updated megatron-core to <0.17.0; added rwkv-fla>=0.7.202508221413; adjusted the ruff linting configuration
GRPO and general policy configs
examples/configs/grpo_math_1B.yaml, examples/configs/sft.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml
Added average_in_collective flag; removed KL divergence-related loss configuration; changed defer_fp32_logits to null; updated metric names from accuracy to reward
GRPO recipe configs
examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml, examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml, examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml
Fixed YAML list syntax; added comprehensive Qwen3-Next-80B configuration with tensor/pipeline/expert parallelism settings, generation, logging, and cluster topology
SFT recipe config
examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml
Added new supervised fine-tuning configuration for Qwen3-Next-80B with Megatron parallelism, batch sizing, and multi-node deployment settings
Policy type definitions
nemo_rl/models/policy/__init__.py
Refactored TypedDict signatures: changed enabled fields from Literal[True] to bool; added average_in_collective to MegatronDDPConfig; converted many fields to NotRequired; removed Disabled config variants; added converter_type to MegatronConfig; adjusted type constraints on optimizer/scheduler fields
Megatron policy worker
nemo_rl/models/policy/megatron_policy_worker.py
Removed assertion coupling logprob_chunk_size to defer_fp32_logits; removed per-token loss forcing and MoE sanity check; sourced average_in_collective from config; added grad_norm to training metrics; removed public method check_tensor_parallel_attributes
Policy utilities
nemo_rl/models/policy/utils.py
Relaxed runtime assertion in sliding_window_overwrite; commented out precondition check for hf_config.sliding_window
Test scripts and configuration
tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh, tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh, tests/test_suites/nightly.txt
Added two new test shell scripts for GRPO and SFT Qwen3-Next experiments with logging, metrics validation, and TensorBoard conversion; registered tests in nightly suite

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Runner
    participant Experiment as Training Experiment
    participant Logging as Log Handler
    participant Metrics as Metrics Validator
    
    Test->>Test: Initialize config (NUM_NODES, MAX_STEPS)
    Test->>Test: Check if max steps reached
    alt Max steps not reached
        Test->>Experiment: Run training (uv run)
        Experiment-->>Experiment: Training loop
        Experiment-->>Logging: Write tensorboard logs
        Test->>Logging: Convert TB logs to JSON
        Test->>Metrics: Parse JSON metrics
        Metrics->>Metrics: Validate conditions<br/>(loss, reward/val_loss, timing)
        alt All metrics pass
            Metrics-->>Test: Success
        else Metrics fail
            Metrics-->>Test: Failure
        end
    else Max steps reached
        Test->>Test: Exit early
    end
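
The "Parse JSON metrics / Validate conditions" step in the flow above can be sketched as a small check. The JSON layout, metric names, and threshold below are illustrative assumptions, not taken from the actual test scripts:

```python
# Hypothetical metrics-validation step from the sequence diagram above.
# The real test scripts convert TensorBoard logs to JSON first; here we
# substitute an inline fixture so the sketch is self-contained.
import json
import os
import tempfile

def validate_metrics(path: str, max_loss: float = 5.0) -> bool:
    """Return True when every logged step stays under the loss threshold."""
    with open(path) as f:
        metrics = json.load(f)  # assumed shape: {"loss": [...], "reward": [...]}
    return all(step_loss < max_loss for step_loss in metrics["loss"])

# Write a fake metrics file standing in for converted TensorBoard output.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump({"loss": [2.1, 1.7, 1.3], "reward": [0.2, 0.4, 0.5]}, f)

print(validate_metrics(path))  # True
```

A real harness would extend this with the reward/val_loss and step-timing conditions the walkthrough mentions, failing the test run when any condition is violated.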

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Type signature refactoring (nemo_rl/models/policy/__init__.py): Extensive changes to TypedDict definitions (removal of Disabled variants, NotRequired adjustments, type narrowing/widening) require careful validation of backward compatibility and config parsing logic
  • MegatronPolicyWorker logic changes (nemo_rl/models/policy/megatron_policy_worker.py): Removal of assertions (defer_fp32_logits coupling, per-token loss forcing) and public method deletion need verification that downstream code handles these changes; grad_norm metric collection logic should be reviewed for correctness
  • Submodule pointer updates: Verify that the new Megatron-LM and Megatron-Bridge commits are stable and compatible with the updated configurations
  • Dependency constraint changes (pyproject.toml): The transformers constraint (>=4.57.1) and the relaxed mlflow constraint should be tested against existing recipes to ensure no regressions
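
The TypedDict refactoring flagged in the first bullet can be illustrated with a minimal sketch. `MegatronDDPConfig` and `average_in_collective` come from the PR summary; the other field name is hypothetical:

```python
# Sketch of the described TypedDict pattern change: widening an
# `enabled: Literal[True]` field to `bool` removes the need for a
# separate "Disabled" config variant, and NotRequired marks fields
# that config dicts may omit entirely.
from typing import TypedDict

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:  # pragma: no cover
    from typing_extensions import NotRequired

class MegatronDDPConfig(TypedDict):
    # Before: Literal[True]; now a plain bool covers both states.
    enabled: bool
    average_in_collective: bool
    # Hypothetical optional field; may be absent from parsed configs.
    grad_reduce_in_fp32: NotRequired[bool]

# One type now accepts both the enabled and disabled cases:
cfg: MegatronDDPConfig = {"enabled": False, "average_in_collective": True}
print(cfg["enabled"])  # False
```

Because TypedDicts are not enforced at runtime, widening types like this mainly shifts the burden to config-parsing code, which is why the review flags backward compatibility there.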

Possibly related PRs

  • PR #1389: Modifies average_in_collective field in MegatronDDPConfig and changes MegatronPolicyWorker method signatures, directly overlapping with type and implementation changes in this PR
  • PR #1398: Updates Megatron integration code and config typing in the same policy modules, affecting how Megatron configurations are parsed and applied
  • PR #1334: Modifies nemo_rl/models/policy/__init__.py TypedDict definitions and examples/configs/sft.yaml recipes with overlapping configuration field changes

Suggested labels

CI:L1, Run CICD

Suggested reviewers

  • terrykong
  • guyueh1
  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 75.00%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.
  • Test Results For Major Changes (⚠️ Warning): The PR claims tests were run locally but provides no test results, metrics, or evidence. Review comments document unresolved blockers, including invalid submodule commit hashes, runtime errors, and syntax errors, indicating the code was not actually tested. Resolution: provide actual test execution logs with final metrics for the GRPO and SFT recipes; resolve all review-comment blockers (fix the invalid Megatron-Bridge commit, fix the KeyError/IndexError in megatron_policy_worker.py, fix the test script syntax error); and document baseline metrics for the new test thresholds.
✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly identifies the main feature being added: support for qwen3-next with the mcore (Megatron-core) path, which aligns with the PR's primary objective of adding qwen3-next support, recipes, and tests.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch ahmadki/qwen3-next


coderabbitai[bot] avatar Nov 17 '25 19:11 coderabbitai[bot]

❌ Submodule Fast-Forward Check Failed

Check based on commit: 8a16bb52d62c8e5556edb15359d609f48d46673c (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
  • TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
  • CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

github-actions[bot] avatar Nov 17 '25 19:11 github-actions[bot]

With https://github.com/terrykong/Megatron-LM/commit/0d401602bf48046683adfc2542a70613f6e772e6 and after https://github.com/NVIDIA-NeMo/RL/pull/1541 is merged, I'll rebase this PR which should reduce it to just configs and tests.

ahmadki avatar Nov 19 '25 16:11 ahmadki

❌ Submodule Fast-Forward Check Failed

Check based on commit: 76393cdbe1e49e59f0426ff2a97006b40b04bbcb (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
  • TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
  • CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

github-actions[bot] avatar Nov 30 '25 23:11 github-actions[bot]

@terrykong , ready for review. The "Submodule Fast-Forward" failure is probably because the currently used automodel commit a2db048383cd54b3fafc928df4c30bf7bbf7c430 is not part of the nemo-rl-submodule branch as specified in .gitmodules.

We need the newer automodel version for dependency compatibility.

ahmadki avatar Dec 01 '25 16:12 ahmadki

waiting for https://github.com/NVIDIA-NeMo/RL/pull/1568 before rebasing, this should truly reduce the PR to just configs and test scripts.

ahmadki avatar Dec 01 '25 17:12 ahmadki