fix: remove sft-qwen2.5-fsdp2tp8sp from nighlies
What does this PR do ?
sequence parallel + tp_size >1 is currently broken in torch==2.8.0, no point in running the test as it will raise an exception
Summary by CodeRabbit
-
Tests
- Disabled a test pending resolution of a known compatibility issue.
βοΈ Tip: You can customize this high-level summary in your review settings.
π Walkthrough
Walkthrough
A single test case for sft-qwen2.5-32b in the nightly test suite is disabled with a TODO note referencing a torch==2.8.0 compatibility issue. The actual test invocation line is removed and replaced with a commented explanation.
Changes
| Cohort / File(s) | Summary |
|---|---|
Test suite maintenance tests/test_suites/nightly.txt |
Disabled functional 32-bit test for sft-qwen2.5-32b; removed test invocation and added TODO comment noting torch==2.8.0 blocker |
Estimated code review effort
π― 1 (Trivial) | β±οΈ ~2 minutes
- Single-file change with minimal scope
- Simple documentation/disabling of a test case
- Clear explanation provided via TODO comment
Possibly related PRs
- NVIDIA-NeMo/RL#1334: Overlapping edits to the same nightly test-suite file affecting the same 32B test entries.
Suggested labels
CI
Suggested reviewers
- parthchadha
- guyueh1
- yfw
Pre-merge checks and finishing touches
β Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | β Passed | Check skipped - CodeRabbitβs high-level summary is enabled. |
| Title check | β Passed | The title 'fix: remove sft-qwen2.5-fsdp2tp8sp from nighlies' accurately describes the main change: disabling/removing a nightly test due to a torch 2.8.0 compatibility issue. |
| Docstring Coverage | β Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | β Passed | PR removes a broken test in nightly suite due to known torch==2.8.0 issue. Minor maintenance change, not a major code change requiring test results documentation. |
β¨ Finishing touches
π§ͺ Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch
ahmadki/disable_llm_sft_qwen2_5_32b_4n8g_fsdp2tp8sp_actckpt_v3
π Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
π₯ Commits
Reviewing files that changed from the base of the PR and between 1c371a9b34a703ac66fdc2d80b536f5fad99e616 and 3e8526c433973a02e06a8786f5fe2078eb02f41a.
π Files selected for processing (1)
-
tests/test_suites/nightly.txt(1 hunks)
π§° Additional context used
π§ Learnings (1)
π Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
Applied to files:
-
tests/test_suites/nightly.txt
β° Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: sphinx-build / Build docs
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
π Additional comments (1)
tests/test_suites/nightly.txt (1)
68-69: Clear documentation and correct pattern for test disablement.The change correctly disables the sft-qwen2.5-32b test with a well-formatted TODO comment that explains the root cause and references the tracking issue. This follows the established pattern in the file for disabled tests (see lines 26β27, 48β49, 92β94).
The GitHub issue #652 exists and is open with the title "Qwen parallelizer with sequence parallelism," confirming the reference is valid and accessible.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.