Save memory using main_param for moe in param_l2_norm
What does this PR do ?
- Previous code: when params are bf16, use `.float()` to convert, which causes additional memory usage here.
- After this PR: directly use `main_param`, which is already fp32, the same as the dense part, so no additional memory is needed (see the sketch below).
PR for dev branch: PR2234
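For illustration, a minimal sketch of the idea (a hypothetical helper, not the actual `calc_params_l2_norm` code): when a bf16 expert parameter already carries an fp32 `main_param` attached by the optimizer, reuse it instead of allocating a temporary fp32 copy with `.float()`.

```python
import torch

def fp32_view_for_norm(param: torch.nn.Parameter) -> torch.Tensor:
    """Illustrative only: pick the fp32 tensor used for the L2-norm computation."""
    main_param = getattr(param, "main_param", None)  # fp32 master weight, if the optimizer attached one
    if main_param is not None:
        return main_param          # reuse the existing fp32 copy: no extra allocation
    return param.data.float()     # previous behavior: allocates a temporary fp32 tensor
```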
Contribution process
```mermaid
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
```
Pre-checks
- [ ] I want this PR in a versioned release and have added the appropriate Milestone (e.g., `Core 0.8`)
- [ ] I have added relevant unit tests
- [ ] I have added relevant functional tests
- [ ] I have added proper typing to my code (Typing guidelines)
- [ ] I have added relevant documentation
- [ ] I have run the `autoformatter.sh` on my PR
Code review
The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.
For MRs into `main` branch
(Step 1): Add PR label Expert Review
(Step 2): Collect the expert reviewers reviews
- Attach the `Expert Review` label when your PR is ready for review.
- GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.
:warning: Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.
(Step 3): Final Review
- Add the `Final Review` label.
- GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.
(Optional Step 4): Cherry-pick into release branch
If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.
For MRs into `dev` branch
The proposed review process for `dev` branch is under active discussion. MRs are mergeable after one approval by either [email protected] or [email protected].
Merging your PR
Any member of core-adlr and core-nemo will be able to merge your PR.
E.g., see https://github.com/NVIDIA/Megatron-LM/blob/main/tests/unit_tests/test_utils.py#L241.
Thanks for the suggestions. I have added a UT for the MoE models. Since the MoE model contains many sub-modules, the UT checks whether the two values (w/ & w/o force_create_fp32_copy) are close rather than exactly equal. I tested locally and it passes, but it might be skipped in CI because of @pytest.mark.flaky:
torchrun --nproc-per-node 8 -m pytest -s -v tests/unit_tests/test_utils.py -k test_param_norm
...
============================================================================== short test summary info ==============================================================================
PASSED tests/unit_tests/test_utils.py::test_param_norm_linear[False]
PASSED tests/unit_tests/test_utils.py::test_param_norm_linear[True]
PASSED tests/unit_tests/test_utils.py::test_param_norm_moe[False]
PASSED tests/unit_tests/test_utils.py::test_param_norm_moe[True]
=================================================================== 4 passed, 18 deselected, 3 warnings in 18.49s ===================================================================
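For reference, a rough sketch of the closeness check described above (the helper name and tolerance are illustrative, not the exact test code):

```python
import torch

def check_moe_param_norm_consistency(model, calc_params_l2_norm):
    """Illustrative: the norm computed via main_param should match a forced fp32 copy."""
    norm_forced = calc_params_l2_norm(model, force_create_fp32_copy=True)
    norm_main = calc_params_l2_norm(model, force_create_fp32_copy=False)
    # MoE models accumulate the norm over many expert sub-modules, so require
    # the two results to be close rather than bit-identical.
    assert torch.isclose(torch.tensor(norm_forced), torch.tensor(norm_main), rtol=1e-5)
```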
Does this make sense to you? @deepakn94