Li Tao issues

Results: 6 issues of Li Tao

Accelerate Training/Fine-tuning for SwinTransformer

# Accelerate Training for SwinTransformer Uses NVIDIA/apex fused ops and our own fused ops, based on commit id `cbaa0d8`. ## Acceleration strategies - apex O2 - fused_adam (`apex` fused ops, to...

fix a bug in tf amp notebook

In the notebook, the variable `score` is used but never defined. Call `score = model.evaluate(x_test, y_test)` first to obtain the score.
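The fix can be illustrated with a minimal sketch. `StubModel` is a hypothetical stand-in for the notebook's compiled Keras model so that the snippet runs without TensorFlow, and the `x_test` / `y_test` values are made-up placeholders. The only point is that `score` must be assigned before any cell reads it.

```python
class StubModel:
    """Hypothetical stand-in for the notebook's compiled Keras model."""

    def evaluate(self, x, y):
        # Returns [loss, metric], the list shape Keras uses when a model
        # is compiled with a single metric. Here x is compared directly
        # to y, which is an illustrative simplification.
        errors = [abs(a - b) for a, b in zip(x, y)]
        loss = sum(errors) / len(errors)
        accuracy = sum(e == 0 for e in errors) / len(errors)
        return [loss, accuracy]


model = StubModel()
x_test = [0, 1, 1, 0]  # placeholder data, not from the notebook
y_test = [0, 1, 0, 0]

# Assign `score` first; without this line the prints below raise NameError.
score = model.evaluate(x_test, y_test)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
```

With a real Keras model, the same single line `score = model.evaluate(x_test, y_test)` would go before the cell that prints the result.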

[Dev] Support packed seq in MTP

# What does this PR do ? Supports packed sequences in the MTP module. PR for Main branch: [PR2173](https://github.com/NVIDIA/Megatron-LM/pull/2173) :warning: For major changes (either in lines of code or in its...

module: moe

Expert Review

dev branch

feat(moe): Support placing MTP layers into standalone stages

# What does this PR do ? Supports placing MTP layers in a standalone stage, e.g., in the second-to-last VPP stage, for better VPP balance. This is the PR...

module: moe

Expert Review

Remove redundant reduce in aux_loss logging

# What does this PR do ? In the previous implementation, a reduce across DP was added so that aux_loss is logged correctly. However, the current reduction causes redundant communication - For `seq_load_balancing_loss`...

Expert Review

Save memory using main_param for moe in param_l2_norm

# What does this PR do ? - Previous code: when params are bf16, `.float()` is used to convert them, which causes additional memory usage here; - After this: directly use `main_param`...

Expert Review
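
The preview of the `param_l2_norm` change above is truncated, so the actual patch is not shown here. As a rough sketch of the general idea only, assuming each bf16 parameter carries an fp32 master copy under a `main_param` attribute (the name comes from the PR text; the function names below are hypothetical and this is not Megatron-LM's code), the two approaches might compare as follows:

```python
import torch


def l2_norm_with_float_copy(params):
    """Upcast each bf16 param with .float(), allocating a temporary fp32 copy."""
    norms = [torch.linalg.vector_norm(p.float()) for p in params]
    # The L2 norm of the per-tensor L2 norms equals the global L2 norm.
    return torch.linalg.vector_norm(torch.stack(norms))


def l2_norm_with_main_param(params):
    """Read the fp32 master copy when one is attached, avoiding the upcast copy."""
    norms = []
    for p in params:
        main = getattr(p, "main_param", None)  # assumed attribute name
        source = main if main is not None else p.float()
        norms.append(torch.linalg.vector_norm(source))
    return torch.linalg.vector_norm(torch.stack(norms))


torch.manual_seed(0)
masters = [torch.randn(64, 32), torch.randn(128)]  # fp32 master weights
params = []
for m in masters:
    p = m.to(torch.bfloat16)  # low-precision working copy
    p.main_param = m          # attach the fp32 master copy
    params.append(p)

norm_copy = l2_norm_with_float_copy(params)
norm_main = l2_norm_with_main_param(params)
print(norm_copy.item(), norm_main.item())
```

The two results differ slightly because the first path measures the bf16-rounded values while the second measures the fp32 master weights. The second path skips the per-parameter fp32 temporaries.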