Li Tao
# Accelerate Training for SwinTransformer Uses Nvidia/apex fused ops and our own fused ops, based on commit id `cbaa0d8`. ## Acceleration strategies - apex O2 - fused_adam (`apex` fused ops, to...
In the notebook, the variable `score` is used but never defined. Call `score = model.evaluate(x_test, y_test)` first to obtain the score.
# What does this PR do ? Support packed sequences in the MTP module. PR for the main branch: [PR2173](https://github.com/NVIDIA/Megatron-LM/pull/2173) :warning: For major changes (either in lines of code or in its...
# What does this PR do ? Support placing MTP in a standalone stage, e.g., MTP in the 2nd last VPP stage for better VPP balance. This is the PR...
# What does this PR do ? In the previous implementation, a reduction across DP was added so that aux_loss is logged correctly. However, the current reduction causes redundant communication - For `seq_load_balancing_loss`...
# What does this PR do ? - Previous code: when params are bf16, `.float()` is used to convert them, which causes additional memory usage here; - After this: directly use `main_param`...
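
As a rough illustration of why the `.float()` conversion costs memory: casting a bf16 tensor to fp32 allocates a new tensor at 4 bytes per element, on top of the 2-byte-per-element bf16 original. The sketch below is back-of-the-envelope arithmetic only; the function names and the parameter count are hypothetical and are not taken from the PR.

```python
# Assumption: `.float()` on a bf16 tensor materializes a separate fp32 tensor
# (4 bytes per element) while the bf16 original (2 bytes per element) stays alive.
BF16_BYTES = 2
FP32_BYTES = 4


def float_copy_overhead_bytes(num_params: int) -> int:
    """Extra bytes held by a temporary fp32 copy of `num_params` bf16 values."""
    return num_params * FP32_BYTES


def overhead_ratio() -> float:
    """Size of the temporary fp32 copy relative to the bf16 original."""
    return FP32_BYTES / BF16_BYTES


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical parameter count, for illustration only
    extra_gib = float_copy_overhead_bytes(n) / 2**30
    print(f"temporary fp32 copy: {extra_gib:.2f} GiB "
          f"({overhead_ratio():.0f}x the bf16 storage)")
```

Reusing an fp32 copy that already exists avoids this temporary allocation, which appears to be the motivation for the change described above.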