Deyu Fu
The current serialize/deserialize code in init is not necessary
Currently, if more inputs are supplied than expected, only the leading ones are used and the rest are silently ignored. This needs a stricter check and a clearer error message.
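A minimal sketch of the kind of check requested above, assuming the caller knows how many inputs it expects. The helper name `check_input_count` and its parameters are hypothetical and are not taken from the actual code base:

```python
from typing import Sequence


def check_input_count(inputs: Sequence[object], expected: int, name: str = "inputs") -> None:
    """Raise ValueError unless exactly `expected` inputs were supplied.

    Hypothetical helper for illustration only; not part of any real API.
    """
    actual = len(inputs)
    if actual > expected:
        # Fail loudly instead of silently dropping the trailing inputs.
        raise ValueError(
            f"{name}: got {actual} input(s) but only {expected} are used; "
            f"remove the {actual - expected} extra input(s)"
        )
    if actual < expected:
        raise ValueError(f"{name}: got {actual} input(s) but {expected} are required")
```

Calling `check_input_count(args, 2)` before the inputs are consumed would turn the silent truncation into an explicit error that names both the supplied and the expected count.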
# What does this PR do ? Fix a failed functional test caused by a main-branch API change. Also, when merging https://github.com/NVIDIA/Megatron-LM/pull/2395, commits were squashed and thus main/dev are no...
# What does this PR do ? Add muon and layerwise distributed optimizer to main ## Contribution process ```mermaid flowchart LR A[Pre-checks] --> B[PR Tests] subgraph Code Review/Approval C1[Expert Review]...