Haojun Zhao
Tests for Nanotron can be run in both full and lite modes. Tasks can be defined by modifying the configuration file.
Dear LitGPT Maintainer,

Thank you for your great work. I encountered an issue while trying to fine-tune LLaMA 3.1 and came here for reference. I was looking for the LLaMA...
**In the switch aux loss, why don't we need to do an all-gather for `probs`?**

Here is the code: https://github.com/NVIDIA/Megatron-LM/blob/d86da876524377e351d82f506dd65f10fd5a3af1/megatron/core/transformer/moe/moe_utils.py#L30-L128

Relevant question: https://github.com/NVIDIA/Megatron-LM/issues/1406
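For context, the standard Switch Transformer load-balancing loss is `alpha * N * sum_i f_i * P_i`, where `N` is the number of experts, `f_i` is the fraction of tokens routed to expert `i`, and `P_i` is the mean router probability for expert `i`. Because `f_i` comes from integer token counts, which are not differentiable, the loss is linear in the router probabilities, so it decomposes into per-token terms. The pure-Python sketch below illustrates only that arithmetic: if the per-expert token counts are summed across shards, then summing per-shard partial losses computed from local `probs` alone reproduces the full-batch loss. All function names are hypothetical, tokens are split across two simulated ranks, and this is not Megatron-LM's implementation; whether this is the actual reason the linked code skips the all-gather is an assumption, not something the source confirms.

```python
from typing import List, Sequence


def expert_counts(expert_ids: Sequence[int], num_experts: int) -> List[int]:
    """Number of tokens routed (top-1) to each expert."""
    counts = [0] * num_experts
    for e in expert_ids:
        counts[e] += 1
    return counts


def switch_aux_loss(probs, expert_ids, num_experts, alpha=0.01):
    """Full-batch Switch loss: alpha * N * sum_i f_i * P_i."""
    num_tokens = len(probs)
    counts = expert_counts(expert_ids, num_experts)
    f = [c / num_tokens for c in counts]  # fraction of tokens per expert
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]  # mean prob per expert
    return alpha * num_experts * sum(fi * Pi for fi, Pi in zip(f, P))


def sharded_partial_loss(local_probs, global_counts, global_num_tokens, alpha=0.01):
    """Hypothetical per-shard term: local probs only, but global token counts."""
    num_experts = len(global_counts)
    total = 0.0
    for i in range(num_experts):
        f_i = global_counts[i] / global_num_tokens
        local_prob_sum = sum(p[i] for p in local_probs)
        total += f_i * local_prob_sum / global_num_tokens
    return alpha * num_experts * total


# Toy batch: 4 tokens, 2 experts, top-1 routing by argmax.
probs = [[0.7, 0.3], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
expert_ids = [max(range(2), key=lambda i: p[i]) for p in probs]
full_loss = switch_aux_loss(probs, expert_ids, num_experts=2)

# Split the tokens across two simulated ranks.
shards = [(probs[:2], expert_ids[:2]), (probs[2:], expert_ids[2:])]
# Only the per-expert counts are summed across shards (stands in for an all-reduce).
global_counts = [sum(c) for c in zip(*(expert_counts(ids, 2) for _, ids in shards))]
# Each shard uses only its own probs; the partial losses are then summed.
sharded_loss = sum(sharded_partial_loss(p, global_counts, len(probs)) for p, _ in shards)

print(full_loss, sharded_loss)
```

With perfectly balanced routing here (two tokens per expert), both values equal `alpha = 0.01` up to floating-point rounding. The same linearity means the gradient of the loss with respect to a token's probability for expert `i` is `alpha * N * f_i / T`, which depends on that token's own shard only through the globally reduced counts.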