Haojun Zhao

Results: 3 issues by Haojun Zhao

Tests for Nanotron can be run in both full and lite modes. Tasks can be defined by modifying the configuration file.

Dear LitGPT Maintainer, Thank you for your great work. I encountered an issue while trying to fine-tune LLaMA 3.1 and came here for reference. I was looking for the LLaMA...

bug
question

**In switch aux loss, why don't we need to do an all-gather for 'probs'?** Here is the code: https://github.com/NVIDIA/Megatron-LM/blob/d86da876524377e351d82f506dd65f10fd5a3af1/megatron/core/transformer/moe/moe_utils.py#L30-L128 Related question: https://github.com/NVIDIA/Megatron-LM/issues/1406