Jinxu Zhang

Results 2 issues of Jinxu Zhang

Why is the output of the intermediate verification empty after training?

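### Motivation

I tried to add MoE, but `hidden_states` becomes NaN after passing through the linear layer `F.Linear(embed_dim, num_of_experts)`. Assume `num_of_experts` is 4. What could be causing this? I am a beginner and would really appreciate an answer. Thank you very much!

### Related resources

_No response_

### Additional context

_No response_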
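
The description alone does not identify the cause, so a reasonable first step is to check whether the NaN is already present in the input or in the gate parameters before the layer runs. The following is a minimal sketch, assuming PyTorch and assuming the gate is an `nn.Linear(embed_dim, num_of_experts)` module (the issue writes `F.Linear`, which is read here as that module). The helper `first_nonfinite`, the value `embed_dim = 8`, and the tensor shapes are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn


def first_nonfinite(named_tensors):
    """Return the name of the first tensor containing NaN or Inf, else None."""
    for name, tensor in named_tensors:
        if not torch.isfinite(tensor).all():
            return name
    return None


torch.manual_seed(0)
embed_dim, num_of_experts = 8, 4  # embed_dim is an assumed value
gate = nn.Linear(embed_dim, num_of_experts)  # assumed stand-in for the MoE gate
hidden_states = torch.randn(2, 3, embed_dim)  # (batch, seq_len, embed_dim)

# Healthy case: input, parameters, and output are all finite.
logits = gate(hidden_states)
healthy = first_nonfinite([
    ("hidden_states", hidden_states),
    ("gate.weight", gate.weight),
    ("gate.bias", gate.bias),
    ("logits", logits),
])

# Simulated failure: a single NaN in the input poisons that token's logits.
bad_hidden_states = hidden_states.clone()
bad_hidden_states[0, 0, 0] = float("nan")
bad_logits = gate(bad_hidden_states)
corrupted = first_nonfinite([
    ("hidden_states", bad_hidden_states),
    ("gate.weight", gate.weight),
    ("gate.bias", gate.bias),
    ("logits", bad_logits),
])

print("healthy run, first non-finite tensor:", healthy)
print("corrupted run, first non-finite tensor:", corrupted)
```

If the check reports `hidden_states`, the NaN originates upstream of the gate. If it reports `gate.weight` or `gate.bias`, the parameters themselves are non-finite, for example because they were never initialized or were corrupted by an earlier optimizer step. If only `logits` is reported, the problem lies in the layer's computation itself, for example an overflow in reduced precision.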