Jinxu Zhang
Why is the output of the intermediate verification empty after training?
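### Motivation

I tried adding MoE, but after `hidden_states` passes through the linear layer `F.Linear(embed_dim, num_of_experts)`, it becomes NaN. Assume `num_of_experts` is 4. What could be causing this? I am a beginner and would really appreciate an answer. Many thanks!

### Related resources

_No response_

### Additional context

_No response_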
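One way to narrow down the NaN described under Motivation is to check whether it is already present before the gating layer. The sketch below is a framework-free illustration, not the code from this issue: `linear` and `non_finite_positions` are hypothetical helpers, and the weights are made-up values. It shows that a single NaN in the input of a linear layer makes every expert logit NaN, so the layer itself may not be the source. In PyTorch, the analogous check would presumably apply `torch.isfinite` to `hidden_states` and to the gate's weights right before the call.

```python
import math

def linear(x, weight, bias):
    """Compute y = W x + b for one token; weight has num_of_experts rows of embed_dim values."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def non_finite_positions(values):
    """Return the indices of entries that are NaN or infinite."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]

# Hypothetical sizes and weights, chosen only for illustration.
embed_dim, num_of_experts = 3, 4
weight = [[0.1 * (e + 1)] * embed_dim for e in range(num_of_experts)]
bias = [0.0] * num_of_experts

clean = [1.0, 2.0, 3.0]          # a well-behaved hidden state
bad = [1.0, float("nan"), 3.0]   # one element is already NaN before the gate

clean_logits = linear(clean, weight, bias)
bad_logits = linear(bad, weight, bias)

print("non-finite inputs:", non_finite_positions(bad))
print("non-finite logits (clean input):", non_finite_positions(clean_logits))
print("non-finite logits (bad input):", non_finite_positions(bad_logits))
```

If the input and the weights are both finite right before the layer, the cause is more likely elsewhere, for example weights that became NaN during an earlier optimizer step or an overflow in reduced precision. These are possibilities to check, not a diagnosis of this particular model.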