[Feature] How can I implement a mixture-of-experts version on top of the InternVL model code?
Motivation
I tried adding MoE, but hidden_states turns into NaN right after passing through the gating linear layer F.Linear(embed_dim, num_of_experts) (with num_of_experts = 4). What could be causing this? I'm a beginner and would really appreciate an answer, many thanks!
Related resources
No response
Additional context
No response
Hi, for MoE you could take a look at the fastmoe or deepspeed-moe implementations. We currently have no plans to add MoE support to this codebase.
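As a side note on the NaN reported above: one common cause is numeric overflow when the gate logits are exponentiated (especially under fp16), since exp() of a large logit becomes inf and inf/inf yields NaN. Below is a minimal, dependency-free sketch of a top-1 gating router with a numerically stable softmax; `stable_softmax` and `route` are hypothetical helper names for illustration, not part of InternVL, fastmoe, or deepspeed-moe.

```python
import math

def stable_softmax(logits):
    # Subtract the max logit before exponentiating so exp() cannot
    # overflow to inf (inf / inf = NaN) -- a frequent source of NaN
    # in MoE gating layers, especially in reduced precision.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(hidden_state, gate_weight):
    # gate_weight: num_experts rows of embed_dim weights,
    # i.e. the weight matrix of the gating linear layer.
    logits = [sum(w * h for w, h in zip(row, hidden_state))
              for row in gate_weight]
    probs = stable_softmax(logits)
    # Top-1 routing: send the token to the highest-probability expert.
    top_expert = max(range(len(probs)), key=lambda i: probs[i])
    return top_expert, probs
```

If NaN still appears right after the linear layer itself (before any softmax), it usually means the incoming hidden_states already contain NaN/inf, or the gate weights were left uninitialized, so checking the inputs with torch.isfinite is a good first step.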
Hi, have you managed to solve this problem?
Hi, since there hasn't been any recent activity on this issue, I'll be closing it for now. If it's still an active concern, don't hesitate to reopen it. Thanks for your understanding!
Hello! I'm running into the same problem. Did you manage to solve it?