[Feature] How can I implement a mixture-of-experts version on top of the InternVL model code?
Motivation
I tried adding MoE, but hidden_states turns into NaN right after passing through the gating linear layer F.Linear(embed_dim, num_of_experts) (with num_of_experts = 4). What could be causing this? I'm a beginner and would really appreciate an answer, many thanks!
Related resources
No response
Additional context
No response
Hi, for MoE you could take a look at the fastmoe or deepspeed-moe implementations. We currently have no plans to add MoE support to this codebase.
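As a side note on the NaN reported above: one common cause is numeric overflow when the gate logits are exponentiated (especially under fp16), since exp() of a large logit becomes inf and inf/inf yields NaN. Below is a minimal, dependency-free sketch of a top-1 gating router with a numerically stable softmax; `stable_softmax` and `route` are hypothetical helper names for illustration, not part of InternVL, fastmoe, or deepspeed-moe.

```python
import math

def stable_softmax(logits):
    # Subtract the max logit before exponentiating so exp() cannot
    # overflow to inf (inf / inf = NaN) -- a frequent source of NaN
    # in MoE gating layers, especially in reduced precision.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(hidden_state, gate_weight):
    # gate_weight: num_experts rows of embed_dim weights,
    # i.e. the weight matrix of the gating linear layer.
    logits = [sum(w * h for w, h in zip(row, hidden_state))
              for row in gate_weight]
    probs = stable_softmax(logits)
    # Top-1 routing: send the token to the highest-probability expert.
    top_expert = max(range(len(probs)), key=lambda i: probs[i])
    return top_expert, probs
```

If NaN still appears right after the linear layer itself (before any softmax), it usually means the incoming hidden_states already contain NaN/inf, or the gate weights were left uninitialized, so checking the inputs with torch.isfinite is a good first step.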
Hi, have you managed to solve this problem?
Hi, since there hasn't been any recent activity on this issue, I'll be closing it for now. If it's still an active concern, don't hesitate to reopen it. Thanks for your understanding!
Hello! I'm running into the same problem. Did you manage to solve it?