BlackBearBiscuit
Has this been solved, bro?
Could someone share the Chinese dataset? Many thanks!
> Hi @ranggihwang -- thank you for your interest in DeepSpeed and ZeRO-3.
>
> We do have some rationale for why we only support ZeRO-Stage-2 and MoE together. I...
+1 to this
+1, please upload it to ModelScope!
> OK, I'll use this commit to test it again. How about the performance? When I pretrain deepseek-v2 on an H100-80G, I see the same issue (FA3 is slower than FA2).