BlackBearBiscuit

Results 6 comments of BlackBearBiscuit

Did you get this solved, buddy?

Could anyone share a Chinese dataset? Many thanks!

> Hi @ranggihwang -- thank you for your interest in DeepSpeed and ZeRO-3.
>
> We do have some rationale for why we only support ZeRO-Stage-2 and MoE together. I...

+1, please upload it to modelscope!

> OK, I'll use this commit to test it again.

How is the performance? When I pretrained deepseek-v2 on H100-80G, I ran into the same issue (FA3 is slower than FA2).