Changjiang GOU

14 comments by Changjiang GOU

You may be able to solve this problem by using a newer version of gcc. I solved the same sort of problem using gcc 7.3.0.

Hi. Are you using example code, or are you trying to run your own project? If it's the latter, could you please provide more code, such as launch...

As far as I know, this script is for running Gemini on a single node; that's what '--standalone' means. Did you modify it to adapt it to multi-node?

Thanks for the kind reply. Instead of using dummy data, I am using the real data provided by the link given. To double check it, the 'else' clause block in...

Hi @Seventeen17, thank you for adding this PR. I managed to install torchdistx with your PR. However, I failed to initialize the 65B model with FSDP. Did you manage...

@Seventeen17 I managed to train the large model. Thanks again for your contribution.

> Although this is a feature which I'm looking for as well, considering the embedding lookup backend is FBGEMM which combines optimizer update with backward at each single step, I...

Thank you @JacoCheung. That's quite a lot of work.

By the way, I don't think it's caused by GPU memory shortage.