[PROPOSAL]: A significant improvement in user-friendliness
Proposal
The load_checkpoint method in "colossalai.utils.checkpoint" cannot scatter the parameters of a large model state dict, because the gather operations fill GPU memory first and cause out-of-memory (OOM) errors. The proposal is to gather the parameters step by step instead.
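To illustrate the idea, here is a minimal framework-free sketch, not ColossalAI's actual implementation. It assumes each rank holds one shard per parameter and models shards as plain Python lists. The names `gather_all_at_once`, `gather_step_by_step`, and `peak_step_by_step` are hypothetical. Peak memory is approximated by the number of gathered elements held at once.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for a sharded checkpoint: one dict per rank,
# mapping parameter name -> that rank's shard (a plain list of floats).
ShardedState = List[Dict[str, List[float]]]


def gather_all_at_once(shards: ShardedState) -> Tuple[Dict[str, List[float]], int]:
    """Gather every parameter before using any of them.

    The peak footprint equals the total element count of the model.
    """
    full = {name: [x for rank in shards for x in rank[name]] for name in shards[0]}
    peak = sum(len(values) for values in full.values())
    return full, peak


def gather_step_by_step(shards: ShardedState) -> Iterator[Tuple[str, List[float]]]:
    """Yield one fully gathered parameter at a time.

    The caller consumes each parameter and drops it before the next is built.
    """
    for name in shards[0]:
        yield name, [x for rank in shards for x in rank[name]]


def peak_step_by_step(shards: ShardedState) -> int:
    """Peak footprint is bounded by the largest single parameter."""
    peak = 0
    for _name, full_param in gather_step_by_step(shards):
        peak = max(peak, len(full_param))
    return peak


# Two simulated ranks, each holding half of "w" and half of "b".
shards = [
    {"w": [1.0, 2.0], "b": [5.0]},
    {"w": [3.0, 4.0], "b": [6.0]},
]
full_state, peak_all = gather_all_at_once(shards)
peak_step = peak_step_by_step(shards)
```

In a real distributed setting, the per-parameter gather would be a collective call and the memory in question would be GPU memory, but the bound is the same: the largest single parameter rather than the whole state dict.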
Self-service
- [X] I'd be willing to do some initial work on this proposal myself.
Hi @nemoramo, we are currently working on that. Would you be interested in taking part in the development?
sure
You can join our development channel via https://colossalaiworkspace.slack.com/archives/C04S86SEFCP
There is a developer guide for your reference: https://github.com/hpcaitech/ColossalAI/wiki/2.-Developer-Guideline
We have made many updates since then. Please check the latest code. This issue was closed due to inactivity. Thanks.