[PROPOSAL]: A significant improvement in user-friendliness

Open nemoramo opened this issue 2 years ago • 3 comments

Proposal

The load_checkpoint method in "colossalai.utils.checkpoint" cannot scatter the state dict of a large model: its gather operations fill GPU memory first and cause out-of-memory (OOM) errors. The proposal is to gather parameters step by step instead of all at once.
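To make the idea concrete, here is a rough sketch of the difference between gathering everything first and gathering one parameter at a time. It is a toy simulation in plain Python, not ColossalAI or PyTorch code: the shard layout, the function names (`gather_all_at_once`, `gather_step_by_step`, `peak_step_by_step`) and the sample data are all hypothetical, and list lengths stand in for GPU memory use.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy layout: shards[rank][name] is that rank's slice of a parameter.
Shards = List[Dict[str, List[float]]]


def gather_all_at_once(shards: Shards) -> Tuple[Dict[str, List[float]], int]:
    """Rebuild every full parameter before any of them is used.

    Returns the full state dict and the number of gathered elements held at once.
    """
    full = {name: [x for rank in shards for x in rank[name]] for name in shards[0]}
    peak = sum(len(values) for values in full.values())
    return full, peak


def gather_step_by_step(shards: Shards) -> Iterator[Tuple[str, List[float]]]:
    """Yield one fully gathered parameter at a time so the caller can use it and drop it."""
    for name in shards[0]:
        yield name, [x for rank in shards for x in rank[name]]


def peak_step_by_step(shards: Shards) -> int:
    """Largest number of gathered elements alive at any moment in the step-by-step path."""
    peak = 0
    for _name, full_param in gather_step_by_step(shards):
        # A real loader would scatter or copy full_param here, then release it.
        peak = max(peak, len(full_param))
    return peak


# Two simulated ranks, each holding half of a weight and half of a bias.
shards = [
    {"w": [1.0, 2.0], "b": [5.0]},
    {"w": [3.0, 4.0], "b": [6.0]},
]
full, peak_all = gather_all_at_once(shards)
peak_step = peak_step_by_step(shards)
```

In this toy case the all-at-once path holds every gathered element simultaneously, while the step-by-step path never holds more than the largest single parameter. Whether the same saving carries over depends on how the real load path gathers and frees tensors.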

Self-service

  • [X] I'd be willing to do some initial work on this proposal myself.

nemoramo avatar Mar 10 '23 06:03 nemoramo

Hi, @nemoramo, we are currently working on that. Would you be interested in taking part in the development?

FrankLeeeee avatar Mar 10 '23 06:03 FrankLeeeee

> Hi, @nemoramo, we are currently working on that. Would you be interested in taking part in the development?

sure

nemoramo avatar Mar 10 '23 06:03 nemoramo

> > Hi, @nemoramo, we are currently working on that. Would you be interested in taking part in the development?
>
> sure

You can join our development channel via https://colossalaiworkspace.slack.com/archives/C04S86SEFCP

There is also a developer guide for your reference: https://github.com/hpcaitech/ColossalAI/wiki/2.-Developer-Guideline

FrankLeeeee avatar Mar 10 '23 06:03 FrankLeeeee

We have made many updates since this was opened. Please check the latest code. This issue was closed due to inactivity. Thanks.

binmakeswell avatar Apr 28 '23 07:04 binmakeswell