Create AWS image with our tools and codebase

Open ibeltagy opened this issue 4 years ago • 9 comments

to make it easy for new users to use our setup

ibeltagy avatar Sep 29 '21 18:09 ibeltagy

Could you be a bit more specific, Iz? To run Meg-DS training?

I have a more-or-less ready AWS image that I created for the CI, but I'm definitely not an AWS expert.

Perhaps it'd be better to create a docker image that already includes everything including torch and cuda, etc.?

stas00 avatar Sep 29 '21 18:09 stas00

Yes, to run Meg-DS training. Basically, doing the steps listed in the README here https://github.com/bigscience-workshop/Megatron-DeepSpeed for them, so that they only need to run the pretrain_* script.

ibeltagy avatar Sep 29 '21 18:09 ibeltagy

The only problem with this pre-made image is that our components are in flux: e.g. we get fixes in the deepspeed repo, Meg-DS gets changed too, and so does transformers if we start using it.

So if you look here, we dynamically install the latest repos: https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/c839a8aa30731f71b3738d56009be9668508e366/.github/workflows/main.yml#L109-L113

But otherwise the image is ready already.

I'm glad that it'll have a use as I was about to discard it, as I have to re-do it for GCP now.

stas00 avatar Sep 29 '21 19:09 stas00

@philschmid, could you please help me to make this image we made for Megatron-Deepspeed CI somehow available to the wider group? Basically anybody at BigScience. I'm not sure if we have to do something special there, I'm an AWS newbie.

Thank you.

stas00 avatar Sep 29 '21 19:09 stas00

We could create an AWS AMI and make it publicly available. Others could then use it on EC2.

philschmid avatar Sep 30 '21 11:09 philschmid

That would be fantastic! Thank you, Philipp!

I think the last one I created will need a few small tweaks, since it was made for CI, where the Megatron-Deepspeed source code was checked out by github-actions, which won't happen here.

stas00 avatar Sep 30 '21 16:09 stas00

@jaketae can be the first user of the AMI

ibeltagy avatar Sep 30 '21 22:09 ibeltagy

@ibeltagy, is this going to be used on EC2 under the user's personal account, some HF account, or something else?

stas00 avatar Oct 01 '21 16:10 stas00

@ibeltagy?

stas00 avatar Oct 06 '21 03:10 stas00