Create an AWS image with our tools and codebase
to make it easy for new users to use our setup.
Could you be a bit more specific, Iz? To run Meg-DS training?
I have a more or less ready AWS image that I created for the CI, but I'm definitely not an AWS expert.
Perhaps it'd be better to create a Docker image that already includes everything, including torch, CUDA, etc.?
Yes, to run Meg-DS training. Basically, the image would do the steps listed in the README here https://github.com/bigscience-workshop/Megatron-DeepSpeed for them, so that they only need to run the pretrain_* script.
The only problem with this pre-made image is that our components are in flux: e.g., we get fixes in the DeepSpeed repo, Meg-DS keeps changing too, and so would transformers if we start using it.
So if you look here, you'll see that we dynamically install the latest versions of the repos: https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/c839a8aa30731f71b3738d56009be9668508e366/.github/workflows/main.yml#L109-L113
Other than that, the image is ready.
I'm glad it will be put to use, since I was about to discard it now that I have to redo it for GCP.
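For illustration, here is a minimal sketch of how the image could do the same thing at launch instead of baking fixed versions in. It assumes the linked workflow lines boil down to a `pip install` straight from GitHub; the repo list and the `latest_install_commands` / `install_latest` helpers are hypothetical, not copied from the workflow.

```python
import shlex
import subprocess
import sys

# Hypothetical list of repos to track at their latest commit; adjust it to
# whatever the CI workflow actually installs.
LATEST_REPOS = [
    "https://github.com/microsoft/DeepSpeed",
    "https://github.com/huggingface/transformers",
]


def latest_install_commands(repos, python=sys.executable):
    """Build one `python -m pip install --upgrade git+<url>` command per repo."""
    return [
        [python, "-m", "pip", "install", "--upgrade", f"git+{url}"]
        for url in repos
    ]


def install_latest(repos, dry_run=True):
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    for cmd in latest_install_commands(repos):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so the script is safe to inspect before using it.
    install_latest(LATEST_REPOS)
```

Running it with `dry_run=False` from an instance startup script would pull whatever is on the default branches at boot, which trades reproducibility for always having the latest fixes.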
@philschmid, could you please help me make the image we built for the Megatron-DeepSpeed CI available to the wider group, basically anybody at BigScience? I'm not sure whether we have to do anything special for that; I'm an AWS newbie.
Thank you.
We could create an AWS AMI and make it publicly available. Others could then use it on EC2.
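A minimal sketch of the sharing step, assuming it is done with boto3's EC2 client: `modify_image_attribute` with a `LaunchPermission` of `Group: "all"` makes an AMI public, while `UserId` entries share it with specific AWS accounts only (relevant to the account question further down). The `share_ami` helper is hypothetical; the client is passed in so the logic can be checked without AWS credentials.

```python
def share_ami(ec2_client, image_id, account_ids=None):
    """Grant launch permission on an AMI and return the permissions added.

    With no account_ids, the AMI is made public (Group "all"); otherwise it
    is shared only with the listed AWS account IDs.
    """
    if account_ids:
        add = [{"UserId": account_id} for account_id in account_ids]
    else:
        add = [{"Group": "all"}]
    # ec2_client is expected to be a boto3 EC2 client (or anything with the
    # same modify_image_attribute method).
    ec2_client.modify_image_attribute(
        ImageId=image_id,
        LaunchPermission={"Add": add},
    )
    return add
```

Usage would look like `share_ami(boto3.client("ec2", region_name="us-east-1"), "ami-0123456789abcdef0")`, where the region and AMI ID are placeholders. AMIs are regional, so users in other regions would need a copy, and AWS won't let you make an AMI with encrypted volumes public.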
That would be fantastic! Thank you, Philipp!
I think the last image I created will need a few small tweaks: it was built for CI, where the Megatron-DeepSpeed source code was checked out by GitHub Actions, which won't happen here.
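One way to cover the missing checkout, sketched under the assumption that a startup script on the instance is responsible for it: clone Megatron-DeepSpeed on first boot and fast-forward it on later boots. The `checkout_commands` / `ensure_checkout` helpers and the `~/Megatron-DeepSpeed` destination are hypothetical choices, not part of the existing image.

```python
import os
import subprocess

REPO_URL = "https://github.com/bigscience-workshop/Megatron-DeepSpeed"


def checkout_commands(dest, repo_url=REPO_URL):
    """Return the git commands needed to get an up-to-date checkout at dest.

    A missing checkout gets a clone; an existing one gets `pull --ff-only`.
    """
    if os.path.isdir(os.path.join(dest, ".git")):
        return [["git", "-C", dest, "pull", "--ff-only"]]
    return [["git", "clone", repo_url, dest]]


def ensure_checkout(dest, run=False):
    """Print the commands, or run them (stopping on failure) when run=True."""
    for cmd in checkout_commands(dest):
        if run:
            subprocess.run(cmd, check=True)
        else:
            print(" ".join(cmd))


if __name__ == "__main__":
    # Hypothetical location for the checkout on the instance.
    ensure_checkout(os.path.expanduser("~/Megatron-DeepSpeed"))
```

Using `--ff-only` means a checkout with local edits fails loudly instead of silently creating a merge commit.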
@jaketae can be the first user of the AMI.
@ibeltagy, is this going to be used on EC2 under users' personal accounts, an HF account, or something else?
@ibeltagy?