Add checks to confirm that the checkpoint conversion script works correctly
We now have a script that converts megatron-deepspeed checkpoints to HF-transformers checkpoints. The project is here and the script is here. However, the script has no unit tests confirming that the conversion is correct.
The goal of this issue is to add such tests. The idea is to run the forward pass of both models (before and after conversion) on the same random input, then use torch.allclose to assert that the output loss and the logits of the two models match.
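A minimal sketch of such a check, assuming both checkpoints can be wrapped as callables that map input_ids to a (loss, logits) pair (the wrapper names and tolerances here are illustrative, not part of the actual conversion script):

```python
import torch

def outputs_match(forward_a, forward_b, vocab_size=50257, seq_len=16, atol=1e-5):
    """Run both forward functions on the same random input and compare.

    forward_a / forward_b: callables mapping input_ids -> (loss, logits).
    Returns True iff both the losses and the logits agree element-wise
    within the given absolute tolerance.
    """
    torch.manual_seed(0)  # reproducible random input
    input_ids = torch.randint(0, vocab_size, (1, seq_len))
    with torch.no_grad():
        loss_a, logits_a = forward_a(input_ids)
        loss_b, logits_b = forward_b(input_ids)
    return (torch.allclose(loss_a, loss_b, atol=atol)
            and torch.allclose(logits_a, logits_b, atol=atol))
```

For HF models the wrapper would be something like `lambda ids: (m(ids, labels=ids).loss, m(ids, labels=ids).logits)`; the Megatron side needs an equivalent adapter.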
Here's a megatron-deepspeed checkpoint and here's the corresponding HF-transformers checkpoint. We just need to verify that these two are equivalent.
git clone https://huggingface.co/bigscience/gpt2-350m-en/tree/megatron-deepspeed is failing with "repo not found". I can download the HF version without issue.
@stas00?
The correct syntax is:
git clone --single-branch --branch megatron-deepspeed https://huggingface.co/bigscience/gpt2-350m-en
reference: https://stackoverflow.com/a/1911126/9201239
or:
git clone https://huggingface.co/bigscience/gpt2-350m-en
cd gpt2-350m-en
git checkout megatron-deepspeed
the former will download only the desired branch; the latter will download all branches, I think.
@StellaAthena Have you made progress with this issue? If not, perhaps I'll take a jab at it!
@stas00 Will the unit test run with the CI? I'm wondering if/whether the test script would have to download the Megatron checkpoints manually on each run.
I have not been able to get to this, ICLR stuff has been getting in the way. You’re welcome to take it over.
@stas00 Will the unit test run with the CI? I'm wondering if/whether the test script would have to download the Megatron checkpoints manually on each run.
The AWS-based CI is currently borked; we need to start from scratch and rebuild on GCS, so we run make test manually for now.
Re your question - there is no need to use a huge checkpoint, both because of the download size and because it'd be much more difficult to compare. It should be easy to create a tiny checkpoint of a few MBs on the fly, convert it, and then compare.
Let me know if you run into difficulties with that.
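The tiny-checkpoint-on-the-fly approach could be sketched as follows. This is a generic round-trip stand-in, not the real pipeline: `TinyLM` is a hypothetical few-KB model, and the comment marks where the actual megatron-deepspeed -> HF conversion script would run instead of the plain reload:

```python
import os
import tempfile
import torch

class TinyLM(torch.nn.Module):
    """A few-KB toy language model, standing in for a tiny test checkpoint."""
    def __init__(self, vocab=100, dim=8):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)

    def forward(self, ids):
        return self.head(self.emb(ids))

def converted_matches_original():
    """Create a tiny checkpoint on the fly, 'convert' it, and compare logits."""
    torch.manual_seed(0)
    original = TinyLM()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tiny.pt")
        torch.save(original.state_dict(), path)
        # In the real test, the conversion script would run here on `path`
        # and `restored` would be loaded from the converted HF checkpoint.
        restored = TinyLM()
        restored.load_state_dict(torch.load(path))
    ids = torch.randint(0, 100, (1, 16))
    with torch.no_grad():
        return torch.allclose(original(ids), restored(ids))
```

Because the checkpoint is created fresh on each run, the test needs no network access and can run in any CI environment.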
Reposting from slack as this seems relevant for this:
I'm looking at the transformers GPT2 code. https://huggingface.co/transformers/_modules/transformers/models/gpt2/modeling_gpt2.html#GPT2Model and it seems it is doing post-layernorm, whereas the 13B one was trained using pre-LN. Maybe this is why we're seeing poor performance in evaluation? Typically the number of params is the same; just the way we use them is different. Is there a pre-LN GPT in transformers?