TransformerLayer MLP parameters are not being set during model initialization
Describe the bug
The transformer layer MLP always falls back to the default values for bias, activation, and normalization when model.mcore_gpt=False, model.transformer_engine=True, and model.megatron_amp_O2=True.
A fix is implemented here: https://github.com/NVIDIA/NeMo/pull/8845
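For illustration, here is a minimal sketch of the failing pattern, written directly against TransformerEngine's LayerNormMLP (which is what the debug lines below inspect) rather than NeMo's wrapper classes. The hidden sizes and the "swiglu" activation string are illustrative assumptions, not NeMo's actual call path, and constructing TE modules needs a GPU:

# Illustrative sketch only -- not NeMo's actual construction path.
import transformer_engine.pytorch as te

hidden_size, ffn_hidden_size = 768, 3072

# What the buggy path effectively does: the user's settings are dropped and the
# TE defaults apply (bias=True, normalization="LayerNorm", activation="gelu").
mlp_buggy = te.LayerNormMLP(hidden_size, ffn_hidden_size)

# What should happen: forward the configured values explicitly.
mlp_fixed = te.LayerNormMLP(
    hidden_size,
    ffn_hidden_size,
    bias=False,               # model.bias=false
    normalization="RMSNorm",  # model.normalization=rmsnorm
    activation="swiglu",      # model.activation=fast-swiglu (TE spelling assumed)
)

print(mlp_buggy.activation, mlp_buggy.use_bias, mlp_buggy.normalization)  # gelu True LayerNorm
print(mlp_fixed.activation, mlp_fixed.use_bias, mlp_fixed.normalization)  # swiglu False RMSNorm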
Steps/Code to reproduce bug
Add the following lines to NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py after model initialization:
logging.warning(f"DEBUG: layernorm_mlp.activation={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.activation}")
logging.warning(f"DEBUG: layernorm_mlp.use_bias={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.use_bias}")
logging.warning(f"DEBUG: layernorm_mlp.normalization={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.normalization}")
logging.warning(f"DEBUG: layernorm_mlp.layernorm_mlp.fc1_weight.shape={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.fc1_weight.shape}")
logging.warning(f"DEBUG: layernorm_mlp.layernorm_mlp.fc2_weight.shape={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.fc2_weight.shape}")
Run the following script with and without the fix applied:
#!/bin/bash
python /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py \
model.mcore_gpt=False \
model.transformer_engine=True \
trainer.precision=bf16 \
model.megatron_amp_O2=True \
model.activation=fast-swiglu \
model.bias=false \
model.normalization=rmsnorm
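To compare the two runs without scrolling through the full training logs, the output can be filtered down to the debug lines added above; repro.sh is a placeholder name for the script, and 2>&1 merges stderr into the pipe so the warnings are captured regardless of which stream the logger uses:

bash repro.sh 2>&1 | grep "DEBUG: layernorm_mlp"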
Expected behavior
The values for layernorm_mlp are set correctly:
[NeMo W 2024-04-08 09:30:49 megatron_gpt_pretraining:42] DEBUG: layernorm_mlp.activation=fast-swiglu
[NeMo W 2024-04-08 09:30:49 megatron_gpt_pretraining:43] DEBUG: layernorm_mlp.use_bias=False
[NeMo W 2024-04-08 09:30:49 megatron_gpt_pretraining:44] DEBUG: layernorm_mlp.normalization=RMSNorm
[NeMo W 2024-04-08 09:30:49 megatron_gpt_pretraining:45] DEBUG: layernorm_mlp.layernorm_mlp.fc1_weight.shape=torch.Size([3072, 768])
[NeMo W 2024-04-08 09:30:49 megatron_gpt_pretraining:46] DEBUG: layernorm_mlp.layernorm_mlp.fc2_weight.shape=torch.Size([768, 3072])
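If a hard check is preferred over eyeballing the warnings, a hypothetical helper (not part of NeMo) can assert the same attributes against the values above:

# Hypothetical check, mirroring the attribute paths used in the debug lines.
def check_layernorm_mlp(model):
    mlp = model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp
    assert mlp.use_bias is False, f"got use_bias={mlp.use_bias}"
    assert mlp.normalization == "RMSNorm", f"got normalization={mlp.normalization}"
    assert mlp.activation == "fast-swiglu", f"got activation={mlp.activation}"
    assert tuple(mlp.fc1_weight.shape) == (3072, 768)
    assert tuple(mlp.fc2_weight.shape) == (768, 3072)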
Environment overview
- Docker
- nvidia/nemo:24.03 + git pull
- docker run --rm -it --entrypoint /bin/bash --network=host --runtime=nvidia --shm-size=2g nvcr.io/nvidia/nemo:24.01.01.framework
Environment details
The NVIDIA Docker image above is used, so the OS, PyTorch, and Python versions are those shipped in the container.