Patrick Fernandes

Results 10 issues of Patrick Fernandes

Hey! I'm using a custom version of this repo to run BLOOM-175B with DeepSpeed and it works great, thank you for this! I was thinking of exploring using large models...

**Describe the bug** In the multimodal example in https://github.com/NVIDIA/Megatron-LM/tree/main/examples/multimodal , when we run the scripts for either pretraining or finetuning, if we unfreeze the vision encoder we get immediate NaNs...

stale