Patrick Fernandes
Results: 10 issues of Patrick Fernandes
Hey! I'm using a custom version of this repo to run BLOOM-175B with DeepSpeed and it works great, thank you for this! I was thinking of exploring using large models...
**Describe the bug** In the multimodal example in https://github.com/NVIDIA/Megatron-LM/tree/main/examples/multimodal, when we run the scripts for either pretraining or finetuning, if we unfreeze the vision encoder we get immediate NaNs...
stale