Iz Beltagy
Comments
Thank you @rafaelbailo for converting Mike Morrison's design into LaTeX. I used it to design my poster (presented a few days ago) and it worked well. It took like a...
## 🚀 Feature Add a lowering for `unfold`. ## Motivation I want to run Longformer ([model code on HF repo](https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_longformer.py)) on pytorch-xla, and this requires an overlapping sliding window operation...
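For readers unfamiliar with the operation being requested, here is a minimal pure-Python sketch of overlapping sliding windows over a 1-D sequence. It is meant to mirror the 1-D behaviour of `torch.Tensor.unfold(dimension, size, step)` as I understand it; the function name `unfold_1d` is hypothetical, and this is only an illustration of the semantics, not the XLA lowering or Longformer's actual code.

```python
def unfold_1d(seq, size, step):
    """Return overlapping windows of length `size`, taken every `step` items.

    Hypothetical helper for illustration only. Windows overlap whenever
    step < size, which is the sliding-window case described above.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(seq) < size:
        # Assumption: return no windows here rather than raising an error.
        return []
    # Number of complete windows that fit in the sequence.
    n_windows = (len(seq) - size) // step + 1
    return [list(seq[i * step : i * step + size]) for i in range(n_windows)]


windows = unfold_1d(list(range(8)), size=4, step=2)
print(windows)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

With `size=4` and `step=2`, each window shares half of its elements with its neighbour.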
Slowing the increase of the learning rate improved the results slightly but more importantly, reduced variance between results.
to make it easy for new users to use our setup
We now have a script that converts megatron-deepspeed checkpoints to HF-transformers checkpoints. The project is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/main/tools/convert_checkpoint) and the script is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/tools/convert_checkpoint/deepspeed_to_transformers.py). However, the script doesn't have unit tests that confirm that...
Follow Appendix A.1 of https://arxiv.org/pdf/1812.06162.pdf to implement monitoring of the gradient noise scale and add it to the TensorBoard log.
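As a rough sketch of what that monitoring could look like: to my understanding, the appendix estimates the simple noise scale from squared gradient norms measured at two batch sizes, using E[|G_B|^2] = |G|^2 + S/B, and smooths the two estimates separately before taking their ratio. The names below (`estimate_g2_and_s`, `NoiseScaleTracker`, `decay`) are hypothetical, and how the two squared norms are obtained (for example, per-device versus all-reduced gradients) is left as an assumption.

```python
def estimate_g2_and_s(sq_norm_small, sq_norm_big, b_small, b_big):
    """Unbiased estimates of |G|^2 and S from two squared gradient norms.

    sq_norm_small / sq_norm_big: squared gradient norms measured with
    batch sizes b_small < b_big (how they are measured is an assumption).
    """
    if b_big <= b_small:
        raise ValueError("b_big must be larger than b_small")
    g2 = (b_big * sq_norm_big - b_small * sq_norm_small) / (b_big - b_small)
    s = (sq_norm_small - sq_norm_big) / (1.0 / b_small - 1.0 / b_big)
    return g2, s


class NoiseScaleTracker:
    """Hypothetical helper: keeps moving averages of |G|^2 and S."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.g2_avg = None
        self.s_avg = None

    def update(self, sq_norm_small, sq_norm_big, b_small, b_big):
        g2, s = estimate_g2_and_s(sq_norm_small, sq_norm_big, b_small, b_big)
        if self.g2_avg is None:
            # First step: initialise the averages with the raw estimates.
            self.g2_avg, self.s_avg = g2, s
        else:
            d = self.decay
            self.g2_avg = d * self.g2_avg + (1 - d) * g2
            self.s_avg = d * self.s_avg + (1 - d) * s
        # Ratio of the averages; NaN if the |G|^2 estimate is not positive.
        return self.s_avg / self.g2_avg if self.g2_avg > 0 else float("nan")


# Synthetic check: |G|^2 = 4 and S = 32 give E|G_8|^2 = 8 and E|G_64|^2 = 4.5.
tracker = NoiseScaleTracker()
print(tracker.update(8.0, 4.5, b_small=8, b_big=64))  # 8.0
```

The returned value could then be written to the log each step, for example with `SummaryWriter.add_scalar` from `torch.utils.tensorboard`.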