Iz Beltagy

Results 12 issues of Iz Beltagy

Thank you @rafaelbailo for converting Mike Morrison's design into LaTeX. I used it to design my poster (presented a few days ago) and it worked well. It took like a...

## 🚀 Feature

Add a lowering for `unfold`.

## Motivation

I want to run Longformer ([model code on HF repo](https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_longformer.py)) on pytorch-xla, and this requires an overlapping sliding window operation...

op lowering
nostale
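To make the "overlapping sliding window operation" concrete, here is a minimal pure-Python sketch of what `unfold` computes along one dimension. The helper name `unfold_1d` and the window size and step are illustrative assumptions, not taken from the Longformer code; the window-count formula follows the documented behaviour of `torch.Tensor.unfold(dimension, size, step)`.

```python
def unfold_1d(seq, size, step):
    """Return overlapping windows of `seq`.

    Hypothetical helper that mimics torch.Tensor.unfold on a 1-D tensor:
    windows have length `size` and start every `step` elements.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if size > len(seq):
        raise ValueError("size must not exceed the sequence length")
    # Same window count as unfold: trailing elements that do not fill
    # a complete window are dropped.
    n_windows = (len(seq) - size) // step + 1
    return [list(seq[i * step : i * step + size]) for i in range(n_windows)]


tokens = list(range(8))
# Windows of 4 tokens that overlap by 2 (step = size // 2).
windows = unfold_1d(tokens, size=4, step=2)
print(windows)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Each token except those at the edges appears in two windows, which is what makes the operation awkward to express without a native `unfold` lowering.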

Slowing the increase of the learning rate improved the results slightly but, more importantly, reduced the variance between results.

to make it easy for new users to use our setup

enhancement

We now have a script that converts Megatron-DeepSpeed checkpoints to HF-transformers checkpoints. Project is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/main/tools/convert_checkpoint) and the script is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/tools/convert_checkpoint/deepspeed_to_transformers.py). However, the script doesn't have unit tests that confirm that...

Good First Issue

Follow Appendix A.1 of https://arxiv.org/pdf/1812.06162.pdf to implement monitoring of the gradient noise scale and add it to the TensorBoard log.

arch&scale
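A minimal sketch of the estimator from that appendix, as I understand it: the squared gradient norm is measured at two batch sizes, combined into unbiased estimates of the true squared gradient norm and the trace of the gradient covariance, and the two are smoothed separately before taking their ratio. The function and class names, the `decay` value, and the choice of batch sizes are assumptions for illustration; in a data-parallel run the small-batch norm would typically come from a single worker's gradient and the large-batch norm from the averaged gradient.

```python
def noise_scale_estimates(g_small_sq, g_big_sq, b_small, b_big):
    """Unbiased estimates of |G|^2 and tr(Sigma) from two batch sizes.

    g_small_sq / g_big_sq: squared gradient norms measured with batch
    sizes b_small / b_big (hypothetical inputs supplied by the trainer).
    """
    if not 0 < b_small < b_big:
        raise ValueError("need 0 < b_small < b_big")
    grad_sq = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    trace_cov = (g_small_sq - g_big_sq) / (1.0 / b_small - 1.0 / b_big)
    return grad_sq, trace_cov


class NoiseScaleMonitor:
    """Smooths both estimates with an EMA, then returns their ratio."""

    def __init__(self, decay=0.99):  # decay is an assumed default
        self.decay = decay
        self.ema_grad_sq = None
        self.ema_trace_cov = None

    def update(self, g_small_sq, g_big_sq, b_small, b_big):
        grad_sq, trace_cov = noise_scale_estimates(
            g_small_sq, g_big_sq, b_small, b_big
        )
        if self.ema_grad_sq is None:
            # First step: initialise the averages with the raw estimates.
            self.ema_grad_sq, self.ema_trace_cov = grad_sq, trace_cov
        else:
            d = self.decay
            self.ema_grad_sq = d * self.ema_grad_sq + (1 - d) * grad_sq
            self.ema_trace_cov = d * self.ema_trace_cov + (1 - d) * trace_cov
        # Simple noise scale: tr(Sigma) / |G|^2.
        return self.ema_trace_cov / self.ema_grad_sq


monitor = NoiseScaleMonitor()
noise_scale = monitor.update(g_small_sq=8.0, g_big_sq=4.5, b_small=8, b_big=64)
print(noise_scale)
```

The returned scalar could then be written to the log each step, for example with `SummaryWriter.add_scalar` from `torch.utils.tensorboard`. Averaging the numerator and denominator separately, rather than averaging the ratio, keeps the noisy per-step estimates from biasing the result.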

### 🚀 The feature, motivation and pitch _No response_ ### Alternatives _No response_ ### Additional context _No response_

type/feature