Xianyan Jia
## 🐛 Bug

LayerNorm generates incorrect output in float16 when XLA is enabled.

## To Reproduce

```
import torch
import torchacc
import numpy as np

use_acc = 1
dtype = torch.float16 if...
```
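Since the repro script above is cut off, here is a minimal, hedged sketch of the same kind of check in plain PyTorch, without the torchacc/XLA device placement the truncated script presumably performs. It compares a float16 LayerNorm output against a float32 reference with `np.testing.assert_allclose`; the shapes, seed, and tolerances are illustrative assumptions, not taken from the issue.

```
import numpy as np
import torch

# Assumption: shapes, seed, and tolerances are illustrative, not from the issue.
torch.manual_seed(0)
x = torch.randn(4, 128)

ln = torch.nn.LayerNorm(128)
ref = ln(x)                # float32 reference output (computed before the cast)
out = ln.half()(x.half())  # float16 output under test

# On the buggy path (float16 + XLA), a check like this would fail with a large
# mismatch; in eager float16 it should pass within these loose tolerances.
np.testing.assert_allclose(out.float().detach().numpy(),
                           ref.detach().numpy(),
                           rtol=1e-2, atol=1e-2)
```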
I tried to reproduce the 13B RLHF training on 8× A100-80GB GPUs. I found the default training script here: https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/training_scripts/single_node/run_13b.sh, where per_device_train_batch_size and per_device_mini_train_batch_size are 16, which is different...
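For context, here is a quick, hedged back-of-the-envelope for the effective global batch implied by those values; the gradient-accumulation step count is an assumption, not read from run_13b.sh:

```
# Assumption: grad_accum_steps = 1; check the actual value in run_13b.sh.
num_gpus = 8                      # A100-80GB * 8, as in the question
per_device_train_batch_size = 16  # value from run_13b.sh per the question
grad_accum_steps = 1

global_batch = num_gpus * per_device_train_batch_size * grad_accum_steps
print(global_batch)  # 128 under these assumptions
```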