Shailja
Hello Geunsik, thank you for your email. I will be happy to help. Can you share your my-codegen-350m-deepspeed-finetune.sh, ds_config.json, and the size of the training data, so I get...
Yes, I already had the transformers 0.29.dev version, but it was conflicting with a lower version. Works now. On Tue, May 9, 2023, 8:23 a.m. ArmelRandy ***@***.***> wrote: > The problem probably...
I tried commenting out the validation loss, and the training finished. However, an OOM error was raised when saving the checkpoint. evaluating Epoch: 100%|██████████| 1/1 [00:12