Rick Battle

Results: 3 issues by Rick Battle

During training, the amount of system RAM used scales linearly with the number of GPUs. If training on 1 GPU takes 64 GB of system RAM, then training on 3...

**Describe** Model: I use MiniLMv2 for many tasks. DeBERTa can outperform both BERT and RoBERTa. Could you please distill a MiniLMv2 model using DeBERTa-Large as the teacher? Thank you!