Thomas Wang

Results 28 issues of Thomas Wang

This PR is for sorting out the tr10-104B config.

A very useful tool for understanding model performance beyond the loss value: actually showing what the predictions are. It'd be very useful to be able to "see" the output...

Good First Issue

Requires: https://github.com/microsoft/DeepSpeed/pull/2035

TODO:
- [x] Make sure we can run shared enc/dec with MLM
- [x] Add a test making sure that it runs with MLM
- [ ] Make sure...