Thomas Wang
Results: 28 issues of Thomas Wang
This PR is for sorting out the tr10-104B config.
A very useful tool for understanding model performance beyond the loss: actually show what the predictions are. It'd be very useful to be able to "see" the output...
Good First Issue
Requires: https://github.com/microsoft/DeepSpeed/pull/2035

TODO:
- [x] Make sure we can run shared enc/dec with MLM
- [x] Add a test making sure that it runs with MLM
- [ ] Make sure...