Sarthak Bhatt

1 issue by Sarthak Bhatt

Training the model is very slow: 400 hours per epoch for a 3B-parameter model, compared with 1.5 hours per epoch for my Keras data-parallel implementation of...