Sarthak Bhatt
Training the model is very slow: 400 hours per epoch for the 3B-parameter model, compared with 1.5 hours per epoch for my Keras data-parallel implementation of...
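To put "very slow" in numbers, the two per-epoch times quoted above differ by a factor of roughly 267. One epoch at 400 hours is also more than two weeks of wall-clock time. The sketch below is a minimal worked version of that arithmetic. It uses only the figures stated in the issue, and the constant and function names are illustrative rather than taken from any codebase.

```python
# Per-epoch wall-clock times quoted in the issue, in hours.
SLOW_EPOCH_HOURS = 400.0   # 3B-parameter model
KERAS_EPOCH_HOURS = 1.5    # Keras data-parallel implementation


def slowdown_factor(slow_hours: float, baseline_hours: float) -> float:
    """Return how many times longer the slow run takes per epoch."""
    return slow_hours / baseline_hours


def hours_to_days(hours: float) -> float:
    """Convert a duration in hours to days."""
    return hours / 24.0


if __name__ == "__main__":
    factor = slowdown_factor(SLOW_EPOCH_HOURS, KERAS_EPOCH_HOURS)
    print(f"slowdown: {factor:.1f}x")  # slowdown: 266.7x
    print(f"one epoch: {hours_to_days(SLOW_EPOCH_HOURS):.1f} days")  # one epoch: 16.7 days
```

The ratio compares only the two quoted timings. It does not by itself say whether both runs train the same model on the same data.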