Jatin Kumar
I was wondering the same thing. Predicting a token at every position is useful during training, for calculating the loss and adjusting the weights, but during prediction the other token predictions seem to be a waste.
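To make the question concrete, here is a rough sketch of what I mean, assuming an autoregressive model that outputs one row of logits per input position and greedy decoding at prediction time. The random `logits` array is only a stand-in for real model output, and the names `training_loss`, `next_token`, `seq_len`, and `vocab_size` are made up for illustration.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def training_loss(logits, targets):
    # Cross-entropy averaged over ALL positions: every row of logits contributes.
    log_probs = log_softmax(logits)
    picked = log_probs[np.arange(len(targets)), targets]
    return -picked.mean()

def next_token(logits):
    # Greedy decoding (an assumption): only the last position's row is consulted.
    return int(np.argmax(logits[-1]))

rng = np.random.default_rng(0)
seq_len, vocab_size = 5, 10                       # hypothetical sizes
logits = rng.normal(size=(seq_len, vocab_size))   # stand-in for model output
targets = rng.integers(0, vocab_size, size=seq_len)

loss = training_loss(logits, targets)   # depends on all 5 rows
token = next_token(logits)              # depends on the last row only
```

Under these assumptions, changing the first four rows of `logits` changes `loss` but leaves `token` untouched, which is the part that looks wasteful at prediction time.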