Jatin Kumar

1 comment by Jatin Kumar

I was wondering the same thing. Producing a prediction for every token is useful during training, where each one feeds into the loss used to adjust the weights, but during prediction the other token predictions seem like wasted computation.
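A minimal sketch of that contrast, assuming a decoder-style language model that outputs one row of logits per input position. The thread does not confirm the architecture, and the shapes, random logits, and variable names here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

T, V = 5, 8  # hypothetical sequence length and vocabulary size
logits = rng.normal(size=(T, V))      # stand-in for model output: one row per position
targets = rng.integers(0, V, size=T)  # stand-in next-token labels, one per position


def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


# Training: every position contributes one term to the loss.
log_probs = log_softmax(logits)
per_position_loss = -log_probs[np.arange(T), targets]
training_loss = per_position_loss.mean()

# Prediction (greedy, assumed): only the last row picks the next token.
next_token = int(np.argmax(logits[-1]))

rows_used_in_training = per_position_loss.shape[0]
rows_used_in_prediction = 1
```

In this toy setup all `T` rows feed `training_loss`, while the greedy step reads only `logits[-1]`, which is the gap the comment is pointing at.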