SimonBenhamou
With either @shaunheilee's or @sayhellotoAI2's proposed solution, I get NaN loss when resuming training... Am I the only one?
@minhthuc2502 I want the distribution over the whole vocabulary (or at least the top K). With your solution I only get the log prob of the token generated by the model...
Hello, is there an update on this? Thanks, Simon
I did, and could reproduce the fact that, no matter how long the prefix, the generation time is the same when using the generate_token method and measuring the...