SimonBenhamou

4 comments by SimonBenhamou

With either of the solutions proposed by @shaunheilee and @sayhellotoAI2, I get a NaN loss when resuming training... Am I the only one?

@minhthuc2502 I want the distribution over the whole vocabulary (or at least the top K). With your solution, I only get the log prob of the token generated by the model...

Hello, is there an update on this? Thanks, Simon

I did, and could reproduce the fact that, no matter how long the prefix, the generation time is the same when using the generate_token method and measuring the...