efg001
Edit: Nvm. Assuming you have max_new_tokens = 500 and n_embd = 768, the CPU inference speedup is significant because max_new_tokens < n_embd. Previous comment: sorry for digging up this old issue...
Thanks for responding :) 1. I agree that because the callback is not a coroutine function, if you don't start a new thread here, the streaming client could block (i.e...
I had the same idea. It's a simple change: do you want to give it a try? I am seeing the model overfit the training dataset much earlier after teacher forcing is...