efg001

Results 3 comments of efg001

Edit: Nvm. Assuming you have max_new_tokens = 500 and n_embd = 768, the CPU inference speedup is significant because max_new_tokens < n_embd. Previous comment: sorry for digging up this old issue...

Thanks for responding :) 1. I agree that because the callback is not a coroutine function, if you don't start a new thread here, the streaming client could block (i.e...
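
To illustrate the point in the comment above about a non-coroutine callback, here is a minimal sketch using only the standard library (Python 3.9+ for `asyncio.to_thread`). It is not the code from the issue being discussed: `slow_callback`, `heartbeat`, and `stream` are hypothetical names, and `time.sleep` stands in for whatever blocking work the real callback does. It measures how long the event loop is stalled when the callback is called inline versus handed off to a thread.

```python
import asyncio
import time


def slow_callback(token: str) -> None:
    """Hypothetical synchronous (non-coroutine) callback; sleep stands in for blocking I/O."""
    time.sleep(0.05)


async def heartbeat(ticks: list) -> None:
    """Record a timestamp roughly every 10 ms whenever the event loop is free to run."""
    for _ in range(10):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def stream(tokens, use_thread: bool) -> float:
    """Invoke the callback once per token; return the largest gap between heartbeat ticks."""
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # yield once so the heartbeat records its first tick
    for tok in tokens:
        if use_thread:
            # Run the blocking callback in a worker thread; the loop keeps serving other tasks.
            await asyncio.to_thread(slow_callback, tok)
        else:
            # Call the blocking callback inline; nothing else on the loop runs meanwhile.
            slow_callback(tok)
    await hb
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    blocked = asyncio.run(stream(["a", "b", "c"], use_thread=False))
    threaded = asyncio.run(stream(["a", "b", "c"], use_thread=True))
    print(f"max stall, inline callback:   {blocked:.3f}s")
    print(f"max stall, threaded callback: {threaded:.3f}s")
```

In the inline case the heartbeat cannot run until all three callback invocations finish, so other tasks on the loop (such as a streaming client) would be stalled for that whole time. Offloading to a thread keeps the loop responsive, at the cost of needing the callback to be thread-safe.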

I had the same idea. It's a simple change: do you want to give it a try? I am seeing the model overfit the training dataset way earlier after teacher forcing is...