efg001 comments

Results: 3 comments of efg001

Cache the KV projection history when generating

Edit: Never mind. Assuming you have max_new_tokens = 500 and n_embd = 768, the CPU inference speedup is significant because max_new_tokens < n_embd. Previous comment: sorry for digging up this old issue...
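For a rough sense of scale with the numbers quoted above, here is a back-of-the-envelope count of the multiply-accumulates spent on K and V projections alone, with and without a cache. This is a sketch under stated assumptions, not the issue author's measurement: it assumes one attention layer, an empty prompt, and one `n_embd x n_embd` matrix each for K and V. The function name `kv_projection_macs` is hypothetical.

```python
def kv_projection_macs(max_new_tokens: int, n_embd: int, cached: bool) -> int:
    """Count multiply-accumulates spent on K and V projections during generation.

    Assumptions (not from the original comment): one layer, empty prompt,
    one n_embd x n_embd weight matrix each for K and V.
    """
    total = 0
    for t in range(1, max_new_tokens + 1):
        # With a cache only the newest token is projected at step t;
        # without one, all t tokens seen so far are projected again.
        tokens_projected = 1 if cached else t
        total += 2 * tokens_projected * n_embd * n_embd
    return total


if __name__ == "__main__":
    without_cache = kv_projection_macs(500, 768, cached=False)
    with_cache = kv_projection_macs(500, 768, cached=True)
    print(without_cache, with_cache, without_cache / with_cache)
```

Under these assumptions the uncached count grows quadratically with the number of generated tokens while the cached count grows linearly. Real speedups depend on everything else in the forward pass, so treat the ratio as an upper bound on what caching the projections alone could save.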

does it make sense to use thread and asyncio together and thread safety

Thanks for responding :) 1. I agree that because the callback is not a coroutine function, if you don't start a new thread here, the streaming client can/could block (i.e...
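The point about a non-coroutine callback blocking the client can be illustrated with a small, generic sketch. It is not the code from the issue: `blocking_callback`, `stream`, and `heartbeat` are hypothetical names, and `time.sleep` stands in for whatever slow work the real callback does. It assumes Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import time


def blocking_callback(chunk: str, seen: list) -> None:
    """Hypothetical synchronous callback; time.sleep stands in for slow work."""
    time.sleep(0.05)
    seen.append(chunk)


async def stream(chunks, callback, seen: list) -> None:
    """Hypothetical streaming client that hands each chunk to the callback."""
    for chunk in chunks:
        # Calling callback(chunk, seen) directly here would stall the event
        # loop for the full sleep. asyncio.to_thread runs it in a worker
        # thread and lets other tasks run while we wait for it to finish.
        await asyncio.to_thread(callback, chunk, seen)


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    """Record a timestamp every 10 ms to show the loop is still responsive."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def main():
    seen, ticks = [], []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await stream(["a", "b", "c"], blocking_callback, seen)
    stop.set()
    await hb
    return seen, ticks


if __name__ == "__main__":
    seen, ticks = asyncio.run(main())
    print(seen, len(ticks))
```

Because each callback is awaited before the next chunk is sent, only one worker thread touches `seen` at a time. If callbacks were fired concurrently, shared state would need a lock or a thread-safe queue.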

The decoder part of Informer and Autoformer does not play its intended role at all

I had the same idea. It's a simple change: do you want to give it a try? I am seeing the model overfit the training dataset way earlier after teacher forcing is...