Zhang, Liangang
It seems there is still an issue with the CPU backend. I tried to use this branch to run the CIFAR example and hit the following error: ``` deepspeed cifar10_deepspeed.py...
Thanks for your quick reply. The CIFAR DeepSpeed sample works with 2 processes after a few small changes. ``` [1, 2000] loss: 1.692 [2022-09-08 14:25:57,361] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[0.001],...
Please reopen this PR; we will refine it based on the latest transformers.
> @liangan1 reopened :D Thanks. I will refresh this PR ASAP.
@gante I have rebased this PR and validated its functionality on a Llama model with both greedy and beam search. Please review. The _update_ function should be a good starting point to...
Thanks @gante I will refine it according to your comments.
> Thank you for working on this, it seems like it's growing in the right direction 💪 Apologies for the delayed review, I went back to reread the paper. >...
@gante Sorry for the late reply due to the holidays. The static cache is a good choice for greedy search, which uses a large buffer to store the past key/value state add...
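To illustrate the static-cache idea mentioned above, here is a minimal sketch in plain Python. It assumes a single sequence and a single attention head, and every name in it (`StaticKVCache`, `update`, `max_len`, `head_dim`) is hypothetical; it is not the transformers cache API and not the implementation in this PR.

```python
from typing import List, Tuple


class StaticKVCache:
    """Toy fixed-capacity key/value cache (hypothetical illustration only)."""

    def __init__(self, max_len: int, head_dim: int) -> None:
        # Allocate the whole buffer once, up front, instead of growing it
        # by concatenation at every decoding step.
        self.keys: List[List[float]] = [[0.0] * head_dim for _ in range(max_len)]
        self.values: List[List[float]] = [[0.0] * head_dim for _ in range(max_len)]
        self.max_len = max_len
        self.head_dim = head_dim
        self.length = 0  # number of slots filled so far

    def update(
        self, key: List[float], value: List[float]
    ) -> Tuple[List[List[float]], List[List[float]]]:
        """Write one new key/value pair into the next free slot."""
        if self.length >= self.max_len:
            raise IndexError("static cache is full")
        if len(key) != self.head_dim or len(value) != self.head_dim:
            raise ValueError("key/value size does not match head_dim")
        # In-place write: the preallocated rows are reused, not replaced.
        self.keys[self.length][:] = key
        self.values[self.length][:] = value
        self.length += 1
        # Return only the filled prefix of the buffer.
        return self.keys[: self.length], self.values[: self.length]


cache = StaticKVCache(max_len=4, head_dim=2)
cache.update([1.0, 2.0], [3.0, 4.0])
keys, values = cache.update([5.0, 6.0], [7.0, 8.0])
```

In this sketch each greedy decoding step appends exactly one entry, so the buffer fills left to right and never needs to be reordered.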
@gante Just a gentle reminder: could you please review this again?
> Hi @liangan1 👋 I'm holding the review until this PR is merged, as it might change the API of caches -- #29180 Thanks.