Zhang, Liangang

Results 40 comments of Zhang, Liangang

It seems there is still an issue with the CPU backend. I tried to use this branch to run the CIFAR example and hit the following issue: ``` deepspeed cifar10_deepspeed.py...

Thanks for your quick reply. The CIFAR DeepSpeed sample works with 2 processes after a few small changes. ``` [1, 2000] loss: 1.692 [2022-09-08 14:25:57,361] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[0.001],...

Please help reopen this PR; we will refine it based on the latest transformers.

> @liangan1 reopened :D Thanks. I will refresh this PR ASAP.

@gante I have rebased this PR and validated the functionality with the llama model using both greedy and beam search. Please help review. The _update_ function should be a good starting point to...

Thanks @gante, I will refine it according to your comments.

> Thank you for working on this, it seems like it's growing in the right direction 💪 Apologies for the delayed review, I went back to reread the paper. >...

@gante Sorry for the late reply due to the holidays. The static cache is a good choice for greedy search; it uses a large buffer to store the past key/value state add...
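
To illustrate the idea of a static cache as described in the comment above (one large preallocated buffer that holds past key/value state), here is a minimal sketch. The class name `StaticKVCache`, its `update` method, and the list-based storage are all hypothetical and chosen for illustration; this is not the transformers cache API or the implementation from the PR.

```python
# Hypothetical sketch of a static key/value cache. Names and shapes are
# illustrative assumptions, not the transformers API.
class StaticKVCache:
    """Preallocates fixed-size key/value buffers and fills them in place."""

    def __init__(self, max_len, head_dim):
        self.max_len = max_len
        # One large buffer per tensor, allocated once up front.
        self.keys = [[0.0] * head_dim for _ in range(max_len)]
        self.values = [[0.0] * head_dim for _ in range(max_len)]
        self.length = 0  # number of positions filled so far

    def update(self, key, value):
        """Store one new token's key/value in the next free slot and
        return the filled prefix of both buffers."""
        if self.length >= self.max_len:
            raise IndexError("static cache is full")
        self.keys[self.length] = list(key)
        self.values[self.length] = list(value)
        self.length += 1
        return self.keys[: self.length], self.values[: self.length]


# Usage: each decoding step appends one position; the buffer never grows.
cache = StaticKVCache(max_len=4, head_dim=2)
k, v = cache.update([1.0, 2.0], [3.0, 4.0])
```

Because the buffer size is fixed at construction, each step writes into an existing slot instead of reallocating and concatenating, which is the property that makes this layout attractive for greedy search.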

@gante Just a gentle reminder: could you help review this again?

> Hi @liangan1 👋 I'm holding the review until this PR is merged, as it might change the API of caches -- #29180 Thanks.