Ye Yuan
I updated my `datasets` package from 2.12.0 to 2.16.0 and the issue disappeared. Perhaps this should be added to the dependencies. Thanks anyway.
> Hi, you can add `repeat_kv` from `inf_llm/attention/utils.py` before the qk computation.

I've found it. Thanks a lot!
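For anyone else landing here, this is a rough sketch of the idea behind repeating the key/value heads before the qk computation when several query heads share one key head (grouped-query attention). It is pure Python on nested lists and is not the actual `repeat_kv` from `inf_llm/attention/utils.py`; the names `repeat_kv_sketch` and `qk_scores`, their signatures, and the toy shapes are all assumptions made up for illustration.

```python
def repeat_kv_sketch(kv_heads, n_rep):
    """Repeat each key/value head n_rep times, consecutively.

    Hypothetical helper: kv_heads is a list of heads, each head a list of
    per-token vectors. The real library function works on tensors.
    """
    return [head for head in kv_heads for _ in range(n_rep)]


def qk_scores(q_heads, k_heads):
    """Per-head dot products: scores[h][i][j] = q[h][i] . k[h][j]."""
    if len(q_heads) != len(k_heads):
        raise ValueError("query and key head counts must match")
    return [
        [[sum(a * b for a, b in zip(q_tok, k_tok)) for k_tok in k] for q_tok in q]
        for q, k in zip(q_heads, k_heads)
    ]


# Toy example: 4 query heads share 2 key heads, 1 token each, head_dim = 2.
q = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]], [[2.0, 0.0]]]
k = [[[1.0, 2.0]], [[3.0, 4.0]]]

# Expand the keys so every query head has a matching key head.
k_expanded = repeat_kv_sketch(k, n_rep=len(q) // len(k))
scores = qk_scores(q, k_expanded)
```

Without the expansion step, `qk_scores(q, k)` raises a `ValueError` because the head counts differ, which is roughly the kind of mismatch that repeating the key/value heads is meant to avoid.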
I have the same question. The paper doesn't describe what the NVILA-Lite model is. I'm also wondering what the differences are among the different model variants.
Thanks for the explanation. Looking forward to the new version of your paper!
Hi, I got the same error. Did you fix it?
> > Hi, I got the same error. Did you fix it?
>
> Yes, the API seems to have changed, but it still didn't work well for me. I've changed the...
I've encountered the same problem on 2×L20 with Qwen-2.5-32B-Instruct, exactly as described in this post. The referenced similar issue is about 4-bit GPTQ, but I used BF16 and got the same...
> > I've encountered the same problem on 2×L20 with Qwen-2.5-32B-Instruct, exactly as described in this post. The referenced similar issue is about 4-bit GPTQ, but I used BF16 and got...
Any updates on this?
> > Any updates on this?
>
> Thanks for the reply.
>
> I've already solved it; I only had to modify the Docker config.
>
> ```
> services:
>   chatgpt-next-web:...
> ```