DirtyKnightForVi

Results 24 comments of DirtyKnightForVi

> This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. This change is designed to be "opt in" initially, so the...

> > > This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. This change is designed to be "opt in" initially,...

> Does DeepSeek-VL series support input of multiple images? This doesn't seem to be stated in the paper, but `images` in the example script are `list`, which seems to be...
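The question above hinges on `images` being a list in the example scripts. A minimal sketch of that conversation structure (field names follow the DeepSeek-VL example scripts; whether the model actually attends to more than one image is exactly the open question, so treat the two-image payload as an assumption to test):

```python
# Sketch of a DeepSeek-VL style conversation payload. The structure mirrors
# the repo's example scripts; passing two paths is syntactically valid because
# `images` is a list, but multi-image support by the model itself is unconfirmed.
conversation = [
    {
        "role": "User",
        "content": "<image_placeholder><image_placeholder> Compare these two charts.",
        "images": ["chart_a.png", "chart_b.png"],  # hypothetical file names
    },
    {"role": "Assistant", "content": ""},
]

# Collect every image path referenced across the conversation turns.
all_images = [p for turn in conversation for p in turn.get("images", [])]
print(len(all_images))  # 2
```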

Simply uninstall GPTQ completely and then reinstall it to solve this problem.
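A sketch of the reinstall, assuming the package was installed via pip; the exact package name depends on which GPTQ build you used (`auto-gptq` here is an assumption):

```shell
# Package name is an assumption; adjust to whichever GPTQ package you installed.
pip uninstall -y auto-gptq
pip install auto-gptq --no-cache-dir
```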

> I didn't test on Windows 11, but it should work if you have a GPU. Can you double-check that your gptq installation completed successfully?

When I try...

> Facing the exact same issue as @DirtyKnightForVi

Follow this link: https://github.com/juncongmoo/pyllama/issues/35. You'd better uninstall transformers and reinstall it with `pip install git+https://github.com/mbehm/transformers`.

The 3070's board design should support 24 GB; inference will just be slower. You can find a trustworthy GPU repair technician to upgrade the VRAM for you. For reference, there is a repair technician on Bilibili who upgraded a 2080 from 11 GB to 22 GB.

+1. DeepSeek-V2-Chat is a different MoE architecture compared to DBRX, Mixtral, and GLM-4. In terms of API experience, it's on par with GPT-4.

> > Hi @0sengseng0 it seems you're missing an O: `OLLAMA_NUM_PARALLEL=3`
>
> Will close this as it is definitely plan A :)

According to the logs, n_ctx...
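For reference, `OLLAMA_NUM_PARALLEL` is set in the server's environment when launching it, not per request (the value below is illustrative):

```shell
# Enable three concurrent requests per loaded model (example value).
OLLAMA_NUM_PARALLEL=3 ollama serve
```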

In my case, my prompt is as follows:

# DDL
```sql
create table XXXXX (about 12000 tokens)
```

# HINT
(about 500 tokens, notes, etc.)

If OLLAMA_NUM_PARALLEL=2, the model cannot get table...
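The truncation described above is consistent with the context window being shared among parallel slots, as llama.cpp's server does with `n_ctx`; whether a given Ollama version scales `num_ctx` to compensate varies, so the halving below is an assumption, as is the 16384-token context length:

```python
# Illustration: if the per-request window is n_ctx divided by the number of
# parallel slots, OLLAMA_NUM_PARALLEL=2 halves what each prompt can use.
num_ctx = 16384              # assumed total context length, for illustration
prompt_tokens = 12000 + 500  # DDL (~12000) + HINT (~500) from the comment above

for parallel in (1, 2):
    per_slot = num_ctx // parallel
    fits = prompt_tokens <= per_slot
    print(f"OLLAMA_NUM_PARALLEL={parallel}: per-request ctx={per_slot}, prompt fits: {fits}")
```

With these numbers the prompt fits at `OLLAMA_NUM_PARALLEL=1` (16384 tokens per request) but not at 2 (8192 per request), which would explain the model losing the tail of the DDL.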