hpnyaggerman
Any plans on fixing?
The error seems to stem from using two GPUs for training. Using just one makes the issue go away.
No prototype? That seems like a badly needed feature, since most available language models can't process more than 2k tokens at once.
Just download the original models here: https://github.com/facebookresearch/llama/pull/73
Sounds like something worth investigating
In my experience, this PR worked a bit better than the KoboldAI API implementation in text-generation-webui. For example, with `--auto-devices --gpu-memory 8 --cpu-memory 45 --no-stream --extensions api` on text-generation-webui...
I'm pretty sure this is related to https://github.com/TavernAI/TavernAI/issues/76.
Same issue with https://huggingface.co/reeducator/vicuna-13b-free
Has anyone found a solution? What is even causing the issue?