hpnyaggerman
> Can you try something like `--load-in-4bit --gpu-memory 6` to see if it works? `--auto-devices` has no effect for these 4-bit models.
>
> https://github.com/oobabooga/text-generation-webui/blob/026d60bd3424b5426c5ef80632aa6b71fe12d4c5/modules/models.py#L90

Experiencing the same problem as...
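For context, the quoted flags would be passed when launching the webui's `server.py`. A minimal sketch of such an invocation, assuming a 4-bit model directory (the model name here is a placeholder, not from the original report):

```shell
# Hypothetical launch of text-generation-webui with the flags quoted above.
# --load-in-4bit loads the model in 4-bit precision; --gpu-memory 6 caps
# GPU memory allocation at 6 GiB. The model directory name is a placeholder.
python server.py --model llama-7b-4bit --load-in-4bit --gpu-memory 6
```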
As it stands right now, llama.cpp (https://github.com/ggerganov/llama.cpp) seems to be the fastest inference backend for running LLaMA models. Are there any plans to support calling it from TavernAI?
### Issue: Training Process Halts with Errors

**Environment Specifications:**
- **Operating System:** Debian GNU/Linux 12 (bookworm) x86_64 6.1.0-9-amd64
- **Python Version:** 3.10
- **GPU:** Dual NVIDIA GeForce RTX 4090s
- ...