hpnyaggerman
Any plans on fixing?
The error seems to stem from using two GPUs for training. Using just one makes the issue go away.
No prototype? That seems like a badly needed feature, since most available language models can't process more than 2k tokens at once.
Just download the original models here: https://github.com/facebookresearch/llama/pull/73
Sounds like something worth investigating
In my experience, this PR worked a bit better than the KoboldAI API implementation in text-generation-webui. For example, with `--auto-devices --gpu-memory 8 --cpu-memory 45 --no-stream --extensions api` on text-generation-webui...
I'm pretty sure this is related to https://github.com/TavernAI/TavernAI/issues/76.
Same issue with https://huggingface.co/reeducator/vicuna-13b-free
Has anyone found a solution? What is even causing the issue?