
Bug: Incorrect operation of the context shift mechanism in some models.

Open · Vladonai opened this issue 1 year ago

What happened?

I'm working with "magnum-v2-123b-Q4_K_L" (based on Mistral Large 2 123B; I also tried "magnum-v2-123b-iQ4_K_M" and "magnum-v2-72b-iQ5_K_M", based on Qwen2, with the same problems). I noticed that the context shift mechanism does not work properly with these models. When I work with the "Lumimaid-v2(Llama3.1)-70B_Q5_K_M" model with a context size of 24k and a response window of 1k tokens, I can delete even the last few replies without the entire context being recalculated; I can work for hours without any context recalculation at all. With these models it's different: the context is often recalculated completely, even when only a new reply is added. It shouldn't be that way. Sometimes the context shift does work, but I haven't found the pattern yet. It's very inconvenient, especially compared to the "Lumimaid-v2(Llama3.1)-70B_Q5_K_M" model. I have 4x Tesla P40 and use the latest versions of llama-server and SillyTavern.
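For context, here is a minimal sketch (not llama.cpp's actual implementation) of the idea behind prompt-cache reuse: the server typically reuses the KV cache for the longest common token prefix between the previous prompt and the new one, and only evaluates the tokens after that point. If the shared text tokenizes differently between requests (which can vary by model tokenizer), the common prefix shrinks and most of the context gets recomputed. The function name and token values below are purely illustrative.

```python
def reusable_prefix_len(prev_tokens, new_tokens):
    """Count leading tokens shared by both prompts (the KV cache
    for these positions can be reused; the rest must be re-evaluated)."""
    n = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Appending a reply keeps the whole old prompt as a prefix:
prev = [1, 10, 20, 30]
new = [1, 10, 20, 30, 40, 50]
print(reusable_prefix_len(prev, new))  # 4 -> only 2 new tokens to evaluate

# But if the edit boundary tokenizes differently, reuse collapses:
new_retok = [1, 10, 21, 31, 40]
print(reusable_prefix_len(prev, new_retok))  # 2 -> most of the context recomputed
```

This would be consistent with the reported behavior: whether a shift or a full recompute happens could depend on how a given model's tokenizer splits text at the point where the conversation was edited.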

Links to models. The problem occurs with:

- https://huggingface.co/bartowski/magnum-v2-123b-GGUF/tree/main/magnum-v2-123b-Q4_K_L
- https://huggingface.co/mradermacher/magnum-v2-123b-i1-GGUF
- https://huggingface.co/mradermacher/magnum-v2-72b-i1-GGUF

There's no problem with:

- https://huggingface.co/mradermacher/Lumimaid-v0.2-70B-GGUF
- https://huggingface.co/mradermacher/L3.1-70B-Euryale-v2.2-i1-GGUF

Name and Version

llama-server.exe --version
version: 3639 (20f1789d)
built with MSVC 19.29.30154.0 for x64

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

Vladonai · Aug 29 '24 12:08