2 comments by tihanyi:
I could also reproduce it with a server using a single slot, when the model generated content that exceeded the context size, which can happen rarely if no stop...
Sorry, but the patch has not resolved the issue for me. Here is a simple example of how to reproduce it:
# server:
./server -m llama-2-7b.Q5_K_S.gguf --n-gpu-layers 33 --ctx-size 2048 --parallel 1
# client:
...