tihanyi

2 comments by tihanyi

I could also reproduce it with a server using a single slot, when the model generated content that exceeded the context size, which may happen, rarely, if no stop...

Sorry, but the patch has not resolved the issue for me. Here is a simple example of how to generate it: #server: ./server -m llama-2-7b.Q5_K_S.gguf --n-gpu-layers 33 --ctx-size 2048 --parallel 1 #client:...
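The client command in the comment above is truncated. As an illustration only (not the commenter's actual client), a request to the llama.cpp server's `/completion` endpoint could trigger the scenario described: with `n_predict` set to `-1` the server generates without a token limit, so if no stop token appears the output can run past the 2048-token context configured on the server side.

```shell
# Hypothetical client request; the original comment's client command is not shown.
# Assumes the server above is listening on the default 127.0.0.1:8080.
# "n_predict": -1 requests unbounded generation, so absent a stop token the
# generated content can exceed the configured --ctx-size of 2048.
curl -s http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Write an endless story.", "n_predict": -1}'
```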