ipex Llama.cpp server fails with Phi3 models
Hi,
I've been trying to serve several Phi3 models with the Llama.cpp server created by ipex-llm's init-llama-cpp script.
When I serve with this version, I run into two problems:
- The server doesn't stop on the `<|end|>` token; it just keeps generating text, so I have to set the `-n` parameter to make it stop somewhere (see the sketch below).
- I also get inconsistent output, mostly gibberish.
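For reference, this is roughly how I start the server; the model path and flag values below are placeholders rather than my exact setup:

```bash
# Binaries are the ones symlinked by ipex-llm's init-llama-cpp.
source /opt/intel/oneapi/setvars.sh
export SYCL_CACHE_PERSISTENT=1

# Model path is a placeholder. Without the -n cap, generation never
# terminates on its own.
./llama-server \
  -m ./Phi-3-medium-4k-instruct-Q4_K_M.gguf \
  -ngl 99 -c 4096 \
  --host 0.0.0.0 --port 8080 \
  -n 256
```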
Running the same inference on the upstream Llama.cpp server with the SYCL backend works fine.
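For comparison, the upstream build that behaves correctly for me looks roughly like this (flags follow llama.cpp's SYCL docs; treat the exact values as illustrative):

```bash
# Upstream llama.cpp built with the SYCL backend (icx/icpx from oneAPI).
source /opt/intel/oneapi/setvars.sh
cmake -B build -DGGML_SYCL=ON \
  -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release

# Same placeholder model; here the output is coherent and stops normally.
./build/bin/llama-server -m ./Phi-3-medium-4k-instruct-Q4_K_M.gguf \
  -ngl 99 -c 4096 --port 8081
```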
The failure occurs with Phi3-medium (Q6, Q4), Phi3-mini, and Phi3.5-mini alike.
I have the latest versions of everything (driver, ipex-llm), and I'm using an Arc A770 GPU.
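For completeness, this is the kind of request I send to both servers; the prompt follows the Phi3 chat template, and the contents are just an example:

```bash
curl http://localhost:8080/completion -d '{
  "prompt": "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n",
  "n_predict": 128
}'
```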
Thanks!
Hi @hvico, we are trying to reproduce your issue. Could you please share more details of what is displayed on the serving side?
Hi @hvico, could you please also share the exact command you ran so that we can try to reproduce it?