
ipex Llama.cpp server fails with Phi3 models

Open hvico opened this issue 1 year ago • 2 comments

Hi,

I've been trying to serve several Phi3 models using the llama.cpp server produced by ipex-llm's init-llama-cpp script.

When I serve with this version, I run into two problems:

  1. The server doesn't stop at the stop token; it just keeps generating until I cap the output with the -n parameter.
  2. The output is also inconsistent, mostly gibberish.

Running the same inference on the mainstream llama.cpp server with the SYCL backend works fine.

The same happens with Phi3-medium (Q6, Q4), Phi3-mini, and Phi3.5-mini.

I have the latest versions of everything (driver, ipex), and I use an Arc A770 GPU.

Thanks!
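For reference, here is a minimal sketch of the kind of request I'm sending against the llama.cpp server's HTTP API. The `/completion` endpoint and the `n_predict` / `stop` field names come from upstream llama.cpp; the port, prompt template, and stop strings below are assumptions for illustration, not my exact command:

```python
import json

# Build a /completion request payload for the llama.cpp HTTP server.
# Endpoint and field names follow upstream llama.cpp's server API;
# the prompt template and stop strings below are illustrative
# assumptions based on Phi-3's chat format, not a verified config.
payload = {
    "prompt": "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n",
    "n_predict": 128,                       # hard cap, analogous to the -n CLI flag
    "stop": ["<|end|>", "<|endoftext|>"],   # assumed Phi-3 end-of-turn markers
    "temperature": 0.0,
}

body = json.dumps(payload)
print(body)

# To send it against a locally running server (default port assumed):
#   curl http://localhost:8080/completion -d "$BODY"
```

Even with `stop` set explicitly like this, the ipex build keeps generating past the end-of-turn marker unless `n_predict` (or `-n` on the CLI) cuts it off.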

hvico avatar Aug 23 '24 22:08 hvico

Hi @hvico, we are trying to reproduce your issue. Could you please share more details of the output displayed on the server side?

sgwhat avatar Aug 26 '24 01:08 sgwhat

Hi @hvico, could you also provide the exact command you ran so that we can try to reproduce it?

rnwang04 avatar Aug 26 '24 02:08 rnwang04