It takes so long to wait when using a server.
Is there any way to run as quickly as using llama.cpp directly? I need to save each input and response.
@dansinboy are you using the default server binary that comes with llama.cpp or a binding?
You get the point. At first I used a binding via llama_cpp_python and it worked badly. Then I switched to the default server and, wow, it works well now...
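Since the original ask was to save each input and response while still using the server, here is a minimal sketch of calling the llama.cpp server's `/completion` endpoint from Python and appending every exchange to a JSONL file. The host, port, and log file name are assumptions based on typical defaults (e.g. a server started with `./server -m model.gguf --port 8080`), not something confirmed in this thread.

```python
import json
import time
import requests

# Assumed defaults: llama.cpp server listening locally on port 8080,
# exposing the /completion endpoint; log file name is arbitrary.
SERVER_URL = "http://127.0.0.1:8080/completion"
LOG_FILE = "chat_log.jsonl"

def ask(prompt: str, n_predict: int = 128) -> str:
    """Send a prompt to the server and log the prompt/response pair."""
    resp = requests.post(
        SERVER_URL,
        json={"prompt": prompt, "n_predict": n_predict},
        timeout=300,
    )
    resp.raise_for_status()
    content = resp.json().get("content", "")

    # Append each exchange as one JSON line so nothing is lost between runs.
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(json.dumps({
            "timestamp": time.time(),
            "prompt": prompt,
            "response": content,
        }) + "\n")
    return content

if __name__ == "__main__":
    print(ask("Explain what a context window is in one sentence."))
```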
This issue was closed because it has been inactive for 14 days since being marked as stale.