It takes so long to wait when using a server.
Is there any way to run as quickly as using llama.cpp directly? I need to save each input and response.
@dansinboy are you using the default server binary that comes with llama.cpp or a binding?
You get the point. At first I used a binding via llama_cpp_python and it worked badly. Then I switched to the default server and, wow, it works well now...
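Since the original ask was to save each input and response while still using the server, here is a minimal sketch of calling the llama.cpp server's `/completion` endpoint from Python and appending every exchange to a JSONL file. The host, port, and log file name are assumptions based on typical defaults (e.g. a server started with `./server -m model.gguf --port 8080`), not something confirmed in this thread.

```python
import json
import time
import requests

# Assumed defaults: llama.cpp server listening locally on port 8080,
# exposing the /completion endpoint; log file name is arbitrary.
SERVER_URL = "http://127.0.0.1:8080/completion"
LOG_FILE = "chat_log.jsonl"

def ask(prompt: str, n_predict: int = 128) -> str:
    """Send a prompt to the server and log the prompt/response pair."""
    resp = requests.post(
        SERVER_URL,
        json={"prompt": prompt, "n_predict": n_predict},
        timeout=300,
    )
    resp.raise_for_status()
    content = resp.json().get("content", "")

    # Append each exchange as one JSON line so nothing is lost between runs.
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(json.dumps({
            "timestamp": time.time(),
            "prompt": prompt,
            "response": content,
        }) + "\n")
    return content

if __name__ == "__main__":
    print(ask("Explain what a context window is in one sentence."))
```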
This issue was closed because it has been inactive for 14 days since being marked as stale.