gemma.cpp icon indicating copy to clipboard operation
gemma.cpp copied to clipboard

[Feature request] Add simple HTTP API server like in llama.cpp with api like OpenAI

Open pythops opened this issue 2 years ago • 6 comments

For more infos here https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md

pythops avatar Feb 21 '24 13:02 pythops

Great suggestion, if there's others who interested please +emoji above and we'll prioritize this :)

austinvhuang avatar Feb 21 '24 15:02 austinvhuang

Just for the update: llama.cpp added support for gemma models https://github.com/ggerganov/llama.cpp/pull/5631

pythops avatar Feb 21 '24 19:02 pythops

Just for the update: llama.cpp added support for gemma models

https://github.com/ggerganov/llama.cpp/pull/5631

Also with 💎Gemma in 🦙Llama.CPP you get CUDA, Neon and AMD GPUs support! And - in theory - running into the browser if you can compile to WASM.

loretoparisi avatar Feb 21 '24 23:02 loretoparisi

adding a api like support would be great these models can be used on cpu for smaller tasks. +1 for this.

omkar806 avatar Apr 19 '24 04:04 omkar806

I have a question: why using http but not websocket?

As I known, the answer token is generated one word by one word. And, seems, http has no function to do multi-responses for one call. Which means , http need to gather the whole answer before trans it back.

zeerd avatar Apr 24 '24 05:04 zeerd

I have a question: why using http but not websocket?

As I known, the answer token is generated one word by one word. And, seems, http has no function to do multi-responses for one call. Which means , http need to gather the whole answer before trans it back.

WebSocket is more suitable for instant messenger style UI but may not be ideal for other UI types. And I think it is better to integrate gemma.cpp as a module into the web backend framework than to implement the HTTP/WebSocket API directly.

Here is my WebSocket online demo solution, and you can try it here or via this Kaggle notebook. In this solution gemma.cpp is a module of OpenResty which makes it easy to implement WebSocket or HTTP API.

ufownl avatar Apr 26 '24 09:04 ufownl