llama.cpp

Simple webchat for server

Open tobi opened this issue 2 years ago • 14 comments

I put together a simple web-chat that demonstrates how to use the SSE(ish) streaming in the server example. I also went ahead and served it from the root url, to make the server a bit more approachable.

I tried to match the spirit of llama.cpp and used minimalistic js dependencies and went with the ozempic css style of ggml.ai.

Initially I went for no js dependencies but gave up and used a few minimal ones that I'm importing from js CDNs instead of adding them here. Let me know if you agree with this approach. I needed Microsoft's fetch-event-source for using event-source over POST (super disappointed that browsers don't support that, actually) and preact+htm for keeping my sanity with all this state. The upshot is that everything is in one small html file. Speaking of which, there is probably a better (and less fragile) way to include the server.html in the cpp binary, but it's been 25 years since I worked with cpp tooling.
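For anyone who wants to consume the same stream outside a browser, the framing can be sketched in C++. This is only a sketch, assuming the server emits standard Server-Sent Events frames (`data: <payload>` lines, with a blank line ending each event). The exact wire format of the server example is not shown in this thread, and `SseParser` is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: accumulates raw bytes from a streaming HTTP response
// and returns the payload of every complete event received so far.
// Assumption: events are separated by a blank line ("\n\n") and carry their
// payload in one or more "data:" lines, as in standard SSE.
class SseParser {
public:
    std::vector<std::string> feed(const std::string& chunk) {
        buffer_ += chunk;
        std::vector<std::string> events;
        std::size_t end;
        while ((end = buffer_.find("\n\n")) != std::string::npos) {
            const std::string frame = buffer_.substr(0, end);
            buffer_.erase(0, end + 2);  // drop the frame and its blank line
            const std::string payload = extract_data(frame);
            if (!payload.empty()) events.push_back(payload);
        }
        return events;  // incomplete trailing data stays in buffer_
    }

private:
    // Joins the values of all "data:" lines in one frame with '\n'.
    static std::string extract_data(const std::string& frame) {
        const std::string prefix = "data:";
        std::string payload;
        std::size_t pos = 0;
        while (pos <= frame.size()) {
            std::size_t eol = frame.find('\n', pos);
            if (eol == std::string::npos) eol = frame.size();
            const std::string line = frame.substr(pos, eol - pos);
            if (line.compare(0, prefix.size(), prefix) == 0) {
                std::string value = line.substr(prefix.size());
                if (!value.empty() && value[0] == ' ') value.erase(0, 1);
                if (!payload.empty()) payload += '\n';
                payload += value;
            }
            pos = eol + 1;
        }
        return payload;
    }

    std::string buffer_;
};
```

Each network read is passed to `feed()`, and a frame split across two reads is only returned once its terminating blank line has arrived.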

[Image: chat settings]

tobi avatar Jun 26 '23 01:06 tobi

I think this is a good idea, but the html file should be in the binary. This will not work with the automatic builds because they don't include the contents of the examples directory.

slaren avatar Jun 26 '23 07:06 slaren

IMHO having the js dependencies locally would be better, so it works without an internet connection and avoids the risk of malicious js.

IgnacioFDM avatar Jun 26 '23 08:06 IgnacioFDM

I have done something like this before, I was serving HTML files in a C HTTP server. There was a CMake option to either build them in or to read them from disk. Reading from files is useful for development because you don't need to rebuild and restart the server. But building them in requires creating a small program that can hexdump the file into a C array definition. Overall pretty complex and then we have the Makefile as well...
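The "hexdump into a C array" step can be sketched as a single function. This is an illustration, not the tool used in the project mentioned above: `to_c_array` and its parameters are hypothetical names, and the output format is only modeled on what `xxd -i` produces.

```cpp
#include <cstddef>
#include <string>

// Renders `data` as a C array definition named `name`, plus a length
// constant. Hypothetical helper; an empty input would produce an empty
// initializer, which is not handled here.
std::string to_c_array(const std::string& name, const std::string& data) {
    static const char digits[] = "0123456789abcdef";
    std::string out = "const unsigned char " + name + "[] = {";
    for (std::size_t i = 0; i < data.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(data[i]);
        out += (i % 12 == 0) ? "\n    " : " ";  // 12 bytes per line
        out += "0x";
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
        out += ',';
    }
    out += "\n};\n";
    out += "const unsigned int " + name + "_len = " +
           std::to_string(data.size()) + ";\n";
    return out;
}
```

A build step would read the HTML file into a string, call this function, and write the result to a generated header.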

using event-source over POST (super disappointed that browsers don't support that, actually)

Maybe we could add an endpoint with GET and query parameters?

SlyEcho avatar Jun 26 '23 09:06 SlyEcho

requires creating a small program that can hexdump the file into a C array definition.

should be pretty simple. i don't touch Makefiles directly often, how bad are custom targets?

or we ship the generated file.

(more reading here https://thephd.dev/finally-embed-in-c23)

Green-Sky avatar Jun 26 '23 11:06 Green-Sky

should be pretty simple. i don't touch Makefiles directly often, how bad are custom targets?

I'd say Makefiles are a lot easier for this than CMake but it's just added complexity.

@tobi, How hard would it be for you to jam the HTML file contents into the .cpp file?

SlyEcho avatar Jun 26 '23 11:06 SlyEcho

after some thinking, i realized that we have pure text file(s), which means we only need to prefix and postfix the file with raw string literal markers, e.g.:

echo "R\"htmlraw(" > html_build.cpp
cat index.html >> html_build.cpp
echo ")htmlraw\"" >> html_build.cpp

in server.cpp:

const char* html_str =
#include "html_build.cpp"
;

edit: resulting html_build.cpp:

R"htmlraw(<html></html>)htmlraw"
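A short sketch of why the custom delimiter matters: with `htmlraw` as the delimiter, the literal only ends at `)htmlraw"`, so quotes, backslashes and even a `)"` sequence inside the page need no escaping. The page content here is made up for illustration.

```cpp
#include <string>

// The script body contains both `\n` and `)"`. A plain R"( ... )" literal
// would end at the first `)"`, while the delimited form keeps going until
// it sees `)htmlraw"`.
const char* html_str = R"htmlraw(<script>console.log("a\n(b)")</script>)htmlraw";

// Backslashes are not interpreted inside a raw literal, so `\n` above is
// stored as two characters: a backslash followed by 'n'.
std::size_t html_len() {
    return std::string(html_str).size();
}
```

The one thing the embedded file must not contain is the closing sequence `)htmlraw"` itself.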

Green-Sky avatar Jun 26 '23 11:06 Green-Sky

Ok, gave it a go (running it) and found an issue. Whenever the window loses focus (switching windows), it restarts the prompt that is currently being output. See the screencap image (i switched back and forth a couple of times).

Green-Sky avatar Jun 26 '23 12:06 Green-Sky

@tobi, How hard would it be for you to jam the HTML file contents into the .cpp file?

Simple enough. I was hoping to hear that there is some kind of #embed thing that works in all the cpp compilers that we care about. Crazy that it took till C23 to get that into the standard.

I can just include it. I can also just embed one dependency js and call it a day.

The best argument for keeping it in the html file is to allow people to hack on it more easily. I think this could become a really good chatbot UX if we are welcoming to contributors. It's got good bones 😄

tobi avatar Jun 26 '23 14:06 tobi

#embed is not gonna work because it's too new.

Yes, it will be harder to develop, but you can also run a simple web server like with Python while developing it.

We can improve it later.

SlyEcho avatar Jun 26 '23 14:06 SlyEcho

Check this cmake script: https://gist.github.com/sivachandran/3a0de157dccef822a230

I am also wondering whether we should use the same technique to embed the OpenCL kernels. The current approach, which mixes kernel code with normal C code, will become more of a maintenance headache.

howard0su avatar Jun 26 '23 14:06 howard0su

cmake script

Cool but we also have to support pure Makefile.

SlyEcho avatar Jun 26 '23 14:06 SlyEcho

i feel ignored :sweat_smile: we are not dealing with binary files here so my https://github.com/ggerganov/llama.cpp/pull/1998#issuecomment-1607300738 solution is a simple 3 text file concats. pretty sure it won't get much simpler :)

Green-Sky avatar Jun 26 '23 14:06 Green-Sky

3 text file concats

Does it work in Windows?

SlyEcho avatar Jun 26 '23 14:06 SlyEcho

3 text file concats

Does it work in Windows?

if you use make on windows, you likely also have some coreutils installed (echo and cat)

cmake has built in functions for read/write/append file :)

Green-Sky avatar Jun 26 '23 14:06 Green-Sky

For me, the greatest value of this example is that it demonstrates a minimalistic way to implement a basic HTML/JS client that communicates with the server using just a browser, without having to install node or curl. How the client is served can be solved in many different ways, depending on the needs of the specific project. I recommend merging the example as it is and potentially adding improvements from master.

ggerganov avatar Jun 26 '23 18:06 ggerganov

For me, the greatest value of this example is that it demonstrates a minimalistic way to implement a basic HTML/JS client that communicates with the server using just a browser, without having to install node or curl. How the client is served can be solved in many different ways, depending on the needs of the specific project. I recommend merging the example as it is and potentially adding improvements from master.

Agree, except we should really not hard code the path to the html. we basically ship the server, and that would look funky.

@tobi would it be too much to ask to implement the html root cli parameter for the server executable? or for the fasttrack, if the hardcoded .html file could not be loaded (!file.is_open()) to fall back to the previous html string?
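The fast-track fallback could look roughly like this. It is a minimal sketch, not the code that ended up in the server: `load_html_or_fallback` and its parameters are hypothetical names, with `fallback` standing in for the previous built-in html string.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns the contents of the file at `path`, or `fallback` when the file
// cannot be opened (the `!file.is_open()` case).
std::string load_html_or_fallback(const std::string& path,
                                  const std::string& fallback) {
    std::ifstream file(path, std::ios::binary);
    if (!file.is_open()) {
        return fallback;
    }
    std::ostringstream contents;
    contents << file.rdbuf();  // read the whole file into memory
    return contents.str();
}
```

With this shape, a binary shipped without the examples directory still serves a page, while a checkout with the file on disk picks up edits without a rebuild.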

Green-Sky avatar Jun 26 '23 18:06 Green-Sky

sure, i'll try to do that tonight.

tobi avatar Jun 26 '23 23:06 tobi

OK so I did basically all of those things. There is now a --path param that you can point at any directory, and static files will be served from it. I also added a deps.sh which just bakes the index.html and index.js into .hpp files (as per @Green-Sky's suggestion). So really, there are three ways to run it: launch ./server from the llama.cpp folder and it will use the ./examples/server/public directory; copy the ./server file to tmp and it will just use the baked ones; or use --path to work on your own UX.

The only downside is that we duplicate some files in git here, because of the baked .cpp files. But the deps are so small that it probably doesn't matter. It would be slightly cleaner to go and make deps.sh a build step in cmake and makefile, but... well... I ran out of courage.

tobi avatar Jun 27 '23 00:06 tobi

@ggerganov server is in reasonably good shape overall. Maybe time for including it in the default build?

tobi avatar Jun 27 '23 00:06 tobi

I like the current approach, with the website embedded in the binary for simplicity, but also the option to serve from a directory, to improve iteration time and to allow user customization without recompiling. It also includes the js dependencies locally.

I agree with merging this in its current state. Further improvements can be done in future PRs.

🚢

IgnacioFDM avatar Jun 27 '23 07:06 IgnacioFDM

server is in reasonably good shape overall. Maybe time for including it in the default build?

Yes, let's do that. Originally, I insisted on putting it behind an option since it was bringing in the boost library as a dependency, which is a very big burden. Now that the implementation is so self-contained and minimal, we should enable the build by default and maintain it long term.

ggerganov avatar Jun 27 '23 08:06 ggerganov

If you pull from master, the ci issues should go away.

Green-Sky avatar Jun 27 '23 10:06 Green-Sky

Actually, could you move the generated files next to the source files into the public folder. If we get more files it will keep it neater to keep to the same directory structure.

that would make it possible to request the files at /index.html.cpp - still want that?

tobi avatar Jun 27 '23 12:06 tobi

Actually, could you move the generated files next to the source files into the public folder. If we get more files it will keep it neater to keep to the same directory structure.

that would make it possible to request the files at /index.html.cpp - still want that?

I don't think it will because the server only responds to / and /index.js?

SlyEcho avatar Jun 27 '23 13:06 SlyEcho

I don't think it will because the server only responds to / and /index.js?

it serves the entire directory now because of:

    // Set the base directory for serving static files
    svr.set_base_dir(sparams.public_path);

I think that's a good change. And if there is no index.html or index.js it will fall through to the svr.Get request and respond with the baked version. I think it's good to serve the entire sub directory because that makes it super easy for folks to make alternative UX if anyone chooses to. SillyTavern folks might want to give this a go?
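The lookup order described here (directory first, baked copy second) can be sketched without the HTTP library. Everything in this block is a hypothetical illustration: `Response`, `resolve` and the `baked` map are made-up names, and the real server relies on its HTTP library's static file handling rather than code like this.

```cpp
#include <fstream>
#include <map>
#include <sstream>
#include <string>

struct Response {
    int status;
    std::string body;
};

// Resolves `url_path` against `base_dir` first, then against the assets
// baked into the binary. Directory detection beyond a trailing '/' is
// omitted to keep the sketch short.
Response resolve(const std::string& base_dir,
                 const std::string& url_path,
                 const std::map<std::string, std::string>& baked) {
    // Refuse anything that tries to climb out of base_dir.
    if (url_path.find("..") != std::string::npos) {
        return {404, "not found"};
    }
    // 1. Static file from disk; a path ending in '/' maps to index.html.
    std::string disk_path = base_dir + url_path;
    if (!url_path.empty() && url_path.back() == '/') {
        disk_path += "index.html";
    }
    std::ifstream file(disk_path, std::ios::binary);
    if (file.is_open()) {
        std::ostringstream contents;
        contents << file.rdbuf();
        return {200, contents.str()};
    }
    // 2. Fall through to the baked copy, keyed by the request path.
    const auto it = baked.find(url_path);
    if (it != baked.end()) {
        return {200, it->second};
    }
    return {404, "not found"};
}
```

Because disk wins over baked, dropping a custom index.html into the served directory replaces the built-in UI without recompiling.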

tobi avatar Jun 27 '23 17:06 tobi

If you pull from master, the ci issues should go away.

*merge from master

Green-Sky avatar Jun 27 '23 19:06 Green-Sky

it serves the entire directory now because of:

   // Set the base directory for serving static files
   svr.set_base_dir(sparams.public_path);

I think that's a good change. And if there is no index.html or index.js it will fall through to the svr.Get request and respond with the baked version.

Alright, I can accept it for now.

SlyEcho avatar Jun 27 '23 19:06 SlyEcho

Well, it seems like MSVC doesn't like it. Another possibility is to turn it into an array with a Python script or maybe xxd -i.

SlyEcho avatar Jun 27 '23 20:06 SlyEcho

error C2026: string too big, trailing characters truncated

awwhh, msvc, why do you have to be like that. <.<

edit: apparently msvc only accepts up to 16380 single-byte characters :cry:
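One possible workaround, not tried in this thread, is to have the generator split the file into several adjacent raw string literals, each below the per-literal limit, and let the compiler concatenate them. The sketch below assumes the limit applies to each literal before concatenation. `to_chunked_literals` and the default chunk size are hypothetical, and some MSVC versions may also cap the total concatenated size, so the byte-array route may still be the safer one.

```cpp
#include <cassert>
#include <string>

// Emits `content` as a sequence of adjacent raw string literals, each
// holding at most `max_chunk` bytes. Caveats: `content` must not contain
// the sequence `)htmlraw"`, and a multi-byte UTF-8 character may be split
// across two literals, which some compilers could warn about.
std::string to_chunked_literals(const std::string& content,
                                std::size_t max_chunk = 16000) {
    assert(max_chunk > 0);
    std::string out;
    for (std::size_t pos = 0; pos < content.size(); pos += max_chunk) {
        out += "R\"htmlraw(";
        out += content.substr(pos, max_chunk);
        out += ")htmlraw\"\n";
    }
    if (out.empty()) {
        out = "\"\"\n";  // keep the generated file a valid expression
    }
    return out;
}
```

The generated file can still be pulled in with the same `#include` trick, since adjacent string literals form a single string.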

Green-Sky avatar Jun 27 '23 21:06 Green-Sky

You know what, maybe we can solve the embedded asset problem later?

SlyEcho avatar Jun 27 '23 21:06 SlyEcho