Results: 6 comments

It appears we need to either wait for gander to be updated, or make another fork to fix the `PendingIntent` issue on API 31 and above.

> > > @alifgiant Adding chips using `setText` works great. However, if there are lots of chips, only one line's worth of chips is shown once the view is loaded....

Without the `--jinja` flag it seems to work.

Request:

```bash
curl --location 'http://localhost:1234/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Cookie: frontend_lang=en_US' \
  --data '{
    "model": "hermes-3-llama-3.1-8b",
    "messages": [
      { "role":...
```

@danbev I tried building llama.cpp from your branch locally and tested it, but it seems that now neither the tools nor the response format is ignored by the model if we use...

@danbev I see the PR has been merged into master and I just tested the latest build (https://github.com/ggml-org/llama.cpp/releases/tag/b4739). It looks good so far, so let me close this issue. `llama-server -m...

@danbev I'm not reopening this issue since this works now, but just to note that I think the `response_format` that can be used doesn't exactly match between OpenAI and llama.cpp...
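For context, a minimal sketch of the kind of mismatch I mean, assuming an OpenAI-style `json_schema` shape on one side and the older llama.cpp server `json_object`-plus-`schema` shape on the other (model name and schema here are made up for illustration; verify both shapes against the current docs before relying on them):

```bash
# Sketch only: two possible response_format payload shapes. The model name
# and the "color" schema are invented for this example.

# OpenAI style: type "json_schema" wrapping a named schema object.
openai_payload='{
  "model": "hermes-3-llama-3.1-8b",
  "messages": [{"role": "user", "content": "Give me a color"}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "color",
      "schema": {"type": "object", "properties": {"color": {"type": "string"}}}
    }
  }
}'

# llama.cpp server style (as I understand older builds): type "json_object"
# with the schema as a sibling top-level field.
llamacpp_payload='{
  "model": "hermes-3-llama-3.1-8b",
  "messages": [{"role": "user", "content": "Give me a color"}],
  "response_format": {
    "type": "json_object",
    "schema": {"type": "object", "properties": {"color": {"type": "string"}}}
  }
}'

# Either payload could then be sent with, e.g.:
#   curl http://localhost:1234/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$openai_payload"
# Here we only check both are well-formed JSON.
echo "$openai_payload"   | python3 -m json.tool > /dev/null && echo "openai payload: valid JSON"
echo "$llamacpp_payload" | python3 -m json.tool > /dev/null && echo "llama.cpp payload: valid JSON"
```

A client written against one shape will silently lose its schema on the other, which is why the formats not matching matters in practice.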