unexpected error format in response (status code: 500) with gpt-oss:20b
Hello there!
Not quite sure whether this is related to https://github.com/ollama/ollama/issues/11704 or a separate issue.
I have updated to the latest version of ollama (0.11.4 as of now). I'm using the official ollama python library (0.5.3) and still constantly getting 500 errors with gpt-oss:20b:
unexpected error format in response (status code: 500)
Traceback (most recent call last):
File "/home/user/test_agent/benchmark.py", line 479, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/test_agent/agent.py", line 581, in llm_process_question
response = OllamaClient.chat(model=model_name, messages=messages)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/miniconda3/envs/test_ollama/lib/python3.12/site-packages/ollama/_client.py", line 342, in chat
return self._request(
^^^^^^^^^^^^^^
File "/home/user/miniconda3/envs/test_ollama/lib/python3.12/site-packages/ollama/_client.py", line 180, in _request
return cls(**self._request_raw(*args, **kwargs).json())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/miniconda3/envs/test_ollama/lib/python3.12/site-packages/ollama/_client.py", line 124, in _request_raw
raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: unexpected error format in response (status code: 500)
I'm not using any tools - just plain chat calls like res = OllamaClient.chat(model=model_name, messages=messages) plus some custom scaffolding around them. As a test, I even tried overriding the default three-pages-long TEMPLATE with a much simpler one, adapted from simpler models like qwen2.5. With that template gpt-oss:20b behaves much more stably, but still raises 500s from time to time.
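For reference, the failing call can be reduced to a hit against Ollama's /api/chat REST endpoint, which is what the python client wraps. Below is a minimal stdlib-only sketch of that call (assumptions: a local server on the default port 11434 and the model already pulled; `chat_once` and `build_chat_payload` are names invented for this sketch, not part of any library):

```python
import json
import urllib.request
import urllib.error


def build_chat_payload(model, messages):
    """Build the JSON body for Ollama's /api/chat endpoint (non-streaming)."""
    return json.dumps({"model": model, "messages": messages, "stream": False}).encode()


def chat_once(model, messages, host="http://localhost:11434"):
    """Send one chat request; print the server's error body on a non-2xx response."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=build_chat_payload(model, messages),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as e:
        # A 500 here is what the python client surfaces as ResponseError.
        print(f"server error {e.code}: {e.read().decode(errors='replace')}")
        return None


# Example usage (requires a running server):
#   out = chat_once("gpt-oss:20b", [{"role": "user", "content": "Hello"}])
#   if out is not None:
#       print(out["message"]["content"])
```

Seeing the raw error body from the HTTPError, rather than the client's generic "unexpected error format" message, may help pin down whether the 500 originates in the server's parsing of the model output.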
Is it ollama itself or the ollama python client that tries (and fails) to parse the model's output here?
Hard to tell what's going wrong from just this. Are you able to run the model fine on the CLI (ollama run gpt-oss:20b)? I wouldn't recommend playing with the template too much; it's VERY different from other models' and has a special parser for it.
Sure, I get it. That's why I mentioned that I tried the different template only as a test. Both variants of gpt-oss:20b - with the original template and with my simplified one - eventually raise 500s from time to time.
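Since the 500s are intermittent, a simple retry loop can paper over them while the root cause is investigated. A minimal sketch (the generic helper below is a hypothetical name; in the reporter's setup, `retry_on` would be `(ollama.ResponseError,)` - a plain exception tuple is used here so the sketch stays self-contained):

```python
import time


def call_with_retries(fn, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying up to `attempts` times on the given exception types.

    Sleeps `delay` seconds between attempts; re-raises the last exception
    if every attempt fails.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on as exc:
            last_exc = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_exc


# Hypothetical usage, mirroring the reporter's call:
#   response = call_with_retries(
#       lambda: OllamaClient.chat(model=model_name, messages=messages),
#       retry_on=(ollama.ResponseError,),
#   )
```

This is a workaround, not a fix: it keeps a benchmark run alive but hides how often the server actually fails, so it's worth logging each retried exception too.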
Including on the CLI? Could you send the server logs if so?
Sure, guys. Thanks a lot. Give me some time and I'll try to put together a small reproducible example to share along with the server logs. I'll also try to reproduce the same thing in the plain ollama run CLI.
I was somewhat able to reproduce it in a clean ollama run CLI chat. The model's response was cut off in the middle of thinking, and the >>> ollama prompt appeared, inviting me to enter a new message. No 500 was shown, though, and there was nothing interesting in the log; I'm not sure whether that ever happens with ollama run, or what kind of message I should be looking for there compared to the 500s I was getting via the python ollama library.
I'll do a few more experiments to make sure I can reproduce this with a high success rate, and share it all with you if it turns out to be relevant.
if you can make that happen again, running with OLLAMA_DEBUG=2 set should make the server logs give more info about what's going on (they'll include both the raw text coming from the model and some of the parsing decisions being made)
Sorry guys, the workload on my current project doesn't give me a moment even to gasp for air. I haven't vanished; I'm just waiting for a calm hour or two to get back to this ticket.
not a problem! come back when you're ready and good luck on your current project :)