When using gpt-oss-120B via vLLM locally, opencode doesn't call the tools
Description
When I try to use a local model (such as gpt-oss-120b or qwen3-32b), the response contains only thinking content: no tool calls (even when the thinking content says the model should call a tool) and no other output. Occasionally it works normally, but very rarely. This is very strange.
Plugins
No response
OpenCode version
1.1.4
Steps to reproduce
opencode.json:
vllm serve:
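(The actual opencode.json and vllm serve command were not captured in this report. For orientation only, a minimal setup pointing opencode at a local vLLM OpenAI-compatible endpoint usually looks roughly like the following; the provider id, base URL, port, and model name are assumptions, not the reporter's values:)

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "vllm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "vLLM (local)",
      "options": { "baseURL": "http://localhost:8000/v1" },
      "models": {
        "openai/gpt-oss-120b": { "name": "gpt-oss-120b" }
      }
    }
  }
}

with the server started along the lines of:

vllm serve openai/gpt-oss-120b --host 0.0.0.0 --port 8000 --max-model-len 40960 --enable-auto-tool-choice --tool-call-parser openai  # flags illustrative, not the reporter's actual command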
Screenshot and/or share link
No response
Operating System
No response
Terminal
No response
This issue might be a duplicate of existing issues. Please check:
- #7083: Using local Ollama models doesnt return any results - same symptoms with local models returning only JSON tool calls instead of actual responses
- #6649: Weird error with toolcalling on local model - similar tool calling issues with local models via Llama.cpp
- #5694: Local Ollama models are not agentic - local models not able to make tool calls properly
- #6223: Model not returning the answers - Qwen2.5-coder with Ollama not responding, only showing JSON tool calls
- #4428: Why is opencode not working with local llms via Ollama - widespread issue with local LLMs and tool calling
- #4255: OpenCode v1.0.25 Hangs Indefinitely with LM Studio + Qwen Models - detailed analysis of empty tool_calls array handling issue with local models
- #234: Tool Calling Issues with Open Source Models - discusses case sensitivity and tool calling inconsistencies with open source models
All of these report similar symptoms: local models (gpt-oss, qwen variants) via vLLM/Ollama either failing to call tools, only returning thinking content, or hanging indefinitely.
Feel free to ignore if none of these address your specific case.
I noticed there is no mention of vLLM in the official docs. Is vLLM not well supported?
I am running into a similar issue, as described here: https://github.com/anomalyco/opencode/issues/7083 and it seems that local LLMs from Ollama are not capable of code-assistant tasks in opencode.
What hardware are you using, btw?
An A100, Linux. I'm not sure whether this is a problem with the model or with the startup parameters, or whether it needs other plugins.
I am using 'GPT-OSS' on Ollama on my local machine. I have Ollama set up on Ubuntu 22 LTS with an Nvidia 3090, and it is already wired up with 'OpenWebUI'. I started using 'opencode' yesterday, and 'opencode' is really a big step forward in AI from my perspective. I was hooked by the agentic work that Network Chuck showed on his channel: https://www.youtube.com/watch?v=MsQACpcuTkU It would be pretty fantastic if this could work. I was really fascinated when I saw in the above video how much better an experience can be achieved, especially if I can put agentic work to use on the '.md' files that I have saved on 'NextCloud'.
And I would really like it if this also worked with local Ollama models like gpt-oss. Ollama shows me that tools are also supported with 'Mistral', which I also could not get to work.
Are there any more detailed tutorials on how to get this working? Is this even an 'opencode' issue, or is it a model issue? I would be very happy if someone could point me in a direction for getting this to work. Hats off to the creators of 'opencode'; this is really fantastic.
If you are using vLLM: I thought there was a long-open issue about tool calling not working through vLLM with gpt-oss, open for months.
That is bad news. Is there a link to this issue? And is there a plan to address it?
Had the same issue. It was a problem with the context window. Ollama uses a default context window of 2048 tokens, which isn't enough to fit all the tool descriptions and example schemas. Create a Modelfile for gpt-oss:120b like this:
FROM gpt-oss:120b
PARAMETER num_ctx 16384
PARAMETER temperature 1
then run:
ollama create gpt-oss:120b-16384 -f Modelfile
This should ensure that the model has sufficient context to consider all tool definitions. If this doesn't work, increase to 32k
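As a quick sanity check (assuming a reasonably recent Ollama release), you can inspect the new tag to confirm the larger context took effect, and then select that tag as the model in opencode:

ollama show gpt-oss:120b-16384
# the Parameters section should list: num_ctx 16384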
Thank you, but I am not using Ollama, and when I started the model with vLLM I set max-model-len to 40960.
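One way to tell whether the problem sits in opencode or in the vLLM server is to send a tool-enabled request straight at the OpenAI-compatible endpoint and see whether a tool_calls array comes back. A rough sketch, assuming the server is on localhost:8000 and serving openai/gpt-oss-120b (adjust to your setup); the get_weather tool is just a made-up example:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'

If the response only contains reasoning text and no tool_calls, the server side is the likely culprit: vLLM generally needs --enable-auto-tool-choice plus a matching --tool-call-parser for tool calls to be emitted at all, which lines up with the compose file shared further down.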
Hi, @rekram1-node. Can using sglang solve this problem?
I am also having this issue, and I have 128k context on a DGX Spark.
My vLLM compose file:
services:
  vllm:
    image: nvcr.io/nvidia/vllm:25.11-py3
    restart: always
    container_name: vllm-gpt-oss-120b
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    environment:
      VLLM_USE_NATIVE_FP4: "1"
      VLLM_FP4_IMPL: "nvfp4"
    ipc: host
    ulimits:
      memlock: -1
      stack: 67108864
    volumes:
      - ./model-cache:/root/.cache/huggingface
      - ./vllm-cache:/root/.cache/vllm
    command: >
      vllm serve openai/gpt-oss-120b
      --host 0.0.0.0
      --port 8000
      --quantization mxfp4
      --trust-remote-code
      --gpu-memory-utilization 0.90
      --max-model-len 131072
      --enable-prefix-caching
      --enable-auto-tool-choice
      --tool-call-parser openai
I tried the setup above, and it seems to be effective.
@inv1s10n is it working well? What vLLM version are you using?
I've had no issues with it recently. My vLLM version is 0.14.0rc1.
Thank you!!!!!!! The Modelfile/num_ctx fix above worked for me (I am using Ollama).
I think the ideal solution here is doing a pass on Ollama so that we automatically send the num_ctx param so you don't have to. I need to look into it, but last time you weren't able to do that via the OpenAI-compatible endpoints... If this has changed it should be an easy fix.
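For reference, Ollama's native /api/chat endpoint does accept a per-request context size via options.num_ctx; whether the same now works through the OpenAI-compatible /v1 route that opencode talks to is exactly the open question above. A rough sketch of the native-API form, assuming a default Ollama install on localhost:11434:

curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:120b",
  "messages": [{"role": "user", "content": "hello"}],
  "options": {"num_ctx": 16384}
}'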