Using local Ollama models doesn't return any results.
Question
Hello, I am running opencode with local LLMs from Ollama:
- qwen2.5-coder:7b
- qwen2.5-coder:32b
- codellama:34b
OS: Omarchy, GPU: RTX 4070 Ti
When I run them on my projects (generic React CRUD apps) using the /init command, I get the following responses:
{"name": "todoread", "arguments": {}}
or
{"name": "read", "arguments": {"path": "/home/Projects/react-app/AGENTS.md"}}
Here is my opencode.json file:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (Local)",
      "options": {
        "baseURL": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama" // Optional dummy key; some setups need it
      },
      "models": {
        "qwen2.5-coder:7b": {
          "name": "Qwen2.5-Coder 7B"
        },
        "qwen2.5-coder:32b": {
          "name": "Qwen2.5-Coder 32B"
        },
        "deepseek-coder-v2": {
          "name": "DeepSeek-Coder-V2 16B"
        },
        "qwen3-coder": {
          "name": "Qwen3-Coder 30B"
        },
        "glm4": {
          "name": "GLM-4"
        },
        "codellama:34b": {
          "name": "CodeLlama 34B"
        },
        "codestral": {
          "name": "Codestral"
        },
        "gpt-oss": {
          "name": "GPT-OSS"
        }
      }
    }
  },
  "model": "ollama/qwen2.5-coder:32b" // Default: provider_id/model_id format
}
When I switch to cloud providers such as Copilot or Grok Code Fast, I get the desired results (AGENTS.md files). Any suggestions on what I am doing wrong?
This issue might be a duplicate of existing issues. Please check:
- #5694: Local Ollama models are not agentic
- #4488: Ollama integration: API returns HTTP 200 but responses not displayed
- #5187: Ollama: User message content arrives as empty array - model cannot see user input
- #3265: ollama integration no results returned
- #5210: Custom OpenAI-compatible provider returns no text content
- #7030: Ollama (qwen2.5-coder): tool calls (edit/write) show as executed but no files are created/modified
Feel free to ignore if none of these address your specific case.
If you're able to run those models, you can probably run qwen3-coder:30b-a3b, which is a much better model for tool calls. I've just tried qwen3-coder:30b-a3b on my setup and the tool calls do work, at least for the quick test I did. The biggest issue I've seen with these models is that they are mostly horrible at tool calling; that isn't specific to Ollama, they're horrible at tool calling no matter what they're running on. You might also look into Mistral models, or models much newer than qwen2.5, which are specifically engineered for more reliable tool calling.
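One quick way to sanity-check a model's tool calling outside opencode is to hit Ollama's OpenAI-compatible endpoint directly with a tools array and see whether the reply comes back as structured tool_calls or as plain text. A minimal sketch, assuming a local Ollama on the default port; the read tool definition here is just an illustrative placeholder:

```python
import json
import requests

# Ask the model to use a tool via Ollama's OpenAI-compatible API. A model that
# handles tool calling properly should return a structured "tool_calls" entry;
# a model that doesn't will dump JSON-ish text into "content" instead, which is
# exactly the symptom described above.
resp = requests.post(
    "http://127.0.0.1:11434/v1/chat/completions",
    json={
        "model": "qwen2.5-coder:7b",  # swap in whichever model you want to test
        "messages": [{"role": "user", "content": "Read the file ./AGENTS.md"}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "read",  # illustrative tool, loosely mirroring opencode's read tool
                    "description": "Read a file from disk",
                    "parameters": {
                        "type": "object",
                        "properties": {"path": {"type": "string"}},
                        "required": ["path"],
                    },
                },
            }
        ],
    },
    timeout=120,
)
msg = resp.json()["choices"][0]["message"]
print("tool_calls:", json.dumps(msg.get("tool_calls"), indent=2))
print("content:", msg.get("content"))
```

If tool_calls comes back empty and the JSON shows up in content instead, that matches the behaviour in the original report and points at the model or its chat template rather than opencode.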
Plus one!!!!
I also have a similar issue with nemotron-3-nano:30b from Ollama. If I start opencode, leave it in Build mode on the splash prompt, and ask a question, it works, and so do subsequent prompts. Tool use fails, though: trying to get it to ping 8.8.8.8 three times, it used the wrong tools and failed to use bash even when explicitly asked to. If the mode is switched to Plan, or even back to Build, it just replies to the system prompt.
OK, it seems this is all related to the context window; see https://opencode.ai/docs/providers/#ollama and https://blog.driftingruby.com/ollama-context-window/. A simple greeting already uses about 10k of context, and the Ollama default is 8k. @padsbanger, try increasing the context window.
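If you want to see the numbers for yourself, here is a minimal sketch against Ollama's native API, assuming a local server on the default port and whichever model you have pulled: prompt_eval_count in the response shows how many tokens the prompt consumed, and options.num_ctx raises the window for that request. opencode's system prompt plus its tool definitions is far larger than a bare greeting, which is why the 8k default gets truncated.

```python
import requests

# Send a trivial prompt and check how much context it consumes, with the
# context window raised to 32k for this request via options.num_ctx.
resp = requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={
        "model": "qwen2.5-coder:32b",   # whichever model you have pulled
        "messages": [{"role": "user", "content": "hello"}],
        "options": {"num_ctx": 32768},  # per-request context window
        "stream": False,
    },
    timeout=300,
)
data = resp.json()
print("prompt tokens:", data.get("prompt_eval_count"))
print("reply:", data["message"]["content"][:200])
```

To raise the limit persistently on the Ollama side, the usual options are the OLLAMA_CONTEXT_LENGTH environment variable or a Modelfile with PARAMETER num_ctx (see the links above for details).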
Thank you, increasing the context window did something (I increased it to 32k). However, I have run into another issue: when I run /init on a repo in Build mode, it outputs the markdown content but is unable to create the file.
I get this response from devstral:
I'm sorry for any confusion, but I don't have the capability to directly create files or use tools in the way you're asking. However, I can certainly help guide you on how to create a agents.md file yourself! If you provide me with the content you'd like to include in the file, I can help you format it properly for Markdown.
Here's an example of what the file might look like if you want to document different types of agents:
It is weird, because devstral supports tools and file access.
I notice you don't have the tool_call param in your opencode config. You should probably add it. Here's what I have for my Ollama server:
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://hawk.swift.local:11434/v1",
"includeUsage": true,
"timeout": 1000000
},
"models": {
"qwen3-coder:30b": {
"name": "qwen3-coder:30b",
"tool_call": true,
"reasoning": true,
"cost": {
"input": 0.10,
"output": 1.20
}
},
"gpt-oss:120b": {
"name": "gpt-oss:120b",
"tool_call": true,
"reasoning": true,
"cost": {
"input": 0.10,
"output": 1.20
}
},
"ministral-3:14b": {
"name": "ministral-3:14b",
"tool_call": true,
"reasoning": true,
"cost": {
"input": 0.10,
"output": 1.20
}
},
"deepseek-r1:32b": {
"name": "deepseek-r1:32b",
"tool_call": true,
"reasoning": true,
"cost": {
"input": 0.10,
"output": 1.20
}
}
@aaronnewsome, adding tool_call still does not help
I think I have tested most of the popular models that can run on my graphics card (RTX 4070 Ti, 12 GB VRAM). I tried different context windows and different flags like "tool_call", "reasoning", etc. Every time, the LLMs just keep hallucinating and not generating any results (I used /init as a benchmark). Are there any Ollama models that can actually run locally and still be useful in opencode for programming tasks? When I run these models in the Ollama CLI they work fine.
I too have had REALLY bad luck with small models and opencode. I've found qwen3-coder:30b to be one of the better ones, and gpt-oss:120b is supported well in Ollama and runs decently, but really any model I run on Ollama has a ton of failed tool calls over time, which really slows down your workflow. My best, most stable local models specifically for coding have been glm-4.5, glm-4.7, minimax m2 and minimax m2.1. I've also had pretty good luck with qwen3-coder:480b. If you must use Ollama rather than llama.cpp, I'd suggest at least using the Q8 quant of your preferred model; those seem a bit smarter than the Q4 Ollama quants.
qwen3-coder:480b fails to run on my PC, I have only 32 GB of RAM :/
For those who are arriving here from search engines, this appears to be a major ongoing difficulty with OpenCode for on-premise usage. I did a substantial review across the web of this issue several days ago and found very many different branches of user problems, though they all reduce to the notion that OpenCode does not have a working local open-source LLM solution. As summarized above, most users discuss the Ollama context length limitations (which have their own foibles and do not solve all problems), and others point to the Qwen 30B model as the only one that works. Generally, the discussion tends to circle the drain right at the point someone says "the way Ollama uses tools is [negative word]".
I cannot claim to know how the tool calling mechanisms are implemented (other than enforcing JSON responses, plus some light parsing of non-JSON string responses to coerce them into JSON), and I have also found PydanticAI too opaque to see in what ways it edits its prompts and response closed loops for a basis of comparison (structured data, instead of tool calls, specifically). This all puzzles me, as tool calling is more or less the same action requirement across the board, and these packages are open source. It is unclear to me why Ollama would implement its own tool coercion logic, for instance, or why six months later the tools would still not work well. This is a different issue from the broad capacity of LLMs to return correct outputs, but these models, especially up near 32 GB, ought to be way more than sufficient, especially if the specific tool requirements available are part of the input context.
What all of this is trying to communicate is a single source for this industry-wide open-source problem, one that is major in its implications, and to follow on the conversation to help where I can. For all intents and purposes I have a similar setup to the one described in this ticket and the same problem to solve.
Thanks.
Give qwen3-coder:30b a try; at least see if the Q8_K_XL quant is more reliable at tool calling compared to what's in the Ollama library. Try this one:
https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
On the page, just click the Use This Model button, then click Ollama, which will give you a command line for ollama run or pull.
Plus, Unsloth always does a great job of fixing chat templates to make tool calls more reliable, a MUST with opencode.
I think a more thorough pulling back of the veil on the various options and problems, to reach some congruity, is necessary.
After all, once a dozen different tribes of people face a problem as large as "open source LLMs with open source OpenCode" marked as unusable, we have an industry mistake larger than one GitHub ticket and one recommendation of a model.
And I am not an expert in recent model performance problems with tools, but I am nearly certain this is not a model size problem. Providing the function schema in context and coercing the outputs (similar to PydanticAI), with a few closed-loop retries, ought to be far more than enough even with edge-device-sized models. Here is a quick example with a relatively small model.
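The sketch below assumes a local Ollama and a small instruct model; the model name, tool schema, and prompt wording are placeholders. The pattern is exactly what I described: put the function schema in the prompt, demand JSON only, parse, and retry a couple of times in a closed loop if parsing fails.

```python
import json
import requests

# Tool calling by schema-in-context plus output coercion, with closed-loop retries.
SCHEMA = {
    "name": "read",
    "description": "Read a file from disk",
    "parameters": {"path": "string, path of the file to read"},
}

def call_tool(task, model="qwen2.5-coder:7b", retries=3):
    prompt = (
        "You can call exactly one tool. Its schema is:\n"
        f"{json.dumps(SCHEMA, indent=2)}\n\n"
        f"Task: {task}\n"
        'Respond with ONLY a JSON object of the form {"name": ..., "arguments": {...}}.'
    )
    messages = [{"role": "user", "content": prompt}]
    for _ in range(retries):
        resp = requests.post(
            "http://127.0.0.1:11434/api/chat",
            json={"model": model, "messages": messages,
                  "format": "json",   # ask Ollama to constrain the output to JSON
                  "stream": False},
            timeout=300,
        )
        text = resp.json()["message"]["content"]
        try:
            call = json.loads(text)
            if isinstance(call, dict) and "name" in call and "arguments" in call:
                return call          # a usable tool call
        except json.JSONDecodeError:
            pass
        # Closed loop: feed the bad reply back and ask again.
        messages += [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "That was not valid JSON matching the schema. Try again."},
        ]
    raise RuntimeError("model never produced a valid tool call")

print(call_tool("Read ./AGENTS.md"))
```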
This demonstration shows me that small models are more than sufficient for rough but effective tool use. What we seem to have is a configuration and obfuscation problem, which is the reason for my posting to this ticket.
I'm not sure I agree with any of this. Ollama GGUF models use the built-in chat template. Unless you create the GGUF yourself, it's using the chat template inside the GGUF. As I've learned, chat templates really matter when it comes to tool calling. Especially with the wildly different tool calling implementations across the models. Some use xml, some use json, some do weird things with how they do thinking between tool calls (no additional USER input), and newer models are doing even more weird things with interleaved thinking and probably a bunch of stuff I've never even heard of. Tool calling is messy and super-inconsistent. From my personal experience, older, smaller "tool calling capable" models are just not as reliable as newer, larger models - which have been better for me with consistent tool calling.
I do agree with one point though: tool calling should be more consistent and reliable for ANY tool-calling-capable model. Unfortunately, I don't think that's a problem that can be solved by opencode OR Ollama.
Every model maker decides either to follow an established "standard" for tool calling, chat templates, etc., or to just invent new ones.
@davidbernat Hi, I encountered the same issue when deploying Qwen2.5-Coder with Ollama. The model can run normally and generate outputs, but actual file operations never happen. I’ve already adjusted the context length and enabled tool calling.
What I’m wondering is: Is this problem caused by OpenCode, by Ollama, or by the combination of OpenCode + Ollama?
And if I switch to OpenHands + Ollama, will I run into the same issue?
It's Ollama's consistency, based on what was posted by @davidbernat (if I understand correctly).
I agree with your statements @aaronnewsome, and to some extent the trouble is that different networks expect different input and output string formats (XML or JSON, for instance). But this need not be a problem for Ollama: tools with sufficient logic under the hood, like Pydantic and PydanticAI, exist; there is probably no larger use case in AI today (huge industry motivation); and the various requirements of each network should be well documented (since they were data-munged during training). As for the second part of this statement, absolutely no alternative to Ollama seems to exist in the open-source community, which verges on unwise if these problems with Ollama have persisted for more than a year into the emergence of open coding and tools. (The few non-Ollama solutions offered as alternatives all suspiciously accumulate around the same for-profit company, which makes me trust this situational blocker as genuine even less.)
I do not know enough about the internal workings of this tool problem (at the intersection of OpenCode, PydanticAI, and Ollama) right now to comment with authority any further, hence this call to action as a central meeting place for discussion on the OpenCode board (who should handle this, and its education, as the utility is so core to their value proposition). I am very excited about what their awesome team can do!
There are three clear components to the problem, and three clear sources of first-pass learning:
- PydanticAI is a structured output munger which modifies prompts, intermediate chat cycles, and outputs to get a clean JSON
- Ollama is a system-wide interface between a network's inputs and outputs, which should be able to pass them through with zero modification
- OpenCode does additional transformations of prompt engineering and caching local context for its repeating thinking pattern
Concrete information is always welcomed. (cc: @YupuWang2001 @padsbanger)
See https://github.com/anomalyco/opencode/issues/5694#issuecomment-3667094022
@minger0 Great. That comment is insightful, and people on this ticket should know about that ongoing discussion. In short, as it suggests: I will check out Ollama v2 and see whether it resolves the tool calling problems of Ollama v1.
Still, however: this is all far too opaque for users. There is no reason for tool calling success to depend on something buried layers deep. We should know what changes were made in their update, why they were made, and how they change the procedure. Nor is it clear to me that, if tool calling is this big a problem, we should all have to understand how to tinker in the guts of the three packages described above. For instance, why does PydanticAI not simply tell users (or me, a scientist), or log to the console when requested, exactly how it munges prompts and exactly what raw results the LLM returns? Even the requirement that PydanticAI users use cloud-based LogFire is highly suspect in conjunction with its other problems.
I was struggling with my own Ollama deployment on Modal for a while (turns out my SSE proxy was only emitting \n instead of \n\n between events, breaking the parser).
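For anyone hitting the same thing: SSE events are terminated by a blank line, so every event the proxy forwards has to end with two newlines, not one. A minimal sketch of the framing (the generator and chunks here are illustrative):

```python
# Correct Server-Sent Events framing: each event is one or more "data:" lines
# followed by a BLANK line. Emitting only "\n" runs events together and the
# client-side stream parser never sees an event boundary.
def sse_events(chunks):
    for chunk in chunks:
        yield f"data: {chunk}\n\n"   # the trailing "\n\n" terminates the event
    yield "data: [DONE]\n\n"

# e.g. forwarding upstream JSON deltas through a proxy
for frame in sse_events(['{"delta":"Hel"}', '{"delta":"lo"}']):
    print(repr(frame))
```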
Now both the ollama Qwen3 30B and the Huggingface A3B version are working well. Llama 3.2 3B does not work well as it struggles with tool calls.
Here's my working OpenCode config:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "modal-ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Modal Ollama",
      "options": {
        "baseURL": "https://<my_remote_ollama_endpoint>/v1",
        "num_ctx": "65536"
      },
      "models": {
        "llama3.2:3b": {
          "name": "Llama 3.2 3B",
          "limit": {
            "context": 128000,
            "output": 64000
          }
        },
        "qwen3-coder:30b": {
          "name": "Qwen 3 Coder 30B",
          "tool_call": true,
          "reasoning": true,
          "limit": {
            "context": 256000,
            "output": 64000
          }
        },
        "hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q8_K_XL": {
          "name": "Qwen 3 Coder 30B A3B Q8 HF",
          "tool_call": true,
          "reasoning": true,
          "limit": {
            "context": 256000,
            "output": 64000
          }
        }
      }
    }