Not able to reach the Ollama local model hosted on another machine
Issue Category
Undocumented feature or missing documentation
Affected Documentation Page URL
No response
Issue Description
I have the following config.json:
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7b",
      "model": "qwen-2.5-coder-instruct-7b",
      "provider": "ollama",
      "apiBase": "http://192.168.120.243:9000/v1/chat/completions"
    }
  ],
  "contextProviders": [
    {
      "name": "code",
      "params": {}
    },
    {
      "name": "docs",
      "params": {}
    },
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "terminal",
      "params": {}
    },
    {
      "name": "problems",
      "params": {}
    },
    {
      "name": "folder",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {}
    }
  ],
  "slashCommands": [
    {
      "name": "share",
      "description": "Export the current chat session to markdown"
    },
    {
      "name": "cmd",
      "description": "Generate a shell command"
    },
    {
      "name": "commit",
      "description": "Generate a git commit message"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5b Autocomplete Model",
    "provider": "ollama",
    "model": "qwen-2.5-coder-instruct-7b",
    "apiBase": "http://192.168.120.243:9000/v1/"
  },
  "data": [],
  "docs": [
    {
      "startUrl": "https://requests.readthedocs.io",
      "title": "requests"
    }
  ]
}
I am not able to receive a response (error 404). Here are the Ollama server logs:
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-1f0828c4-2144-92a0-a19b-ece2a193546a library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="2.7 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-a4326111-a43d-9b34-3414-701320dafc95 library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="9.6 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-5d59eeb5-c7c3-5bd7-6917-a1e7bac7a9fe library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="10.1 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-1ce4c642-299f-18b4-78b1-497671a6852b library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="9.2 GiB"
[GIN] 2025/06/09 - 15:01:40 | 404 | 808.436µs | 192.168.120.28 | POST "/api/show"
[GIN] 2025/06/09 - 15:01:40 | 404 | 340.388µs | 192.168.120.28 | POST "/api/show"
[GIN] 2025/06/09 - 15:01:45 | 404 | 985.237µs | 192.168.120.28 | POST "/api/chat"
[GIN] 2025/06/09 - 15:01:59 | 404 | 8.957µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:00 | 404 | 4.077µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:05 | 404 | 6.563µs | 192.168.120.28 | POST "/v1/api/chat"
[GIN] 2025/06/09 - 15:02:35 | 404 | 11.732µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:36 | 404 | 4.057µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:42 | 404 | 7.965µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:43 | 404 | 7.394µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:51 | 404 | 7.253µs | 192.168.120.28 | POST "/v1/api/chat"
[GIN] 2025/06/09 - 15:03:23 | 404 | 8.096µs | 192.168.120.28 | POST "/v1/api/chat"
[GIN] 2025/06/09 - 15:03:44 | 404 | 7.905µs | 192.168.120.28 | POST "/v1/chat/api/show"
[GIN] 2025/06/09 - 15:03:44 | 404 | 4.258µs | 192.168.120.28 | POST "/v1/chat/api/show"
[GIN] 2025/06/09 - 15:03:47 | 404 | 8.836µs | 192.168.120.28 | POST "/v1/chat/api/chat"
[GIN] 2025/06/09 - 15:03:55 | 404 | 5.08µs | 192.168.120.28 | POST "/v1/chat/completions/api/show"
[GIN] 2025/06/09 - 15:03:56 | 404 | 4.328µs | 192.168.120.28 | POST "/v1/chat/completions/api/show"
[GIN] 2025/06/09 - 15:03:56 | 404 | 3.457µs | 192.168.120.28 | POST "/v1/chat/completions/api/chat"
...
I tried using /v1, /v1/ and /v1/chat/completions, but none of them worked. The extension is requesting the /api/chat and /api/show endpoints. There is nothing in the documentation mentioning how to handle this.
Expected Content
Please detail how to precisely specify the URL of a locally hosted model, with the proper endpoints. It would be even better to have an end-to-end config.json or config.yaml example. Thanks!
This only worked when using an offline model with llama.cpp. The config.json file became:
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7b",
      "model": "qwen-2.5-coder-instruct-7b",
      "provider": "llama.cpp",
      "apiBase": "http://192.168.120.243:8078"
    }
  ],
  "contextProviders": [
    {
      "name": "code",
      "params": {}
    },
    {
      "name": "docs",
      "params": {}
    },
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "terminal",
      "params": {}
    },
    {
      "name": "problems",
      "params": {}
    },
    {
      "name": "folder",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {}
    }
  ],
  "slashCommands": [
    {
      "name": "share",
      "description": "Export the current chat session to markdown"
    },
    {
      "name": "cmd",
      "description": "Generate a shell command"
    },
    {
      "name": "commit",
      "description": "Generate a git commit message"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5b Autocomplete Model",
    "provider": "llama.cpp",
    "model": "qwen-2.5-coder-instruct-7b",
    "apiBase": "http://192.168.120.243:8078"
  },
  "data": [],
  "docs": [
    {
      "startUrl": "https://requests.readthedocs.io",
      "title": "requests"
    }
  ]
}
Weirdly enough, when the provider is ollama and the apiBase points to a running Ollama instance on its correct port, the extension tries to request /api/generate and /api/show, which do not appear to work against that server (404 not found). So I think this is a bug in the extension when the provider is set to ollama.
/api/generate is documented here for Ollama: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion
but you're right that llama.cpp wouldn't support that. I think it probably just comes down to the apiBase. In the original config you shared, you should strip the /v1/... and make it just the host and port.
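Concretely, that would mean both apiBase values point at just the server root, something along these lines (assuming Ollama itself is what's listening on port 9000 of that machine):

"apiBase": "http://192.168.120.243:9000"

i.e. no /v1, /v1/chat/completions, or any other path segment, since your logs show the extension appending its own Ollama endpoints (/api/show, /api/chat, ...) to whatever base you give it.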
I tried it with and without /v1; same result. No, the issue is not with llama.cpp: the llama.cpp provider worked, but ollama didn't. I have Ollama installed on a server, and when pointing to the Ollama URL on that server it doesn't work.
Is there an update on this, or a workaround that someone has figured out?
Hi @msharara1998 ,
Could you confirm that apiBase is set to http://192.168.120.243:9000 in both models and tabAutocompleteModel?
Also, note that the correct model name string in Ollama appears to be qwen2.5-coder:7b-instruct, not qwen-2.5-coder-instruct-7b.
If you still encounter issues, it would be appreciated if you could share the logs from your IDE's Developer Tools / Logs.
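If it helps, a minimal sketch of the chat model entry with both of those suggestions applied would look roughly like this (assuming the model was pulled on the server under the tag qwen2.5-coder:7b-instruct):

"models": [
  {
    "title": "Qwen 2.5 Coder 7b",
    "model": "qwen2.5-coder:7b-instruct",
    "provider": "ollama",
    "apiBase": "http://192.168.120.243:9000"
  }
]

with the same model/apiBase pair repeated in tabAutocompleteModel.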
Repro is very easy: just set up Ollama on a second machine, then set that machine as the apiBase and it will not work. The log shows it's getting a 404; the reason is that it's tacking /api/ onto the URL, which isn't valid.
I've written a Go program to forward all network traffic on localhost:11434 to the remote machine, to pretend that Ollama is running locally; that works fine.
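For reference, a minimal sketch of such a forwarder looks roughly like the following; the remote address below is an assumption (default Ollama port on the remote box), so adjust it to wherever your Ollama server actually listens:

package main

import (
	"io"
	"log"
	"net"
)

const (
	listenAddr = "localhost:11434"       // where the extension expects a local Ollama
	remoteAddr = "192.168.120.243:11434" // assumed remote Ollama host:port, adjust as needed
)

// handle relays one client connection to the remote Ollama server in both directions.
func handle(client net.Conn) {
	defer client.Close()

	remote, err := net.Dial("tcp", remoteAddr)
	if err != nil {
		log.Printf("dial %s: %v", remoteAddr, err)
		return
	}
	defer remote.Close()

	// Copy bytes both ways until either side closes the connection.
	done := make(chan struct{}, 2)
	go func() { io.Copy(remote, client); done <- struct{}{} }()
	go func() { io.Copy(client, remote); done <- struct{}{} }()
	<-done
}

func main() {
	ln, err := net.Listen("tcp", listenAddr)
	if err != nil {
		log.Fatalf("listen %s: %v", listenAddr, err)
	}
	log.Printf("forwarding %s -> %s", listenAddr, remoteAddr)
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go handle(conn)
	}
}

With this running, the extension can be left pointed at the default local Ollama address while the actual inference happens on the remote machine.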
This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.
This issue was closed because it wasn't updated for 10 days after being marked stale. If it's still important, please reopen + comment and we'll gladly take another look!