
litellm (OpenAI) Provider Not Working for Tab Autocomplete in Continue.dev

Open mlibre opened this issue 10 months ago • 4 comments

Before submitting your bug report

  • [x] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
  • [x] I'm not able to find an [open issue](https://github.com/continuedev/continue/issues?q=is%3Aopen+is%3Aissue) that reports the same bug
  • [ ] I've seen the [troubleshooting guide](https://docs.continue.dev/troubleshooting) on the Continue Docs

Relevant environment info

- OS: linux/ubuntu
- Continue version: 10.3
- IDE version: 1.97.2
- Model:
- config:
  "models": [
    {
      "model": "test-model",
      "title": "word",
      "provider": "openai",
      "apiBase": "http://192.168.18.10:4000/",
      "apiKey": ""
    }
  ],
  "tabAutocompleteModel": {
    "model": "test-model-2",
    "title": "word",
    "provider": "openai",
    "apiBase": "http://192.168.18.10:4000/",
    "apiKey": ""
  },
  
  OR link to assistant in Continue hub:

Description

When Continue.dev is configured to use ollama as the provider for tab autocomplete, everything works correctly. However, when switching to the openai provider (pointed at a litellm proxy), tab autocomplete stops working.

Expected Behavior

Tab autocomplete should work regardless of whether the provider is ollama or openai.

Actual Behavior

  • Working Configuration (Ollama)

  "tabAutocompleteModel": {
    "model": "test-model",
    "title": "word",
    "provider": "ollama",
    "apiBase": "http://192.168.28.110:11434"
  }

This configuration works without any issues.

  • Non-Working Configuration (OpenAI)

  "tabAutocompleteModel": {
    "model": "test-model-2",
    "title": "word",
    "provider": "openai",
    "apiBase": "http://192.168.18.10:4000/",
    "apiKey": ""
  }

When switching to openai, tab autocomplete stops working.

Additional Information

  • Ollama Logs (Working Case)
    • The model loads and processes requests correctly.
msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-24b532e5276503b147d0eea0e47cb1d2bcce7c9034edd657b624261862ca54a1
msg="generate request" images=0 prompt="<|fim_prefix|>// Path: Untitled.txt\n//   \"tabAutocompleteModel\": {\n//     \"model\": \"test-model\",\n//     \"title\": \"word\",\n//     \"provider\": \"ollama\",\n//     \"apiBase\": \"http://192.168.28.110:11434\"\n//   }\n// Path: config.json\n// {\n//   \"models\": [\n//     {\n//       \"model\": \"test-model\",\n//       \"title\": \"word\",\n//       \"provider\": \"openai\",\n//       \"apiBase\": \"http://192.168.18.10:4000/\",\n//       \"apiKey\": \"\"\n//     }\n//   ],\n//   \"tabAutocompleteModel\": {\n//     \"model\": \"test-model\",\n//     \"title\": \"word\",\n//     \"provider\": \"ollama\",\n//     \"apiBase\": \"http://192.168.28.110:11434\"\n//   },\n//   \"embeddingsProvider\": {\n//     \"provider\": \"openai\",\n//     \"model\": \"snowflake-arctic-embed2\",\n//     \"apiBase\": \"http://192.168.18.10:4000\"\n//   },\n//   \"contextProviders\": [\n//     {\n//       \"name\": \"code\",\n//       \"params\": {}\n//     },\n//     {\n//       \"name\": \"docs\",\n//       \"params\": {}\n//     },\n// Path: config.json\n// {\n//   \"models\": [\n//     {\n//       \"model\": \"test-model\",\n//       \"title\": \"word\",\n//       \"provider\": \"openai\",\n//       \"apiBase\": \"http://192.168.18.10:4000/\",\n//       \"apiKey\": \"\"\n//     }\n//   ],\n//   \"tabAutocompleteModel\": {\n//     \"model\": \"test-model\",\n//     \"title\": \"word\",\n//     \"provider\": \"ollama\",\n//     \"apiBase\": \"http://192.168.28.110:11434\"\n//   },\n//   \"embeddingsProvider\": {\n//     \"provider\": \"openai\",\n//     \"model\": \"snowflake-arctic-embed2\",\n//     \"apiBase\": \"http://192.168.18.10:4000\"\n//   },\n//   \"contextProviders\": [\n//     {\n//       \"name\": \"code\",\n//       \"params\": {}\n//     },\n//     {\n//       \"name\": \"docs\",\n//       \"params\": {}\n// Path: config.json\n// {\n//   \"models\": [\n//     {\n//       \"model\": \"test-model\",\n//       \"title\": \"word\",\n//       \"provider\": \"openai\",\n//       \"apiBase\": \"http://192.168.18.10:4000/\",\n//       \"apiKey\": \"\"\n//     }\n//   ],\n//   \"tabAutocompleteModel\": {\n//     \"model\": \"test-model\",\n//     \"title\": \"word\",\n//     \"provider\": \"ollama\",\n//     \"apiBase\": \"http://192.168.28.110:11434\"\n//   },\n//   \"embeddingsProvider\": {\n//     \"provider\": \"openai\",\n//     \"model\": \"snowflake-arctic-embed2\",\n//     \"apiBase\": \"http://192.168.18.10:4000\"\n//   },\n//   \"contextProviders\": [\n//     {\n//       \"name\": \"code\",\n//       \"params\": {}\n//     },\n//     {\n//       \"name\": \"docs\",\n//       \"params\": {}\n// Path: test/main.js\n// \treturn a + b;\n// }\n// test/main.js\n/*\nThis is the main function\n*/\nvoid async function main() {\n\tlet a = 10;\n\tlet b = 20;\n\ta += sum(a, b); // Use the exported sum function\n\tconsole.log('Result:', a);\n}()\n\n\n// write the sum function here\n\nfunction sum(a,b)\n{\n<|fim_suffix|>\n}<|fim_middle|>"
msg="loading cache slot" id=0 cache=948 prompt=945 used=941 remaining=4
Mar 08 16:12:54 nlp-infraai-47-dev.word3.psg.network ollama[344943]: [GIN] 2025/03/08 - 16:12:54 | 200 | 290.302985ms | 172.30.229.176 | POST "/api/generate"
msg="context for request finished"
msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-24b532e5276503b147d0eea0e47cb1d2bcce7c9034edd657b624261862ca54a1 duration=30m0s
msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-24b532e5276503b147d0eea0e47cb1d2bcce7c9034edd657b624261862ca54a1 refCount=0
  • Ollama Logs (Using LiteLLM as OpenAI Provider)
    • The request arrives formatted differently than with direct Ollama usage: the prompt is wrapped in the model's chat template (<|im_start|>…<|im_end|>), and the FIM markers appear as literal text inside the user message.
msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-24b532e5276503b147d0eea0e47cb1d2bcce7c9034edd657b624261862ca54a1
msg="generate request" images=0 prompt="<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\n### User:\n<fim_prefix>// Path: config.json\n// {\n//   \"models\": [\n//     {\n//       \"model\": \"test-model\",\n//       \"title\": \"word\",\n//       \"provider\": \"openai\",\n//       \"apiBase\": \"http://192.168.18.10:4000/\",\n//       \"apiKey\": \"\"\n//     }\n//   ],\n//   \"tabAutocompleteModel\": {\n//     \"model\": \"test-model-2\",\n//     \"title\": \"word\",\n//     \"provider\": \"openai\",\n//     \"apiBase\": \"http://192.168.18.10:4000/\",\n//     \"apiKey\": \"\"\n//   },\n//   \"embeddingsProvider\": {\n//     \"provider\": \"openai\",\n//     \"model\": \"snowflake-arctic-embed2\",\n//     \"apiBase\": \"http://192.168.18.10:4000\"\n//   },\n//   \"contextProviders\": [\n//     {\n//       \"name\": \"code\",\n//       \"params\": {}\n//     },\n//     {\n//       \"name\": \"docs\",\n//       \"params\": {}\n//     },\n//     {\n// Path: config.json\n// {\n//   \"models\": [\n//     {\n//       \"model\": \"test-model\",\n//       \"title\": \"word\",\n//       \"provider\": \"openai\",\n//       \"apiBase\": \"http://192.168.18.10:4000/\",\n//       \"apiKey\": \"\"\n//     }\n//   ],\n//   \"tabAutocompleteModel\": {\n//     \"model\": \"test-model-2\",\n//     \"title\": \"word\",\n//     \"provider\": \"openai\",\n//     \"apiBase\": \"http://192.168.18.10:4000/\",\n//     \"apiKey\": \"\"\n//   },\n//   \"embeddingsProvider\": {\n//     \"provider\": \"openai\",\n//     \"model\": \"snowflake-arctic-embed2\",\n//     \"apiBase\": \"http://192.168.18.10:4000\"\n//   },\n//   \"contextProviders\": [\n//     {\n//       \"name\": \"code\",\n//       \"params\": {}\n//     },\n//     {\n//       \"name\": \"docs\",\n//       \"params\": {}\n//     },\n//     {\n// Path: config.json\n// {\n//   \"models\": [\n//     {\n//       \"model\": \"test-model\",\n//       \"title\": \"word\",\n//       \"provider\": \"openai\",\n//       \"apiBase\": \"http://192.168.18.10:4000/\",\n//       \"apiKey\": \"\"\n//     }\n//   ],\n//   \"tabAutocompleteModel\": {\n//     \"model\": \"test-model-2\",\n//     \"title\": \"word\",\n//     \"provider\": \"openai\",\n//     \"apiBase\": \"http://192.168.18.10:4000/\",\n//     \"apiKey\": \"\"\n//   },\n//   \"embeddingsProvider\": {\n//     \"provider\": \"openai\",\n//     \"model\": \"snowflake-arctic-embed2\",\n//     \"apiBase\": \"http://192.168.18.10:4000\"\n//   },\n//   \"contextProviders\": [\n//     {\n//       \"name\": \"code\",\n//       \"params\": {}\n//     },\n//     {\n//       \"name\": \"docs\",\n//       \"params\": {}\n//     },\n//     {\n// Path: test/main.js\n// \n// }\n// Path: test/main.js\n// }\n// Path: config.json\n//     \"model\": \"test-model\",\n//     \"title\": \"word\",\n// test/main.js\n/*\nThis is the main function\n*/\nvoid async function main() {\n\tlet a = 10;\n\tlet b = 20;\n\ta += sum(a, b); // Use the exported sum function\n\tconsole.log('Result:', a);\n}()\n\n\n// write the sum function here\n\nfunction sum(a,b,c)\n{\n\n\t<fim_suffix>\n\n\n}<fim_middle>\n\n<|im_end|>\n<|im_start|>assistant\n"
msg="loading cache slot" id=0 cache=941 prompt=974 used=24 remaining=950
msg="hit stop token" pending=[```
] stop=```
[GIN
] 2025/03/08 - 16: 19: 35 | 200 |  2.418615769s |  192.168.18.10 | POST     "/api/generate"
msg="context for request finished"
msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-24b532e5276503b147d0eea0e47cb1d2bcce7c9034edd657b624261862ca54a1 duration=30m0s
msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-24b532e5276503b147d0eea0e47cb1d2bcce7c9034edd657b624261862ca54a1 refCount=0
  • litellm Configuration

model_list:
  - model_name: "deepseek-r1:8b"
    litellm_params:
      model: "ollama_chat/deepseek-r1:8b"
      api_base: "http://192.168.28.110:11434"
  - model_name: "deepseek-r1:70b"
    litellm_params:
      model: "ollama_chat/deepseek-r1:70b"
      api_base: "http://192.168.28.110:11434"
  - model_name: "test-model"
    litellm_params:
      model: "ollama_chat/test-model"
      api_base: "http://192.168.28.110:11434"
  - model_name: "test-model-2"
    litellm_params:
      model: "ollama/test-model"
      api_base: "http://192.168.28.110:11434"
      drop_params: true
    model_info:
      mode: "completion"

Possible Cause

There appear to be differences in how Continue.dev talks to the openai provider compared to ollama: judging from the logs above, the FIM prompt is routed through litellm's chat endpoint and wrapped in the model's chat template, with the FIM markers passed as literal text, instead of being sent as a raw completion with a proper prefix/suffix split.
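
To make the contrast concrete, the two request shapes look roughly like this (a sketch reconstructed from the logs above, not captured traffic; prompts abbreviated):

  Direct ollama provider (completion-style request to /api/generate):

  {
    "model": "test-model",
    "prompt": "<|fim_prefix|>...<|fim_suffix|>...<|fim_middle|>"
  }

  openai provider via the litellm chat route (/v1/chat/completions):

  {
    "model": "test-model-2",
    "messages": [
      { "role": "user", "content": "### User:\n<fim_prefix>...<fim_suffix>...<fim_middle>" }
    ]
  }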

mlibre avatar Mar 08 '25 13:03 mlibre

Don't you need a "/v1" at the end of your apiBase? e.g. "apiBase": "http://192.168.18.10:4000/v1",
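
For reference, the suggested change would make the block look like this (a sketch only; it assumes the litellm proxy exposes its OpenAI-compatible routes under /v1):

  "tabAutocompleteModel": {
    "model": "test-model-2",
    "title": "word",
    "provider": "openai",
    "apiBase": "http://192.168.18.10:4000/v1",
    "apiKey": ""
  }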

Ollama receives the request. After further investigation, I think the issue is that Continue is not sending the suffix parameter when the provider is openai, which causes Ollama to respond poorly.
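
One way to test this hypothesis is to send a completion with an explicit suffix straight to the litellm proxy and watch the Ollama logs. A minimal diagnostic sketch (base URL and model name taken from the config above; the dummy key is a placeholder; suffix is part of the legacy /v1/completions API):

  from openai import OpenAI

  # Probe the proxy's legacy completions route with an explicit FIM suffix,
  # then check the Ollama logs to see whether the suffix actually arrives
  # at /api/generate.
  client = OpenAI(base_url="http://192.168.18.10:4000/v1", api_key="dummy")

  resp = client.completions.create(
      model="test-model-2",
      prompt="function sum(a, b)\n{\n",
      suffix="\n}",  # if litellm drops this, Ollama only ever sees the prefix
      max_tokens=64,
  )
  print(resp.choices[0].text)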

mlibre avatar Apr 09 '25 07:04 mlibre

Related to https://github.com/BerriAI/litellm/issues/6900

mlibre avatar Apr 14 '25 11:04 mlibre

This is pretty important for our enterprise workflow. Is there a plan for handling this?

buildgreatthings avatar Jun 15 '25 04:06 buildgreatthings

Did someone manage to make litellm + continue.dev work for autocomplete?

grosjeang avatar Aug 25 '25 15:08 grosjeang

Did someone manage to make litellm + continue.dev work for autocomplete?

Yes! This is how to do it for codestral: https://github.com/BerriAI/litellm/issues/9251#issuecomment-3314863160
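
For anyone who can't follow the link: the general shape (an unverified sketch based on litellm's documented codestral FIM route, not a copy of the linked comment) is to register the model on a completion-capable route rather than a chat route:

  model_list:
    - model_name: "codestral"
      litellm_params:
        model: "text-completion-codestral/codestral-latest"  # FIM-capable completion route
        api_key: "os.environ/CODESTRAL_API_KEY"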

Squix avatar Sep 20 '25 13:09 Squix