
Error generating autocompletion with Qwen2.5-Coder-7B and vllm

LNTH opened this issue on Sep 25 '24

Relevant environment info

- OS: Windows 11
- Continue: v0.9.211 (pre-release)
- IDE: VS-Code
- Model: `Qwen/Qwen2.5-Coder-7B` served with `vllm`
- config.json:
  
  {
    "models": [],
    "tabAutocompleteModel": {
      "title": "Qwen/Qwen2.5-Coder-7B",
      "provider": "vllm",
      "model": "Qwen/Qwen2.5-Coder-7B",
      "apiBase": "http://192.168.1.19:8000/v1",
      "apiKey": "None",
      "completionOptions": {
        "template": "<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>",
        "stop": ["<|endoftext|>"]
      }
    }
  }

Description

The tabAutoComplete feature is not displaying any suggestions in the VS Code editor.

  • Continue TabAutoComplete is enabled
  • VSCode: Inline Suggestion is enabled

To reproduce

  1. Ensure the vllm server is running. Confirmed by observing the log entry: "GET /v1/models HTTP/1.1" 200 OK whenever the config.json is modified. (A quick manual check with curl is sketched after these steps.)
  2. Type in the VS Code editor to trigger auto-completion.
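
For reference, a minimal manual check of the completions endpoint (a sketch only; the apiBase and model name are taken from the config above, and the FIM prompt is just an illustrative example):

  curl http://192.168.1.19:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "Qwen/Qwen2.5-Coder-7B",
      "prompt": "<|fim_prefix|>def add(a, b):\n    <|fim_suffix|>\n<|fim_middle|>",
      "max_tokens": 32,
      "stop": ["<|endoftext|>"]
    }'

A 200 response with a non-empty completion confirms the server side works and narrows the problem down to the extension.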

Expected Behavior

Auto-completion suggestions should appear in the VS Code editor.

Actual Behavior

The vllm server received "POST /v1/completions HTTP/1.1" 200 OK, but nothing appeared in the VS Code editor. The VS Code console displayed Error generating autocompletion: TypeError: Cannot read properties of undefined (reading 'includes').

Additional Observations

After this error occurs, the Continue extension no longer sends POST /v1/completions requests to the vllm server.

Log output

[Extension Host] Error generating autocompletion: TypeError: Cannot read properties of undefined (reading 'includes')
    at c:\Users\MyUser\.vscode\extensions\continue.continue-0.9.211-win32-x64\out\extension.js:102778:87
    at Array.some (<anonymous>)
    at _CompletionProvider.getTabCompletion (c:\Users\MyUser\.vscode\extensions\continue.continue-0.9.211-win32-x64\out\extension.js:102778:61)
    at async _CompletionProvider.provideInlineCompletionItems (c:\Users\MyUser\.vscode\extensions\continue.continue-0.9.211-win32-x64\out\extension.js:102697:27)
    at async ContinueCompletionProvider.provideInlineCompletionItems (c:\Users\MyUser\.vscode\extensions\continue.continue-0.9.211-win32-x64\out\extension.js:517910:27)
    at async Y.provideInlineCompletions (c:\Users\MyUser\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\workbench\api\node\extensionHostProcess.js:161:123619)

LNTH · Sep 25 '24

Ran into this same issue.

7216 · Sep 25 '24

Found a workaround to get some completions.

{
  "models": [
    {
      "title": "Qwen2.5-Coder-7b-Instruct",
      "provider": "vllm",
      "model": "Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ",
      "apiBase": "http://10.0.0.10:8000/v1"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder-7b-Instruct",
    "provider": "openai",
    "apiKey": "None",
    "completionOptions": {
      "stop": [
        "<|endoftext|>",
        "\n"
      ]
    },
    "apiBase": "http://10.0.0.10:8000/v1/",
    "model": "Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ"
  },
  "tabAutocompleteOptions": {
    "multilineCompletions": "never",
    "template": "You are a helpful assistant.<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>"
  },
  "customCommands": [],
  "allowAnonymousTelemetry": false,
  "docs": []
}

The key changes are the tabAutocompleteOptions template, setting the tabAutocompleteModel provider to openai, and including both entries in the completionOptions stop list.

7216 · Sep 25 '24

Switched from TGI to vLLM containers and ran into Error generating autocompletion: TypeError: Cannot read properties of undefined (reading 'includes') as @LNTH and @7216 did when using autocomplete. Since codegemma is a supported model, I only had to change the provider to openai. I also ran into Error streaming diff: TypeError: Cannot read properties of undefined (reading 'toLowerCase') when using Cmd/Ctrl + I to generate or fix code, though Cmd/Ctrl + L worked for chatting with my code. Again, changing the provider to openai worked. So there seems to be a minor problem with the vLLM provider in both autocomplete and code generation that doesn't appear to affect the openai implementation. My final config was:

"models": [
    {
      "title": "CodeGemma Chat",
      "provider": "openai",
      "model": "/models/codegemma-7b-it",
      "apiBase": "http://ip_address/v1/"
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeGemma Code Completion",
    "provider": "openai",
    "model": "/models/codegemma-7b",
    "apiBase": "http://ip_address/v1/"
  },

CMobley7 · Oct 01 '24

same issue

wnanbei · Oct 09 '24

same issue

ishotoli · Oct 10 '24

Hi all, thanks for the detailed write-ups and +1s. We've had some other problems with autocomplete not working for folks and are planning to focus on bugfixes shortly. Added this one to our list of issues.

Patrick-Erichsen · Oct 14 '24

same problem, any update on this?

mapledxf · Nov 13 '24

still not working for me:

  "tabAutocompleteModel": {
    "model": "Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8",
    "provider": "openai",
    "apiKey": "None",
    "title": "Qwen2.5-Coder-7B-AutoComplete",
    "apiBase": "http://localhost:2242/v1",
    "contextLength": 2048,
    "completionOptions": {
      "temperature": 0.01,
      "stop": ["<|endoftext|>"]
    }
  },
  "tabAutocompleteOptions": {
    "useCache": true,
    "multilineCompletions": "auto",
    "maxPromptTokens": 2048,
    "useFileSuffix": false,
    "debounceDelay": 100,
    "template": "<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>"
  },
  
docker run --gpus all --runtime nvidia -v ~/.cache/huggingface:/root/.cache/huggingface -p 2242:2242 --ipc=host --name serve_autocomplete --restart always vllm/vllm-openai --model Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8 -tp 1 --host 0.0.0.0 --port 2242 --gpu-memory-utilization 1 --max-model-len 2048 --max-num-batched-tokens 2048

assafmo · Nov 21 '24

Try with: "completionOptions": {"maxTokens": 2048}
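
For example, merged into the completionOptions of the tabAutocompleteModel above (a sketch based on assafmo's snippet; only maxTokens is new), that would look like:

  "completionOptions": {
    "temperature": 0.01,
    "maxTokens": 2048,
    "stop": ["<|endoftext|>"]
  }

Note that the prompt tokens plus maxTokens still have to fit within the server's --max-model-len, so a smaller value than 2048 may be needed here.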

TungstenWolframite · Feb 22 '25

I have a similar kind of issue; no output is generated.

BhautikChudasama · Apr 18 '25

This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.

github-actions[bot] · Aug 06 '25

This issue was closed because it wasn't updated for 10 days after being marked stale. If it's still important, please reopen + comment and we'll gladly take another look!

github-actions[bot] · Aug 17 '25