Error generating autocompletion with Qwen2.5-Coder-7B and vllm
Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that reports the same bug
- [X] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: Windows 11
- Continue: v0.9.211 (pre-release)
- IDE: VS Code
- Model: `Qwen/Qwen2.5-Coder-7B` served with `vllm`
- config.json:
{
  "models": [],
  "tabAutocompleteModel": {
    "title": "Qwen/Qwen2.5-Coder-7B",
    "provider": "vllm",
    "model": "Qwen/Qwen2.5-Coder-7B",
    "apiBase": "http://192.168.1.19:8000/v1",
    "apiKey": "None",
    "completionOptions": {
      "template": "<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>",
      "stop": ["<|endoftext|>"]
    }
  }
}
Description
The tabAutoComplete feature is not displaying any suggestions in the VS Code editor.
- Continue TabAutoComplete is enabled
- VS Code: Inline Suggestions are enabled
To reproduce
- Ensure the vLLM server is running. This is confirmed by the log entry `"GET /v1/models HTTP/1.1" 200 OK` whenever `config.json` is modified. (A manual check of the completions endpoint with curl is sketched after these steps.)
- Type in the VS Code editor to trigger auto-completion.
Expected Behavior
Auto-completion suggestions should appear in the VS Code editor.
Actual Behavior
The vLLM server received `"POST /v1/completions HTTP/1.1" 200 OK`, but nothing shows in the VS Code editor.
The VS Code console displayed: `Error generating autocompletion: TypeError: Cannot read properties of undefined (reading 'includes')`
Additional Observations
After this error occurs, the Continue extension no longer sends POST /v1/completions requests to the vllm server.
Log output
[Extension Host] Error generating autocompletion: TypeError: Cannot read properties of undefined (reading 'includes')
at c:\Users\MyUser\.vscode\extensions\continue.continue-0.9.211-win32-x64\out\extension.js:102778:87
at Array.some (<anonymous>)
at _CompletionProvider.getTabCompletion (c:\Users\MyUser\.vscode\extensions\continue.continue-0.9.211-win32-x64\out\extension.js:102778:61)
at async _CompletionProvider.provideInlineCompletionItems (c:\Users\MyUser\.vscode\extensions\continue.continue-0.9.211-win32-x64\out\extension.js:102697:27)
at async ContinueCompletionProvider.provideInlineCompletionItems (c:\Users\MyUser\.vscode\extensions\continue.continue-0.9.211-win32-x64\out\extension.js:517910:27)
at async Y.provideInlineCompletions (c:\Users\MyUser\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\workbench\api\node\extensionHostProcess.js:161:123619)
Ran into this same issue.
Found a workaround that gets some completions:
{
  "models": [
    {
      "title": "Qwen2.5-Coder-7b-Instruct",
      "provider": "vllm",
      "model": "Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ",
      "apiBase": "http://10.0.0.10:8000/v1"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder-7b-Instruct",
    "provider": "openai",
    "apiKey": "None",
    "completionOptions": {
      "stop": [
        "<|endoftext|>",
        "\n"
      ]
    },
    "apiBase": "http://10.0.0.10:8000/v1/",
    "model": "Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ"
  },
  "tabAutocompleteOptions": {
    "multilineCompletions": "never",
    "template": "You are a helpful assistant.<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>"
  },
  "customCommands": [],
  "allowAnonymousTelemetry": false,
  "docs": []
}
The key changes are the tabAutocompleteOptions template, setting the tabAutocompleteModel provider to openai, and including both entries in the completionOptions stop list.
Switched from TGI to vLLM containers and ran into `Error generating autocompletion: TypeError: Cannot read properties of undefined (reading 'includes')`, as @LNTH and @7216 did, when using autocomplete. Since codegemma is a supported model, I only had to change the provider to openai. I also ran into `Error streaming diff: TypeError: Cannot read properties of undefined (reading 'toLowerCase')` when using Cmd/Ctrl + I to generate or fix code, though Cmd/Ctrl + L worked for chatting with my code. Again, changing the provider to openai fixed it. So there seems to be a minor problem with the vLLM provider in both autocomplete and code generation that doesn't affect the openai implementation. My final config was:
"models": [
{
"title": "CodeGemma Chat",
"provider": "openai",
"model": "/models/codegemma-7b-it",
"apiBase": "http://ip_address/v1/"
}
],
"tabAutocompleteModel": {
"title": "CodeGemma Code Completion",
"provider": "openai",
"model": "/models/codegemma-7b",
"apiBase": "http://ip_address/v1/"
},
same issue
same issue
Hi all, thanks for the detailed write-ups and +1s. We've had some other problems with autocomplete not working for folks and are planning to focus on bugfixes shortly. Added this one to our list of issues.
same problem, any update on this?
still not working for me:
"tabAutocompleteModel": {
"model": "Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8",
"provider": "openai",
"apiKey": "None",
"title": "Qwen2.5-Coder-7B-AutoComplete",
"apiBase": "http://localhost:2242/v1",
"contextLength": 2048,
"completionOptions": {
"temperature": 0.01,
"stop": ["<|endoftext|>"]
}
},
"tabAutocompleteOptions": {
"useCache": true,
"multilineCompletions": "auto",
"maxPromptTokens": 2048,
"useFileSuffix": false,
"debounceDelay": 100,
"template": "<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>"
},
docker run --gpus all --runtime nvidia \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 2242:2242 --ipc=host --name serve_autocomplete --restart always \
  vllm/vllm-openai \
  --model Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8 -tp 1 \
  --host 0.0.0.0 --port 2242 \
  --gpu-memory-utilization 1 --max-model-len 2048 --max-num-batched-tokens 2048
Try with: "completionOptions": {"maxTokens": 2048}
I have a similar kind of issue; no output is generated.
This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.
This issue was closed because it wasn't updated for 10 days after being marked stale. If it's still important, please reopen + comment and we'll gladly take another look!