
Selfhosted vLLM Server (Qwen2.5-VL-32B-Instruct)

Open · plitc opened this issue 4 months ago · 1 comment

@atupem Since Ollama doesn't support `supports_function_calling`, we've switched to vLLM. However, our current parameters/configuration don't work with ByteBot. Could you help us?

vLLM Server (Docker / Config)

  • Proxmox VM with 4x NVIDIA RTX 6000A

```sh
#!/bin/sh
export HUGGING_FACE_HUB_TOKEN=hf_XXX-XXX-XXX
export CUDA_VISIBLE_DEVICES="0,1,2,3"
docker run \
  --name vllm-qwen-vl \
  --network vllm-qwen-vl \
  --gpus all \
  --runtime=nvidia \
  --ipc=host \
  --rm --init \
  -p 8000:8000 \
  -v /opt/vllm:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
    --model Qwen/Qwen2.5-VL-32B-Instruct \
    --served-model-name "Qwen2.5-VL-32B-Instruct" \
    --tensor-parallel-size 4 \
    --max_model_len 32768 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --chat-template-content-format openai \
    --chat-template /root/.cache/huggingface/chat_template.json
```
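To isolate whether the vLLM side handles tool calls at all, it can help to send one minimal OpenAI-style tool-call request directly to the server, bypassing the LiteLLM proxy. The sketch below only builds such a payload; the endpoint and model name are taken from the `docker run` flags above (`--served-model-name`, `-p 8000:8000`), and the tool name `computer_use` is a made-up placeholder, not ByteBot's actual tool schema.

```python
import json

# Minimal OpenAI-style tool-call request for the vLLM server above.
# Model name matches --served-model-name; the tool definition is a
# hypothetical placeholder for illustration only.
payload = {
    "model": "Qwen2.5-VL-32B-Instruct",
    "messages": [{"role": "user", "content": "Open Firefox"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "computer_use",  # hypothetical tool name
            "description": "Control the desktop",
            "parameters": {
                "type": "object",
                "properties": {"action": {"type": "string"}},
                "required": ["action"],
            },
        },
    }],
    "tool_choice": "auto",
}

# Send it against the running container, e.g.:
#   curl -s http://localhost:8000/v1/chat/completions \
#        -H 'Content-Type: application/json' -d @payload.json
with open("payload.json", "w") as f:
    json.dump(payload, f, indent=2)
```

If this direct request comes back with a populated `tool_calls` array, the hermes tool-call parser is working and the problem is more likely in the proxy configuration.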

ByteBot (Config)

```
root@bytebot-1:/opt/bytebot# egrep -A 6 -B 2 "Qwen2.5-VL" packages/bytebot-llm-proxy/litellm-config.yaml
model_list:
  - model_name: VM426:Qwen2.5-VL-32B-Instruct
    litellm_params:
      model: openai/Qwen2.5-VL-32B-Instruct
      api_base: https://XXX-XXX-XXX-XXX/v1
      supports_function_calling: true
      drop_params: true
  - model_name: VM426:OpenGVLab/InternVL3_5-38B
    litellm_params:
      ...
root@bytebot-1:/opt/bytebot#
```
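One thing worth checking in this config: LiteLLM forwards extra keys under `litellm_params` to the backend, which would explain vLLM's "fields were present in the request but ignored: {'supports_function_calling'}" warning below. Capability flags are normally declared under `model_info` instead. A sketch of that change, assuming standard LiteLLM proxy config semantics and not verified against this deployment:

```yaml
model_list:
  - model_name: VM426:Qwen2.5-VL-32B-Instruct
    litellm_params:
      model: openai/Qwen2.5-VL-32B-Instruct
      api_base: https://XXX-XXX-XXX-XXX/v1
      drop_params: true
    model_info:
      supports_function_calling: true
```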

(screenshot attached)

Errors (vLLM)

```
(APIServer pid=7) WARNING 09-12 16:12:18 [protocol.py:81] The following fields were present in the request but ignored: {'supports_function_calling'}
(APIServer pid=7) WARNING 09-12 16:12:18 [sampling_params.py:311] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
(APIServer pid=7) INFO:     172.21.0.1:37970 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=7) WARNING 09-12 16:12:19 [protocol.py:81] The following fields were present in the request but ignored: {'supports_function_calling'}
(APIServer pid=7) WARNING 09-12 16:12:19 [sampling_params.py:311] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
(APIServer pid=7) INFO:     172.21.0.1:37970 - "POST /v1/chat/completions HTTP/1.1" 200 OK
```

Errors (ByteBot)

```
[Nest] 18  - 09/12/2025, 10:26:20 PM     LOG [TasksService] Found existing task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509, and status PENDING. Resuming.
[Nest] 18  - 09/12/2025, 10:26:20 PM     LOG [TasksService] Updating task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM   DEBUG [TasksService] Update data: {"status":"RUNNING","executedAt":"2025-09-12T22:26:20.005Z"}
[Nest] 18  - 09/12/2025, 10:26:20 PM     LOG [TasksService] Retrieving task by ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM   DEBUG [TasksService] Retrieved task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM     LOG [TasksService] Successfully updated task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM   DEBUG [TasksService] Updated task: {"id":"be692d7e-d4cd-42b8-99ca-6057beacd509","description":"Open Firefox Browser","type":"IMMEDIATE","status":"RUNNING","priority":"MEDIUM","control":"ASSISTANT","createdAt":"2025-09-12T22:26:16.493Z","createdBy":"USER","scheduledFor":null,"updatedAt":"2025-09-12T22:26:20.008Z","executedAt":"2025-09-12T22:26:20.005Z","completedAt":null,"queuedAt":null,"error":null,"result":null,"model":{"name":"openai/Qwen2.5-VL-32B-Instruct","title":"VM426:Qwen2.5-VL-32B-Instruct","provider":"proxy","contextWindow":128000}}
[Nest] 18  - 09/12/2025, 10:26:20 PM   DEBUG [AgentScheduler] Processing task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM     LOG [AgentProcessor] Starting processing for task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM     LOG [TasksService] Retrieving task by ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM   DEBUG [TasksService] Retrieved task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM     LOG [AgentProcessor] Processing iteration for task ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM   DEBUG [AgentProcessor] Sending 1 messages to LLM for processing
[Nest] 18  - 09/12/2025, 10:26:20 PM   DEBUG [AgentProcessor] Received 0 content blocks from LLM
[Nest] 18  - 09/12/2025, 10:26:20 PM    WARN [AgentProcessor] Task ID: be692d7e-d4cd-42b8-99ca-6057beacd509 received no content blocks from LLM, marking as failed
[Nest] 18  - 09/12/2025, 10:26:20 PM     LOG [TasksService] Updating task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM   DEBUG [TasksService] Update data: {"status":"FAILED"}
[Nest] 18  - 09/12/2025, 10:26:20 PM     LOG [TasksService] Retrieving task by ID: be692d7e-d4cd-42b8-99ca-6057beacd509
[Nest] 18  - 09/12/2025, 10:26:20 PM   DEBUG [TasksService] Retrieved task with ID: be692d7e-d4cd-42b8-99ca-6057beacd509
```
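The telling line is "Received 0 content blocks from LLM" immediately after vLLM returned `200 OK`: the reply parsed, but produced nothing ByteBot recognizes. One plausible cause (an assumption, not confirmed from ByteBot's source) is an OpenAI-style reply whose assistant message carries `tool_calls` with a null `content`; a translator that only reads `content` would yield zero blocks. A hypothetical illustration of that failure mode and its fix, with made-up block shapes and tool names:

```python
import json

# An OpenAI-style chat completion where the model answered with a
# tool call: "content" is null, the payload lives in "tool_calls".
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "computer_use",  # hypothetical tool name
                    "arguments": json.dumps({"action": "open_firefox"}),
                },
            }],
        },
    }],
}

def to_blocks_naive(resp):
    """Only looks at message.content -> 0 blocks for tool-call replies."""
    msg = resp["choices"][0]["message"]
    return [{"type": "text", "text": msg["content"]}] if msg.get("content") else []

def to_blocks_fixed(resp):
    """Also translates tool_calls into tool-use blocks."""
    msg = resp["choices"][0]["message"]
    blocks = []
    if msg.get("content"):
        blocks.append({"type": "text", "text": msg["content"]})
    for call in msg.get("tool_calls") or []:
        blocks.append({
            "type": "tool_use",
            "id": call["id"],
            "name": call["function"]["name"],
            "input": json.loads(call["function"]["arguments"]),
        })
    return blocks

print(len(to_blocks_naive(response)))  # 0 -> "received no content blocks"
print(len(to_blocks_fixed(response)))  # 1
```

Capturing the raw proxy response for one failing request would confirm or rule this out.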

plitc · Sep 13 '25

I was able to connect ByteBot to that model using LM Studio. I was also originally using Ollama, but switched to LM Studio and it's amazing. However, even though it works without any API exceptions, it has trouble using Firefox; I'm creating a separate issue for that.

radiantone · Sep 13 '25