[Feature]: Split deployment identifier from model deployment and add support for custom port for model deployment
🚀 Feature Description and Motivation
I deploy the R1 model with mindie, then create the HTTPRoute and Service for the deployment. The deployment label model.aibrix.ai/name: deepseek-r1-w02mfd does not match the served model name R1-int8 used in the engine start command, so the gateway plugin rejects the request:
gateway_req_body.go:60] "model doesn't exist in cache, probably wrong model name" requestID="1832b9c9-1772-42ac-bc4c-cd5572d15097" model="R1-int8"
I think it should be possible to route requests by the model name in the request header in addition to the model name in the request body, rather than having the gateway plugin match only on the model name in the body.
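A minimal sketch of that idea, assuming the gateway plugin today resolves the model only from the request body: prefer the model header (which can carry the deployment identifier, e.g. deepseek-r1-w02mfd) and fall back to the model field in the body (the engine's served model name, R1-int8 here). The function and parameter names are illustrative, not the actual AIBrix API:

```go
package main

import "fmt"

// resolveRoutingModel is a hypothetical helper: prefer the "model"
// request header when present, otherwise fall back to the model
// field parsed from the request body.
func resolveRoutingModel(headers map[string]string, bodyModel string) string {
	if m, ok := headers["model"]; ok && m != "" {
		return m
	}
	return bodyModel
}

func main() {
	headers := map[string]string{"model": "deepseek-r1-w02mfd"}
	// Header carries the deployment identifier, body carries the
	// engine's served model name; the header wins here.
	fmt.Println(resolveRoutingModel(headers, "R1-int8")) // deepseek-r1-w02mfd
	// Without the header, routing falls back to the body model name.
	fmt.Println(resolveRoutingModel(nil, "R1-int8")) // R1-int8
}
```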
Use Case
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  creationTimestamp: "2025-05-13T06:36:02Z"
  generation: 2
  name: deepseek-r1-w02mfd-router
  namespace: aibrix-system
  resourceVersion: "25629649"
  uid: 0572e0d4-8706-45de-b944-4dc927bc7dac
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: aibrix-eg
    namespace: aibrix-system
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: deepseek-r1-w02mfd
      namespace: prdsafe
      port: 1025
      weight: 1
    matches:
    - headers:
      - name: model
        type: Exact
        value: deepseek-r1-w02mfd
      path:
        type: PathPrefix
        value: /v1/completions
    - headers:
      - name: model
        type: Exact
        value: deepseek-r1-w02mfd
      path:
        type: PathPrefix
        value: /v1/chat/completions
    timeouts:
      request: 120s
status:
  parents:
  - conditions:
    - lastTransitionTime: "2025-05-13T07:08:05Z"
      message: Route is accepted
      observedGeneration: 2
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2025-05-13T07:08:05Z"
      message: Resolved all the Object references for the Route
      observedGeneration: 2
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
    controllerName: gateway.envoyproxy.io/gatewayclass-controller
    parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: aibrix-eg
      namespace: aibrix-system
Header: model: deepseek-r1-w02mfd
Body:
{
  "model": "R1-int8",
  "max_tokens": 4096,
  "temperature": 0.6,
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "Analyze the scheduler's filtering and scoring mechanisms in depth; the more detailed the better"
    }
  ]
}
Proposed Solution
No response
I changed the HTTPRoute as follows, but the target pod port is hard-coded in https://github.com/vllm-project/aibrix/blob/main/pkg/types/router_context.go#L96.
I think the router should first check the pod label model.aibrix.ai/port and only fall back to the default port when the label is not set. Otherwise, setting model.aibrix.ai/port on the deployment has no effect on the target pod (see the sketch after the manifest below). @Jeffwan
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  creationTimestamp: "2025-05-14T02:11:13Z"
  generation: 1
  name: deepseek-r1-w02mfd
  namespace: aibrix-system
  resourceVersion: "25983451"
  uid: 75276d25-bdfc-4ef6-b649-689e3995983b
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: aibrix-eg
    namespace: aibrix-system
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: deepseek-r1-w02mfd
      namespace: prdsafe
      port: 1025
      weight: 1
    matches:
    - headers:
      - name: model
        type: Exact
        value: R1-int8
      path:
        type: PathPrefix
        value: /v1/completions
    - headers:
      - name: model
        type: Exact
        value: R1-int8
      path:
        type: PathPrefix
        value: /v1/chat/completions
    timeouts:
      request: 120s
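A minimal sketch of the proposed port lookup, assuming the target port today is a fixed value in router_context.go: read the pod's model.aibrix.ai/port label first and fall back to the current default only when the label is missing or not numeric. The default value and helper name below are illustrative assumptions, not the actual AIBrix code:

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// portLabel is the pod label proposed in this issue.
	portLabel = "model.aibrix.ai/port"
	// defaultPort stands in for whatever fixed port
	// router_context.go uses today (assumed value).
	defaultPort = 8000
)

// targetPort returns the port from the model.aibrix.ai/port label
// when it is set and numeric, otherwise the default port.
func targetPort(podLabels map[string]string) int {
	if v, ok := podLabels[portLabel]; ok {
		if p, err := strconv.Atoi(v); err == nil && p > 0 {
			return p
		}
	}
	return defaultPort
}

func main() {
	labels := map[string]string{portLabel: "1025"}
	fmt.Println(targetPort(labels)) // 1025: the label overrides the default
	fmt.Println(targetPort(nil))    // 8000: falls back to the default
}
```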
- The goal is to split the deployment identifier from the model name. It is a good feature to have, but it requires careful design consideration since it may create confusion for novice users.
- Custom port support will be prioritized.
@varungup90 what's the status of this issue?
This is related to multi-tenancy; let's move it out of the v0.4.0 milestone.