aibrix [Feature]: Split deployment identifier from model deployment and add support for custom port for model deployment

🚀 Feature Description and Motivation

I deployed the R1 service with mindie, then created an HTTPRoute and a Service for the deployment. The label value model.aibrix.ai/name: deepseek-r1-w02mfd differs from the model name set in the start command, R1-int8, so the gateway plugin logs: gateway_req_body.go:60] "model doesn't exist in cache, probably wrong model name" requestID="1832b9c9-1772-42ac-bc4c-cd5572d15097" model="R1-int8". I think it should be possible to route a request using the model header together with the model field in the request body, rather than having the gateway plugin route only on the model name in the body.

Use Case

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  creationTimestamp: "2025-05-13T06:36:02Z"
  generation: 2
  name: deepseek-r1-w02mfd-router
  namespace: aibrix-system
  resourceVersion: "25629649"
  uid: 0572e0d4-8706-45de-b944-4dc927bc7dac
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: aibrix-eg
    namespace: aibrix-system
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: deepseek-r1-w02mfd
      namespace: prdsafe
      port: 1025
      weight: 1
    matches:
    - headers:
      - name: model
        type: Exact
        value: deepseek-r1-w02mfd
      path:
        type: PathPrefix
        value: /v1/completions
    - headers:
      - name: model
        type: Exact
        value: deepseek-r1-w02mfd
      path:
        type: PathPrefix
        value: /v1/chat/completions
    timeouts:
      request: 120s
status:
  parents:
  - conditions:
    - lastTransitionTime: "2025-05-13T07:08:05Z"
      message: Route is accepted
      observedGeneration: 2
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2025-05-13T07:08:05Z"
      message: Resolved all the Object references for the Route
      observedGeneration: 2
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
    controllerName: gateway.envoyproxy.io/gatewayclass-controller
    parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: aibrix-eg
      namespace: aibrix-system

Header: model: deepseek-r1-w02mfd
Body:
{
    "model": "R1-int8",
    "max_tokens": 4096,
    "temperature": 0.6,
    "stream": false,
    "messages": [
        {
            "role": "user",
            "content": "Analyze the scheduler's filtering and scoring mechanisms in depth, in as much detail as possible"
        }
    ]
}

Proposed Solution

No response

May 13 '25 08:05 ying2025

I changed the HTTPRoute as follows, but the target pod port is hard-coded at https://github.com/vllm-project/aibrix/blob/main/pkg/types/router_context.go#L96. I think the router should first check the pod label model.aibrix.ai/port and fall back to the default port only when the label is not set. Otherwise, setting model.aibrix.ai/port on the deployment has no effect on the target pod. @Jeffwan
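The label-first port lookup suggested above could look roughly like this sketch. The fallback value of 8000 is an assumption for illustration only (the real default is whatever router_context.go currently hard-codes), and targetPort/targetAddr are hypothetical helpers, not aibrix functions.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// portLabel is the pod label proposed in this issue.
const portLabel = "model.aibrix.ai/port"

// defaultPort is an assumed fallback for illustration only; the actual
// default is whatever router_context.go hard-codes today.
const defaultPort = 8000

// targetPort returns the port from the model.aibrix.ai/port label when it is
// a valid TCP port number, and defaultPort otherwise.
func targetPort(labels map[string]string) int {
	if v, ok := labels[portLabel]; ok {
		if p, err := strconv.Atoi(v); err == nil && p > 0 && p <= 65535 {
			return p
		}
	}
	return defaultPort
}

// targetAddr builds the host:port the gateway would forward the request to.
func targetAddr(podIP string, labels map[string]string) string {
	return net.JoinHostPort(podIP, strconv.Itoa(targetPort(labels)))
}

func main() {
	// Hypothetical pod IP; 1025 matches the Service port used in this issue's HTTPRoute.
	labels := map[string]string{
		"model.aibrix.ai/name": "deepseek-r1-w02mfd",
		portLabel:              "1025",
	}
	fmt.Println(targetAddr("10.0.0.12", labels))              // 10.0.0.12:1025
	fmt.Println(targetAddr("10.0.0.12", map[string]string{})) // 10.0.0.12:8000
}
```

An invalid label value (for example a non-numeric string) falls back to the default rather than failing the request; whether to reject such pods instead is a design decision left open here.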

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  creationTimestamp: "2025-05-14T02:11:13Z"
  generation: 1
  name: deepseek-r1-w02mfd
  namespace: aibrix-system
  resourceVersion: "25983451"
  uid: 75276d25-bdfc-4ef6-b649-689e3995983b
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: aibrix-eg
    namespace: aibrix-system
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: deepseek-r1-w02mfd
      namespace: prdsafe
      port: 1025
      weight: 1
    matches:
    - headers:
      - name: model
        type: Exact
        value: R1-int8
      path:
        type: PathPrefix
        value: /v1/completions
    - headers:
      - name: model
        type: Exact
        value: R1-int8
      path:
        type: PathPrefix
        value: /v1/chat/completions
    timeouts:
      request: 120s

May 14 '25 03:05 ying2025

The goal is to split the deployment identifier from the model name. It is a good feature to have, but it requires careful design, as it may confuse novice users.
We will prioritize custom port support.

May 14 '25 05:05 varungup90

@varungup90 what's the status of this issue?

Jul 25 '25 18:07 Jeffwan

This is related to multi-tenancy; let's move it out of the v0.4.0 milestone.

Aug 01 '25 17:08 Jeffwan