Mark McLoughlin issues

Results: 15 issues by Mark McLoughlin

Make listener port optional

There are multiple cases where it may make sense to not require a gateway owner to specify a listener port: 1. Where a port number is irrelevant or nonsensical -...

kind/feature

lifecycle/rotten

L4 Gateway Sharing

The Gateway vs Route separation allows a cluster operator gateway owner to share a gateway and its resources with many application developers. However, in the L4 routing model (i.e. TCPRoute...

kind/feature

Include port in listener status

We should consider adding the port number to listener status in anticipation of future support for auto-assigning a port when no port is specified. By adding it in advance of...

kind/feature

lifecycle/stale

[v1][Metrics] Add design doc

Related to #10582. Some notes I had taken on v0 metrics implementation, along with v1 design details.

documentation

ready

[WIP][Metrics] Re-work approach to LoRA metrics

Part of #10582 and discussed in #12745. The current `vllm:lora_requests_info` Gauge is somewhat similar to an Info metric (like cache_config_info) except the value is the current wall-clock time, and is...

[V1][Metrics] Handle preemptions

Part of #10582. Add a core engine `PREEMPTED` event. Add the `num_preemptions_total` counter from v0. Also, make preemptions reset the scheduled and first token timestamps, resulting in: ``` > [...

[Metrics] Add `--show-hidden-metrics-for-version` CLI arg

Part of #10582 and discussed in #12745. Add some infrastructure to help us deprecate and remove metrics in a less user-hostile way. Our deprecation process will now be: 1) Deprecate...

documentation

[V1][Metrics] Support `vllm:cache_config_info`

Part of #10582. prometheus_client has support for Info metrics, which are equivalent to a Gauge whose value is permanently set to 1 but which expose interesting key/value pair information via labels....
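To illustrate the "Gauge permanently set to 1" idea in the summary above, here is a minimal standard-library sketch that renders one Info-style sample line in Prometheus text format. It is not the prometheus_client implementation and not the code from this PR; `render_info_metric` is a hypothetical helper, and the label names `block_size` and `cache_dtype` are assumed purely for illustration.

```python
def render_info_metric(name: str, labels: dict[str, str]) -> str:
    """Render an Info-style sample: the value is always 1, the data lives in labels."""

    def escape(value: str) -> str:
        # Escape backslash, double quote and newline inside label values.
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    # Sort labels so the rendered line is deterministic.
    label_text = ",".join(
        f'{key}="{escape(str(val))}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} 1.0"


# Hypothetical label set; the real metric's labels are not listed in the summary.
line = render_info_metric(
    "vllm:cache_config_info",
    {"block_size": "16", "cache_dtype": "auto"},
)
print(line)
```

The sample value carries no information of its own; a dashboard would read the configuration from the label set and can join it against other series.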

[V1][Metrics] Implement vllm:lora_requests_info metric

```
vllm:lora_requests_info{max_lora="1",running_lora_adapters="",waiting_lora_adapters=""} 1.7395575657589855e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters=""} 1.7395575723949368e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters="test-lora"} 1.7395575717647147e+09
```

As discussed in #13303, this metric perhaps isn't the most ideal solution for the use case but, given there is an existing...

[V1] Refactor parallel sampling support

The initial implementation in #10980 went to great efforts to add parallel sampling as a wrapper at the highest layer of abstraction possible. This resulted in a lot of tricky...

ready