Mark McLoughlin

Results: 15 issues by Mark McLoughlin

There are multiple cases where it may make sense to not require a gateway owner to specify a listener port: 1. Where a port number is irrelevant or nonsensical -...

kind/feature
lifecycle/rotten

The Gateway vs Route separation allows a cluster operator (the gateway owner) to share a gateway and its resources with many application developers. However, in the L4 routing model (i.e. TCPRoute...

kind/feature

We should consider adding the port number to listener status in anticipation of future support for auto-assigning a port when no port is specified. By adding it in advance of...

kind/feature
lifecycle/stale

Related to #10582. Some notes I had taken on v0 metrics implementation, along with v1 design details.

documentation
ready
v1

Part of #10582 and discussed in #12745. The current `vllm:lora_requests_info` Gauge is somewhat similar to an Info metric (like cache_config_info) except the value is the current wall-clock time, and is...

Part of #10582. Add a core engine `PREEMPTED` event. Add the `num_preemptions_total` counter from v0. Also, make preemptions reset the scheduled and first token timestamps, resulting in: ``` > [...

v1

Part of #10582 and discussed in #12745. Add some infrastructure to help us deprecate and remove metrics in a less user-hostile way. Our deprecation process will now be: 1) Deprecate...

documentation
v1

Part of #10582. prometheus_client has support for Info metrics, which are equivalent to a Gauge whose value is permanently set to 1 but which expose interesting key/value pair information via labels....

v1
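To make the "Gauge permanently set to 1" description concrete, here is a minimal sketch of what an Info-style sample looks like in the Prometheus text format. It deliberately uses only the standard library: `render_info_sample`, the metric name `example_config`, and the label names are hypothetical and chosen for illustration. This is neither prometheus_client's API nor the change made in the issue.

```python
def render_info_sample(name: str, labels: dict[str, str]) -> str:
    """Render one Info-style sample line in the Prometheus text format.

    Hypothetical helper: the sample value is always 1.0 and all of the
    useful information is carried by the labels.
    """
    def escape(value: str) -> str:
        # Label values escape backslash, double quote, and newline.
        return (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    pairs = ",".join(
        f'{key}="{escape(str(value))}"' for key, value in sorted(labels.items())
    )
    # Info metrics are conventionally exposed with an `_info` suffix.
    return f"{name}_info{{{pairs}}} 1.0"


# Assumed example labels; a real metric would expose its own key/value pairs.
line = render_info_sample("example_config", {"block_size": "16", "dtype": "auto"})
print(line)
```

Because the value never changes, dashboards typically read the labels rather than the number itself.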

```
vllm:lora_requests_info{max_lora="1",running_lora_adapters="",waiting_lora_adapters=""} 1.7395575657589855e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters=""} 1.7395575723949368e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters="test-lora"} 1.7395575717647147e+09
```

As discussed in #13303, this metric perhaps isn't the most ideal solution for the use case but, given there is an existing...

v1
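Since the value of each `vllm:lora_requests_info` sample is a wall-clock timestamp, one way a consumer might find the current adapter state is to take the label set with the largest value. The sketch below illustrates that reading of the samples quoted above. The helper names `parse_sample` and `latest_sample` are hypothetical, the regexes are a simplification that does not handle escaped quotes in label values, and treating the largest value as the most recent update is an assumption, not something the issue states.

```python
import re

# Simplified patterns for `name{label="value",...} number` sample lines.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str) -> tuple[dict[str, str], float]:
    """Split one exposition-format sample into (labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return labels, float(match.group("value"))


def latest_sample(lines: list[str]) -> tuple[dict[str, str], float]:
    """Return the sample whose timestamp-valued gauge is largest."""
    return max((parse_sample(line) for line in lines), key=lambda s: s[1])


SAMPLES = [
    'vllm:lora_requests_info{max_lora="1",running_lora_adapters="",'
    'waiting_lora_adapters=""} 1.7395575657589855e+09',
    'vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",'
    'waiting_lora_adapters=""} 1.7395575723949368e+09',
    'vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",'
    'waiting_lora_adapters="test-lora"} 1.7395575717647147e+09',
]

labels, timestamp = latest_sample(SAMPLES)
print(labels["running_lora_adapters"], labels["waiting_lora_adapters"], timestamp)
```

For the three samples quoted above, the second line carries the largest timestamp, so under this assumption the current state is one running adapter and none waiting.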

The initial implementation in #10980 went to great lengths to add parallel sampling as a wrapper at the highest layer of abstraction possible. This resulted in a lot of tricky...

ready
v1