Mark McLoughlin comments

Results: 63 comments by Mark McLoughlin

[V1] Refactor parallel sampling support

Lint and deploy minio setup failing with:

```
make_bucket failed: s3://testbucket Could not connect to the endpoint URL: "http://127.0.0.1:9000/testbucket"
```

[V1][Metrics] Implement vllm:lora_requests_info metric

> Generally looks good. Left a small nit that will help clean up the API a bit.
>
> `EngineCoreRequest` already contains `lora_request` so we do not need to pass it around...

[V1][Metrics] Implement vllm:lora_requests_info metric

> LGTM, I like the abstraction and lifecycle. Just one nit on the typing. Ping when ready for automerge. Thanks!

Fixed

> Do you know if we have any test...

[WIP][Metrics] Re-work approach to LoRA metrics

You could argue either:

1) We don't need per-adapter counts at all, just an info metric (like cache_config_info) that lists the configured adapters, or
2) Most of our metrics should...
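
To illustrate what an "info metric" for option 1 might look like, here is a rough sketch in plain Python that renders a constant-value gauge in the Prometheus text exposition format, with the configuration carried in labels. The metric name `example_lora_adapters_info`, the label names `adapters` and `max_lora`, and the adapter names are all hypothetical, chosen only for illustration; this is not the metric vLLM actually exposes.

```python
def render_info_metric(name, help_text, labels):
    """Render an info-style gauge (constant value 1) as Prometheus text.

    The information lives entirely in the labels; the sample value is
    always 1, which is the usual convention for "info" metrics.
    """
    def escape(value):
        # Escape backslash, double quote and newline in label values.
        return (str(value)
                .replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))

    label_str = ",".join(
        f'{key}="{escape(value)}"' for key, value in sorted(labels.items())
    )
    return "\n".join([
        f"# HELP {name} {help_text}",
        f"# TYPE {name} gauge",
        f"{name}{{{label_str}}} 1",
    ])


# Hypothetical adapter names, used only for this example.
adapters = ["sql-lora", "chat-lora"]

text = render_info_metric(
    "example_lora_adapters_info",
    "Configured LoRA adapters (illustrative only).",
    {"adapters": ",".join(sorted(adapters)), "max_lora": len(adapters)},
)
print(text)
```

A single series like this only changes when the configuration changes, whereas per-adapter counters would add one series per adapter.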

[WIP][Metrics] Re-work approach to LoRA metrics

ok, I took a closer look at what the [Gateway API Inference Extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension) is doing with this metric. I've filed kubernetes-sigs/gateway-api-inference-extension#354 to invite feedback from that project. The premise is...