Mark McLoughlin

63 comments by Mark McLoughlin

Lint and deploy minio setup is failing with:

```
make_bucket failed: s3://testbucket Could not connect to the endpoint URL: "http://127.0.0.1:9000/testbucket"
```

> Generally looks good. Left a small nit that will help clean up the API a bit. `EngineCoreRequest` already contains `lora_request` so we do not need to pass it around...
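
The nit is about not threading a value through call sites when it already travels inside the request object. A minimal sketch of that simplification, using simplified stand-ins: the `LoRARequest` and `EngineCoreRequest` dataclasses below carry only a few illustrative fields and are not vLLM's real definitions, and `schedule_request` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoRARequest:
    # Hypothetical stand-in: just enough to identify an adapter.
    lora_name: str
    lora_int_id: int


@dataclass
class EngineCoreRequest:
    # Hypothetical stand-in: the request already carries its LoRA request.
    request_id: str
    prompt_token_ids: list[int]
    lora_request: Optional[LoRARequest] = None


# Before: the caller passes lora_request separately, duplicating what the
# request object already holds and allowing the two to disagree.
def schedule_request_before(request: EngineCoreRequest,
                            lora_request: Optional[LoRARequest]) -> str:
    adapter = lora_request.lora_name if lora_request else "base"
    return f"{request.request_id}:{adapter}"


# After: read the adapter straight from the request, so the signature is
# smaller and there is a single source of truth.
def schedule_request(request: EngineCoreRequest) -> str:
    lora = request.lora_request
    adapter = lora.lora_name if lora else "base"
    return f"{request.request_id}:{adapter}"


req = EngineCoreRequest("req-0", [1, 2, 3], LoRARequest("sql-adapter", 1))
print(schedule_request(req))  # req-0:sql-adapter
```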

> LGTM, I like the abstraction and lifecycle. Just one nit on the typing. Ping when ready for automerge. Thanks!

Fixed

> Do you know if we have any test...

You could argue either: 1) We don't need per-adapter counts at all, just an info metric (like cache_config_info) that lists the configured adapters, or 2) Most of our metrics should...
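
Option 1 follows the usual Prometheus "info metric" convention: the information lives in labels and the sample value is always 1. A minimal sketch that renders such a metric in the text exposition format, assuming a hypothetical metric name `vllm:lora_config_info` and a hypothetical `configured_adapters` label (neither is an existing vLLM metric); it uses plain Python rather than a client library so it runs standalone.

```python
def render_info_metric(name: str, labels: dict[str, str]) -> str:
    """Render one info-style sample in Prometheus text exposition format.

    By convention an info metric is a gauge whose value is always 1; the
    information itself is carried in the labels.
    """
    def escape(value: str) -> str:
        # Label values must escape backslash, double quote and newline.
        return (value.replace("\\", "\\\\")
                     .replace('"', '\\"')
                     .replace("\n", "\\n"))

    label_str = ",".join(f'{k}="{escape(v)}"' for k, v in sorted(labels.items()))
    return f"# TYPE {name} gauge\n{name}{{{label_str}}} 1\n"


# Hypothetical metric and label names, for illustration only.
adapters = ["sql-adapter", "chat-adapter"]
print(render_info_metric("vllm:lora_config_info",
                         {"configured_adapters": ",".join(sorted(adapters))}), end="")
```

The trade-off is cardinality: an info metric like this stays a single series however much traffic each adapter sees, whereas per-adapter counters add one series per adapter for every metric that carries the adapter label.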

OK, I took a closer look at what the [Gateway API Inference Extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension) is doing with this metric. I've filed kubernetes-sigs/gateway-api-inference-extension#354 to invite feedback from that project. The premise is...

Given the way `LRUCacheWorkerLoRAManager` works, the current V0 metric implementation and what's proposed here in V1 both miss an important point: even if there were no requests for a...
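
As its name suggests, `LRUCacheWorkerLoRAManager` keeps adapters in a least-recently-used cache. In a generic LRU scheme, an adapter stays loaded after its last request finishes until capacity pressure evicts it, so "loaded" and "in use" are different things. A generic sketch of that behaviour, not vLLM's implementation; `AdapterLRUCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class AdapterLRUCache:
    """Hypothetical LRU cache of loaded adapters (illustration only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._loaded: "OrderedDict[str, object]" = OrderedDict()

    def activate(self, name: str) -> None:
        """Ensure `name` is loaded, marking it most recently used."""
        if name in self._loaded:
            self._loaded.move_to_end(name)
            return
        if len(self._loaded) >= self.capacity:
            # Evict the least recently used adapter to make room.
            self._loaded.popitem(last=False)
        self._loaded[name] = object()  # stand-in for the adapter weights

    def loaded(self) -> list[str]:
        # Least recently used (next to be evicted) first.
        return list(self._loaded)


cache = AdapterLRUCache(capacity=2)
cache.activate("a")    # a request for adapter "a" arrives, then finishes
cache.activate("b")
print(cache.loaded())  # ['a', 'b']: "a" stays loaded with no request using it
cache.activate("c")    # capacity reached, so "a" is evicted
print(cache.loaded())  # ['b', 'c']
```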

Implemented the V0 metric in V1 in #13504

> I've filed [kubernetes-sigs/gateway-api-inference-extension#354](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/354) to invite feedback from that project.

Deferring for now, based on the feedback from above.

> @markmc Can you please provide an example of how to compute acceptance length from the retrieved metrics in `examples/offline_inference/eagle.py`? Is it just
>
> ```
> acceptance_length = 1...
> ```
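
For reference, a commonly used definition, stated here as an assumption rather than as a completion of the truncated snippet: in speculative decoding each drafting step emits one token from the target model plus however many draft tokens were accepted, so the mean acceptance length is `1 + accepted / drafts`. A minimal sketch; the function and parameter names are hypothetical and are not vLLM metric names.

```python
def mean_acceptance_length(num_drafts: int, num_accepted_tokens: int) -> float:
    """Average tokens emitted per speculative step (assumed definition).

    Each step is assumed to yield one token from the target model plus
    however many of the proposed draft tokens were accepted.
    """
    if num_drafts == 0:
        raise ValueError("no speculative steps recorded")
    return 1 + num_accepted_tokens / num_drafts


# Hypothetical counter values, e.g. read from a metrics snapshot.
print(mean_acceptance_length(num_drafts=100, num_accepted_tokens=150))  # 2.5
```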