Mark McLoughlin comments

Results: 63 comments of Mark McLoughlin

[Feature]: Enhance integration with advanced LB/gateways with better load/cost reporting and LoRA management

Given the new PR, unmarking this as stale

[V1] AsyncLLM data parallel

> FYI @markmc I changed the metrics logic now in this PR to not attempt to aggregate from multiple engines and just log them separately. For prometheus I added a...

[V1][Metrics] Add model_load_time as a log for CUDA devices

I figured I'd review these metrics one by one, starting with **model_load_time**. (Incidentally, doing each of these in a separate PR might mean they get merged more quickly.) A high-level...

[V1][Metrics] Add model_load_time as a log for CUDA devices

> Provided as a log for CUDA devices

Uh? How can an auto-scaler use a log message to "determine the pod autoscaling threshold and frequency"?

[V1][Metrics] Add model_load_time as a log for CUDA devices

It's not obvious to me that we need to calculate the sum of a number of individual timings in order to log a useful startup timing. Why not something...

Export NaNs in logits to scheduler_stats if output is corrupted

It doesn't seem like we actually need an exact count of NaNs - we just want a signal that corruption is spiking? What happens to the request when this happens? Is...

Export NaNs in logits to scheduler_stats if output is corrupted

Some overlap with #18765 ... except I'm not sure this NaN case results in the request explicitly failing?

[V1] Add request-level, per-step acceptance counts tracking for spec dec.

> I did not take into account Prometheus since I'm not too confident about the best design practice. I'm also hoping to get this PR merged soon so that people...

[V1] Add request-level, per-step acceptance counts tracking for spec dec.

Oh, I also meant to say ... we should be a bit cautious about getting new APIs right because they're difficult/disruptive to remove later. Temporarily printing metrics as debug logging...

[V1] Add request-level, per-step acceptance counts tracking for spec dec.

> Meanwhile, I kinda disagree printing metrics is the most helpful way for debugging

Yep, I just mentioned it as a potential stop-gap solution if metrics take e.g. a week...