Mark McLoughlin
Mark McLoughlin
Given the new PR, unmarking this as stale
> FYI @markmc I changed the metrics logic now in this PR to not attempt to aggregate from multiple engines and just log them separately. For prometheus I added a...
I figured I'd review these metrics one by one, starting with **model_load_time**. (Incidentally, doing each of these in a separate PR might mean they get merged more quickly) A high-level...
> Provided as a log for CUDA devices Uh? How can an auto-scaler use a log message to "determine the pod autoscaling threshold and frequency" ?
It's not obvious to me that we need to calculate the sum of a number of individual timings in order to log a useful startup time timing Why not something...
It doesn't seem like we actually need an exact count of nans - we just want a signal that corruption is spiking? What happens the request when this happens? Is...
Some overlap with #18765 ... except I'm not sure this NaN case results in the request explicitly failing?
> I did not take into account Prometheus since I'm not too confident about the best design practice. I'm also hoping to get this PR merged soon so that people...
Oh, I also meant to say ... we should be a bit cautious about getting new APIs right because they're difficult/disruptive to remove later. Temporarily printing metrics as debug logging...
> Meanwhile, I kinda disagree printing metrics is the most helpful way for debugging Yep, I just mentioned it as a potential stop-gap solution if metrics take e.g. a week...