Mark McLoughlin

63 comments by Mark McLoughlin

See #16665 for logging mean acceptance length and per-position num_accepted_tokens, e.g.

```
INFO 04-15 10:05:05 [metrics.py:82] SpecDecoding metrics: Draft acceptance rate: 48.1%, Mean acceptance length: 2.40, Accepted: 3323 tokens, Drafted:...
```
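For readers unfamiliar with these two numbers, here is a minimal sketch of how they could be derived from raw counters. The `SpecDecodeCounters` class, its field names, and the sample values are hypothetical and are not taken from #16665. The sketch also assumes that each draft step emits one token from the target model in addition to any accepted draft tokens, which is why the mean acceptance length starts at 1.

```python
from dataclasses import dataclass


@dataclass
class SpecDecodeCounters:
    """Hypothetical counters for illustration; not vLLM's actual classes."""
    num_drafts: int           # number of draft proposals (one per decoding step)
    num_draft_tokens: int     # total tokens proposed by the drafter
    num_accepted_tokens: int  # proposed tokens accepted by the target model


def draft_acceptance_rate(c: SpecDecodeCounters) -> float:
    """Fraction of drafted tokens that were accepted."""
    if c.num_draft_tokens == 0:
        return 0.0
    return c.num_accepted_tokens / c.num_draft_tokens


def mean_acceptance_length(c: SpecDecodeCounters) -> float:
    """Average tokens emitted per draft step.

    Assumption: every step yields one target-model token plus the
    accepted draft tokens, hence the leading 1.
    """
    if c.num_drafts == 0:
        return 0.0
    return 1 + c.num_accepted_tokens / c.num_drafts


if __name__ == "__main__":
    # Made-up sample values, chosen only to keep the arithmetic easy to check.
    counters = SpecDecodeCounters(
        num_drafts=100, num_draft_tokens=300, num_accepted_tokens=150
    )
    print(
        f"Draft acceptance rate: {draft_acceptance_rate(counters):.1%}, "
        f"Mean acceptance length: {mean_acceptance_length(counters):.2f}"
    )
```

Under this assumption, a drafter that proposes three tokens per step and has half of them accepted gives a mean acceptance length of 2.5.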

@WoosukKwon @luyuzhe111 take a look at #17010 for how we can use the aggregated metrics from Prometheus for offline inferencing too

> > take a look at #17010 for how we can use the aggregated metrics from Prometheus for offline inferencing too
>
> @markmc it's great that we can use...