ckstanton

Results: 7 comments by ckstanton

Hi @nv-jinhosuh. Thanks for catching this. It turns out that the fix is simple... there's no complicated overflow issue going on. There's currently an [overlatency query bound](https://github.com/mlcommons/inference/blob/r2.0/loadgen/loadgen.cc#L635) that's set...

> That looks like the same bug to me. Agreed. There should be on the order of ~4500 acceptable overlatency queries for this run, which is bigger than the threshold...

> @nv-jinhosuh That's my feeling too. But IIRC we decided that the early estimate would be used as the metric regardless. And it is what the submission checker appears to...

> Currently we set d (tolerance) to zero in our LoadGen. With this I don't think we can have 'third case'. Is it possible if it's the second case, Early...

The effective required min_query_count depends on the underlying overlatency percentile of the system. Based on the observed overlatency percentile of a run, it would be possible to have loadgen estimate...
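To illustrate why the required query count depends on the system's true overlatency rate: a simplified one-sided binomial sketch (not the actual loadgen early-stopping computation, which also involves the tolerance d discussed above; `min_query_count` and `binom_cdf` are hypothetical helper names) shows the run length needed to certify, with 99% confidence, that the true overlatency rate is under the 1% budget of a 99th-percentile latency target.

```python
import math

def binom_cdf(h, n, p):
    """P(X <= h) for X ~ Binomial(n, p), computed via the pmf recurrence."""
    pmf = (1.0 - p) ** n        # P(X = 0)
    cdf = pmf
    for k in range(h):
        # pmf(k+1) = pmf(k) * (n-k)/(k+1) * p/(1-p)
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
        cdf += pmf
    return cdf

def min_query_count(q, target_rate=0.01, alpha=0.01, step=100, max_n=200_000):
    """Smallest n (on a coarse grid) such that a system whose true overlatency
    rate is q would, at its expected overlatency count, pass a one-sided
    binomial check certifying rate < target_rate with confidence 1 - alpha."""
    for n in range(step, max_n + 1, step):
        h = math.floor(q * n)   # expected overlatency queries at rate q
        if binom_cdf(h, n, target_rate) <= alpha:
            return n
    return None                 # not certifiable within max_n queries
```

A system with a true overlatency rate well below 1% (say 0.2%) needs only a few hundred queries under this bound, while one sitting near the budget (say 0.8%) needs tens of thousands, and the required count diverges as the true rate approaches 1%.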

> Thank you @ckstanton - I think we cannot do anything about this for 2.0, but for 2.1, I think we need to bring the policy requirements of query count...

Thanks, @nv-jinhosuh, for putting this together! The proposal to use early stopping estimates for shorter runs, and to not use them (i.e. report seen percentiles instead) beyond the minimum duration...