Why are we using the Olympic-scored stdev for RCP interpolation?
The general idea of stdev is to capture the spread of a population, but Olympic scoring drops the extreme values, which are the ones that contribute most to that spread. So we should consider using the stdev of the entire population for interpolation.
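To make the difference concrete, here is a minimal sketch of the two computations. It is not the code from `rcp_checker.py`: the helper names are hypothetical, the epoch counts are made up for illustration, it assumes Olympic scoring drops exactly one lowest and one highest value, and it assumes the sample (n-1) standard deviation, which may not match what the checker uses.

```python
import statistics


def olympic_trim(values, drop=1):
    """Return the sorted values with `drop` lowest and `drop` highest removed.

    Assumption: `drop=1` on each side; the real checker may trim differently.
    """
    ordered = sorted(values)
    if len(ordered) <= 2 * drop:
        raise ValueError("not enough values left after trimming")
    return ordered[drop:len(ordered) - drop]


def full_stdev(values):
    """Sample standard deviation over every run."""
    return statistics.stdev(values)


def olympic_stdev(values, drop=1):
    """Sample standard deviation over the trimmed runs only."""
    return statistics.stdev(olympic_trim(values, drop))


# Made-up epochs-to-converge for one RCP point (illustrative, not real data).
epochs = [10, 11, 11, 12, 12, 13, 18]

print("full stdev:   ", full_stdev(epochs))
print("olympic stdev:", olympic_stdev(epochs))
```

With an outlier such as the 18 above, the trimmed stdev comes out noticeably smaller than the full one, so anything derived from it is computed from a narrower spread than the runs actually showed.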
Need to analyze and fix for v4.1
Can we see how this affects the v4.0 scores?
@ShriyaPalsamudram I am getting the same results with and without the Olympic stdev. This seems to be because the RCP stdev is only used to compute min_epochs:
https://github.com/mlcommons/logging/blob/369260bf8326f36f644d34a1996b05ec51ad9717/mlperf_logging/rcp_checker/rcp_checker.py#L272-L275
And since we are no longer pruning based on min_epochs, it doesn't seem to have an effect on the results. min_epochs does later affect the max speedup, but that value is only used afterwards in a condition that checks whether the RCP passed.
https://github.com/mlcommons/logging/blob/369260bf8326f36f644d34a1996b05ec51ad9717/mlperf_logging/rcp_checker/rcp_checker.py#L276
https://github.com/mlcommons/logging/blob/369260bf8326f36f644d34a1996b05ec51ad9717/mlperf_logging/rcp_checker/rcp_checker.py#L438
@ShriyaPalsamudram What changes were expected from changing the stdev?
Since this impacts max speedup, can we compare max speedup before and after the change for all RCP points?
Additionally, here is an example of the max_speed_up values for the latest training results, HPE-Cray-XD670-Gen11-H100-SXM5-80GB_n1_mxnet_24.04.
With olympic score:
[1.018748075108887, 1.0601459916687128]
Without olympic score:
[1.0262860923600325, 1.0699036853548622]
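For a quick sense of scale, the relative change between the two pairs above can be computed as in the sketch below. The variable and function names are hypothetical, and it assumes the two lists are in the same order, so that each value is compared with its counterpart.

```python
# max_speed_up values copied from the comment above.
with_olympic = [1.018748075108887, 1.0601459916687128]
without_olympic = [1.0262860923600325, 1.0699036853548622]


def relative_change(before, after):
    """Element-wise (after - before) / before for two equal-length lists."""
    if len(before) != len(after):
        raise ValueError("lists must have the same length")
    return [(a - b) / b for b, a in zip(before, after)]


changes = relative_change(with_olympic, without_olympic)
for index, change in enumerate(changes):
    print(f"value {index}: {change:+.2%}")
```

Both values increase by less than 1% when the Olympic trimming is removed, at least for this one submission; other RCP points may behave differently.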