
Why are we using olympic scores stdev for RCP interpolation?

Open ShriyaRishab opened this issue 1 year ago • 3 comments

The point of a stdev is to capture the spread of the whole population, but Olympic scoring drops the extreme values. So we should consider using the stdev of the entire population for interpolation.

Need to analyze and fix for v4.1

ShriyaRishab avatar Jul 11 '24 19:07 ShriyaRishab
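To make the question concrete, here is a minimal sketch of the two quantities being compared. It assumes Olympic scoring drops one lowest and one highest sample and that a population stdev is used; the helper name `olympic_trim` and the sample values are hypothetical and not taken from rcp_checker.py.

```python
import statistics


def olympic_trim(samples, drop=1):
    """Return the samples with the `drop` lowest and `drop` highest removed."""
    ordered = sorted(samples)
    if len(ordered) <= 2 * drop:
        raise ValueError("not enough samples to trim")
    return ordered[drop:len(ordered) - drop]


# Hypothetical epochs-to-converge for one RCP point (illustrative, not real data).
epochs = [10, 12, 12, 13, 13, 14, 14, 15, 19]

# Stdev over every run versus stdev over the Olympic-trimmed runs.
full_stdev = statistics.pstdev(epochs)
olympic_stdev = statistics.pstdev(olympic_trim(epochs))

print(f"full: {full_stdev:.3f}, olympic: {olympic_stdev:.3f}")
```

Trimming the two extremes narrows the range of the remaining runs, so in this example the Olympic stdev comes out smaller than the stdev of the full set.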

Stdev_olympic_prunning

pgmpablo157321 avatar Jul 31 '24 20:07 pgmpablo157321

RCPs_pruned_varying_Stdev

pgmpablo157321 avatar Jul 31 '24 20:07 pgmpablo157321

Can we see how this affects the v4.0 scores?

hiwotadese avatar Aug 01 '24 15:08 hiwotadese

@ShriyaPalsamudram I am getting the same results with and without the olympic stdev. This seems to be because the RCP Stdev is only used to compute min_epochs:

https://github.com/mlcommons/logging/blob/369260bf8326f36f644d34a1996b05ec51ad9717/mlperf_logging/rcp_checker/rcp_checker.py#L272-L275

And since we are no longer pruning based on min_epochs, it doesn't seem to have an effect on the results. min_epochs does later affect the Max Speedup, but that value is only used later in a condition that checks whether the RCP passed.

https://github.com/mlcommons/logging/blob/369260bf8326f36f644d34a1996b05ec51ad9717/mlperf_logging/rcp_checker/rcp_checker.py#L276

https://github.com/mlcommons/logging/blob/369260bf8326f36f644d34a1996b05ec51ad9717/mlperf_logging/rcp_checker/rcp_checker.py#L438

@ShriyaPalsamudram What changes were expected when changing the Stdev?

pgmpablo157321 avatar Aug 02 '24 21:08 pgmpablo157321
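The dependency chain described above (stdev, then min_epochs, then Max Speedup) can be sketched roughly as follows. This is not the checker's code: `min_epochs_sketch` is a hypothetical stand-in bound, and the ratio `mean / min_epochs` is an assumption made only to show the direction of the effect. See the linked lines for the real computation.

```python
import math


def min_epochs_sketch(mean, stdev, n, k=2.0):
    """HYPOTHETICAL lower bound on acceptable mean epochs.

    The real rcp_checker derives min_epochs differently; this stand-in only
    preserves the property that a larger stdev lowers the bound.
    """
    return mean - k * stdev / math.sqrt(n)


def max_speedup_sketch(mean, stdev, n):
    """ASSUMED relation: how far below the RCP mean a submission may converge."""
    return mean / min_epochs_sketch(mean, stdev, n)


# Same mean and sample count, two different stdev estimates (illustrative values).
tight = max_speedup_sketch(mean=13.0, stdev=1.0, n=10)
wide = max_speedup_sketch(mean=13.0, stdev=2.0, n=10)

print(f"stdev=1.0 -> {tight:.4f}, stdev=2.0 -> {wide:.4f}")
```

Under these assumptions, a larger stdev gives a lower min_epochs and therefore a larger allowed speedup, so the choice of stdev would only show up in the pass/fail condition.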

Since this impacts max speedup, can we compare max speedup before and after the change for all RCP points?

ShriyaRishab avatar Aug 08 '24 15:08 ShriyaRishab

RCPs_MaxSpeedUP

pgmpablo157321 avatar Aug 14 '24 15:08 pgmpablo157321

Additionally, here is an example of the max_speed_up values for the last training results, HPE-Cray-XD670-Gen11-H100-SXM5-80GB_n1_mxnet_24.04.

With olympic score:

[1.018748075108887, 1.0601459916687128]

Without olympic score:

[1.0262860923600325, 1.0699036853548622]

pgmpablo157321 avatar Aug 14 '24 16:08 pgmpablo157321
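For a quick read on the size of the difference, the two lists above can be compared entry by entry. The snippet below only does arithmetic on the posted values; the variable names are illustrative.

```python
# max_speed_up values copied from the comment above.
with_olympic = [1.018748075108887, 1.0601459916687128]
without_olympic = [1.0262860923600325, 1.0699036853548622]

# Relative change (in percent) when the olympic stdev is replaced by the full stdev.
rel_change_pct = [
    100.0 * (after - before) / before
    for before, after in zip(with_olympic, without_olympic)
]

for pct in rel_change_pct:
    print(f"{pct:+.2f}%")
```

Both entries increase by less than 1% for this submission when the olympic stdev is not used.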