
Plot scaling laws of our baseline models

slippylolo opened this issue 4 years ago · 2 comments

For our three baselines on different datasets (OSCAR, C4, The Pile), we would like to plot scaling laws and retrieve their coefficients. Specifically, we are looking to reproduce Figure 1 of Scaling Laws for Neural Language Models.
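As a starting point, the power-law fit behind Figure 1 can be done with a plain log-log linear regression. The sketch below uses synthetic data (the parameter counts and the coefficients `alpha = 0.076`, `N_c = 8.8e13` are taken from the Kaplan et al. paper for illustration, not from our tr3 runs); the actual inputs would be the validation losses pulled from the TensorBoard logs.

```python
import numpy as np

def fit_power_law(n, loss):
    """Fit loss ≈ (n_c / n)**alpha by linear regression in log-log space.

    log(loss) = -alpha * log(n) + alpha * log(n_c), so the slope gives
    alpha and the intercept gives the critical scale n_c.
    """
    slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
    alpha = -slope                    # loss decreases with scale
    n_c = np.exp(intercept / alpha)   # recover the critical scale
    return alpha, n_c

# Illustrative parameter counts for S/M/L/XL-sized models (placeholders,
# not the actual tr3 model sizes).
params = np.array([125e6, 350e6, 760e6, 1.3e9])
# Synthetic losses generated from the Kaplan et al. Figure 1 coefficients.
losses = (8.8e13 / params) ** 0.076

alpha, n_c = fit_power_law(params, losses)
```

On noiseless synthetic data the fit recovers the generating coefficients exactly; on the real runs the residuals around the fitted line are what tell us whether the scaling law holds.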

The TensorBoard data for the baseline runs can be retrieved on the Big Science space on HuggingFace: it's the tr3 runs with tensorboard in their name. The naming scheme (tr3b, tr3c, etc.) is explained here. For C4, we have an XL, L, and M model (tr3, tr3c, tr3c) with short warm-up. For OSCAR and The Pile, we have an XL, L, M, and S model (tr3d, tr3g, tr3h, tr3i and tr3, tr3j, tr3k, tr3l). For OSCAR, we should also add the 13B run to see if the fits hold (that's tr1-13B).

slippylolo avatar Oct 05 '21 07:10 slippylolo

Just to make sure: is the loss taken from "lm-loss-validation/lm loss validation"? And from the last step, or from the global minimum over training?

srulikbd avatar Oct 06 '21 20:10 srulikbd
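The two candidate readings raised above can be sketched on a toy validation-loss series (the numbers here are synthetic, not from any tr3 run):

```python
import numpy as np

# Synthetic validation-loss curve with a slight uptick at the end,
# so the last-step value and the global minimum differ.
steps = np.array([1000, 2000, 3000, 4000, 5000])
val_loss = np.array([4.2, 3.6, 3.1, 3.0, 3.05])

final_loss = val_loss[-1]   # reading 1: loss at the last logged step
min_loss = val_loss.min()   # reading 2: global minimum over training
```

If runs are still improving at the end of training the two readings coincide; they only diverge when validation loss has started to rise, so the choice mainly matters for runs that overfit or were trained past their optimum.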

I've temporarily assigned @slippylolo, feel free to re-assign.

thomasw21 avatar Oct 21 '21 23:10 thomasw21