Double counts in parameter count
Currently, the parameter counts returned by utils.get_parameters_in_billions are inaccurate when PP > 1. Tied parameters, in particular the embedding weights, have a copy on both the first and the last PP stage, so summing over stages counts them twice. For now the codebase only uses the count without embedding layers, which is accurate, but it would be good for the count with embedding layers to work as well, mostly for operation counting.
For background see: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/40
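To make the double count concrete, here is a minimal standalone sketch of one possible fix: count each tied group only once across PP stages. It does not use the Megatron-DeepSpeed or DeepSpeed APIs; `ParamInfo`, `tied_key`, `is_embedding`, and `count_unique_parameters` are all hypothetical names invented for illustration, and the sizes are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ParamInfo:
    """Hypothetical description of one parameter tensor held by a PP stage."""
    name: str
    numel: int
    is_embedding: bool = False
    # Assumption: every copy of a tied parameter carries the same key.
    tied_key: Optional[str] = None


def count_unique_parameters(stages: List[List[ParamInfo]],
                            include_embeddings: bool = True) -> int:
    """Sum parameter sizes over all stages, counting each tied group once."""
    seen_tied = set()
    total = 0
    for stage_params in stages:
        for p in stage_params:
            if p.is_embedding and not include_embeddings:
                continue
            if p.tied_key is not None:
                if p.tied_key in seen_tied:
                    continue  # copy of a tied parameter already counted
                seen_tied.add(p.tied_key)
            total += p.numel
    return total


# Toy PP = 2 layout: the embedding is tied between the first and last stage.
embedding = ParamInfo("word_embeddings", 50_000 * 1_024,
                      is_embedding=True, tied_key="embed")
stages = [
    [embedding, ParamInfo("layer_0", 1_000)],
    [ParamInfo("layer_1", 1_000), embedding],
]

naive_count = sum(p.numel for stage in stages for p in stage)
unique_count = count_unique_parameters(stages)
unique_count_no_embeddings = count_unique_parameters(stages, include_embeddings=False)
unique_count_in_billions = unique_count / 1e9
```

In this toy layout the naive sum includes the embedding twice, while the deduplicated count includes it once. In a real distributed run each rank only sees its own stage, so the equivalent logic would need to decide per rank whether it "owns" a tied group before reducing the counts across ranks.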
@thomasw21 also shared this: https://github.com/microsoft/DeepSpeed/blob/c7f3bc51c27884ad80dcafe4aa60f070c1dfa26e/deepspeed/runtime/pipe/engine.py#L117-L126 which seems to be related to this issue.
Ah yes, saw it on the other issue then forgot about it - I can take a look at the end of this week.