Double counts in parameter count

Open TevenLeScao opened this issue 4 years ago • 2 comments

Currently, the parameter counts returned by utils.get_parameters_in_billions are inaccurate when PP > 1 (more than one pipeline-parallel stage). Tied parameters, in particular the embedding weights, exist as separate copies on the first and last PP stages, so they are counted twice. For now the codebase only uses the count without embedding layers, which is accurate, but it would be good for the count with embedding layers to work as well, mostly for operation counting.
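To make the double count concrete, here is a minimal toy sketch in plain Python. It is not the code in utils.get_parameters_in_billions. The `Param` class, the `tied_key` label, and the two-stage layout are hypothetical names invented for illustration. The sketch assumes each tied weight carries a label shared by all of its copies, and counts each label only once.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Param:
    """Toy stand-in for a model parameter (hypothetical, for illustration only)."""
    name: str
    numel: int
    tied_key: Optional[str] = None  # assumed label shared by all copies of a tied weight


def naive_count(stages: List[List[Param]]) -> int:
    # Sums every parameter on every stage, so tied copies are counted repeatedly.
    return sum(p.numel for stage in stages for p in stage)


def dedup_count(stages: List[List[Param]]) -> int:
    # Counts each tied group once, no matter how many stages hold a copy.
    seen = set()
    total = 0
    for stage in stages:
        for p in stage:
            if p.tied_key is not None:
                if p.tied_key in seen:
                    continue
                seen.add(p.tied_key)
            total += p.numel
    return total


vocab, hidden = 1000, 64
stages = [
    # First PP stage: input embedding plus one layer.
    [Param("embed.weight", vocab * hidden, tied_key="embed"),
     Param("layer0.weight", hidden * hidden)],
    # Last PP stage: one layer plus the output projection tied to the embedding.
    [Param("layer1.weight", hidden * hidden),
     Param("lm_head.weight", vocab * hidden, tied_key="embed")],
]

naive = naive_count(stages)
dedup = dedup_count(stages)
print(f"naive={naive} dedup={dedup} overcount={naive - dedup}")
```

In this toy setup the overcount is exactly one copy of the embedding matrix (vocab * hidden). In a real multi-process run each rank only sees its own stage, so a shared `seen` set is not available; one option, stated here as an assumption rather than what the codebase does, would be to let a single designated rank count each tied group before the per-rank totals are summed.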

For background see: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/40

TevenLeScao avatar Sep 15 '21 00:09 TevenLeScao

@thomasw21 also shared this, which seems to be related to this issue: https://github.com/microsoft/DeepSpeed/blob/c7f3bc51c27884ad80dcafe4aa60f070c1dfa26e/deepspeed/runtime/pipe/engine.py#L117-L126

stas00 avatar Sep 20 '21 18:09 stas00

Ah yes, I saw it on the other issue and then forgot about it. I can take a look at the end of this week.

TevenLeScao avatar Sep 20 '21 22:09 TevenLeScao