DeepSpeed
Assert that mp_size is a factor of the model dimensions
The number of GPUs (mp_size) must evenly divide each of the model's partitioned dimensions: the hidden dimension, the embedding dimension, the number of attention heads, and so on. Otherwise, various tensor size errors occur, as described in https://github.com/microsoft/DeepSpeed/issues/2793
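A minimal sketch of the proposed check follows. It assumes the relevant model dimensions are available as plain integers before sharding begins. The function name `check_mp_size` and its keyword arguments are hypothetical and are not part of DeepSpeed's API. The idea is to fail early with one clear message instead of surfacing a tensor size mismatch deep inside the model.

```python
def check_mp_size(mp_size: int, **model_dims: int) -> None:
    """Hypothetical helper: raise ValueError unless mp_size evenly divides
    every model dimension passed as a keyword argument."""
    if mp_size < 1:
        raise ValueError(f"mp_size must be a positive integer, got {mp_size}")
    # Collect every dimension that cannot be split evenly across mp_size GPUs.
    bad = {name: dim for name, dim in model_dims.items() if dim % mp_size != 0}
    if bad:
        details = ", ".join(f"{name}={dim}" for name, dim in bad.items())
        raise ValueError(
            f"mp_size={mp_size} must evenly divide each model dimension; "
            f"not divisible: {details}"
        )


if __name__ == "__main__":
    # 4096 and 32 are both divisible by 4, so this passes silently.
    check_mp_size(4, hidden_size=4096, num_attention_heads=32)
    # Neither 4096 nor 32 is divisible by 3, so this raises.
    try:
        check_mp_size(3, hidden_size=4096, num_attention_heads=32)
    except ValueError as err:
        print(err)
```

The sketch raises `ValueError` rather than using a bare `assert`, because assertions are stripped when Python runs with `-O`, and the check would then silently disappear.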