Fix memory alignment bug in Stage 3 that gets triggered when number of params are not multiple of the world size

Open samyam opened this issue 4 years ago • 2 comments

samyam avatar Mar 08 '21 23:03 samyam

It appears this fixed the previous issue in one spot. However, now I am seeing the same error in a new assert:

https://github.com/microsoft/DeepSpeed/blob/da71a8975d7387c903c32abd4ec0ff6f174980e0/deepspeed/runtime/zero/stage3.py#L2251-L2253
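
For readers unfamiliar with the alignment condition named in the title, here is a minimal sketch of the general idea: when a flat parameter buffer is split evenly across ranks, its element count has to be padded up to a multiple of the world size. This is not DeepSpeed's code and does not reproduce the linked assert; `padded_numel` and `partition_bounds` are hypothetical helpers written only for illustration.

```python
from typing import List, Tuple


def padded_numel(numel: int, world_size: int) -> int:
    """Round numel up to the nearest multiple of world_size (hypothetical helper)."""
    remainder = numel % world_size
    if remainder == 0:
        return numel
    return numel + (world_size - remainder)


def partition_bounds(numel: int, world_size: int) -> List[Tuple[int, int]]:
    """Return equal-sized (start, end) index ranges, one per rank, over the padded buffer."""
    total = padded_numel(numel, world_size)
    shard = total // world_size
    return [(rank * shard, (rank + 1) * shard) for rank in range(world_size)]


if __name__ == "__main__":
    # 10 elements over 4 ranks: the buffer is padded to 12, so the last
    # shard covers two padding elements past the real data.
    print(padded_numel(10, 4))
    print(partition_bounds(10, 4))
```

An assert that checks the unpadded element count against the shard size would fail in the 10-element case, which is the kind of mismatch the title describes.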


jeffra avatar Mar 09 '21 17:03 jeffra

Can one of the admins verify this patch?

rocm-mici avatar Jun 09 '22 20:06 rocm-mici

Pretty sure this is not needed anymore; the code around this spot has changed significantly since then. @tjruwase do you know more here?

jeffra avatar Mar 24 '23 03:03 jeffra

@jeffra, yes okay to close.

tjruwase avatar Mar 24 '23 12:03 tjruwase