Heyang Qin

28 comments by Heyang Qin

Hello @lucadiliello. Thank you for reporting this issue to us. Could you share a script or command line that we can use to reproduce it?

I ran into the same issue. After a whole day of trial and error, I finally solved it by disabling IPv6 as suggested here: https://stackoverflow.com/questions/57992691/pip-hangs-on-starting-new-https-connection. However, I have no idea...

One of our recent fixes, https://github.com/microsoft/DeepSpeed/pull/3819, should have fixed this issue. It is not included in the PyPI release yet, so you need to install DeepSpeed from source to apply...

Hello @Bill-Orz. We have fixed the hanging issue in https://github.com/microsoft/DeepSpeedExamples/pull/636. Please update to the latest DeepSpeedExamples.

Hello @liuaiting. Thank you for reporting this issue to us. One of our recent fixes https://github.com/microsoft/DeepSpeed/pull/3462 may have already fixed this error. Could you update your deepspeed and give it...

Hello @sindhuvahinis @lanking520, thank you for reporting this! With the merge of https://github.com/microsoft/DeepSpeed/pull/2725, the major part of this issue should have been resolved. I tested the models you listed with...

> @HeyangQin we did some tests on 2725 as well and still observing the major issues with INT8. Will share more details and setup

@lanking520 Thank you for the update!...

Hi @lanking520 @sindhuvahinis, Thank you for the information. Previously I only tested checkpoint loading with int8. Now when I test checkpoint saving with int8, I see the same error as...

@tjruwase I reworked the previous PR. This PR checks the GPU count against the world size for all dist tests, so it avoids issues like https://github.com/microsoft/DeepSpeed/issues/2733 and https://github.com/microsoft/DeepSpeed/issues/2482 for all the...
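
For readers unfamiliar with the idea in the comment above, here is a minimal sketch of comparing the GPU count against the world size before running a distributed test. It is not the code from the PR: the function name `skip_reason` and its parameters are hypothetical, and the GPU count is passed in explicitly so the example runs without any GPU library installed.

```python
from typing import Optional


def skip_reason(world_size: int, gpu_count: int) -> Optional[str]:
    """Return a reason to skip a distributed test, or None if it can run.

    Hypothetical helper for illustration only; not part of DeepSpeed.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if gpu_count < world_size:
        # Fewer GPUs than ranks: skip the test rather than launching it.
        return f"test needs {world_size} GPUs but only {gpu_count} available"
    return None


# Assumption: in a real test suite, gpu_count would typically come from
# something like torch.cuda.device_count() instead of a literal.
print(skip_reason(world_size=4, gpu_count=2))
print(skip_reason(world_size=2, gpu_count=2))
```

A test harness could call such a helper before spawning worker processes and skip the test whenever a reason is returned.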