Bill-Orz

5 comments by Bill-Orz

I updated to the latest version, but when I use ZeRO stage 3, training hangs after the log line "[2023-07-11 16:01:28,870] [INFO] [partition_parameters.py:326:__exit__] finished initializing model with 6.74B parameters". ![image](https://github.com/microsoft/DeepSpeed/assets/122064954/496da155-81cf-45a9-8c8c-caf992e34f98) ![image](https://github.com/microsoft/DeepSpeed/assets/122064954/a04abd2a-11c2-4ae0-9be8-3d73e0f7ee33)...

I compared it against the original DeepSpeed-Chat code, and the two look identical.

Thanks. Did you run step 3 with four 7B LLaMA models? On my 80 GB A100 I keep hitting OOM unless I turn on gradient checkpointing.
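For a rough sense of scale, here is a back-of-the-envelope sketch of the weight memory alone. It assumes the 6.74B parameter count from the log line quoted earlier, half-precision weights (2 bytes per parameter), and four models of that size in step 3; all three are assumptions for illustration, not measurements from this setup, and the helper name `weights_gib` is made up.

```python
# Back-of-the-envelope estimate; every constant below is an assumption.
PARAMS = 6.74e9      # parameter count taken from the log line quoted earlier
BYTES_FP16 = 2       # assumes weights are held in fp16/bf16
NUM_MODELS = 4       # assumes all four step-3 models are 7B-sized
GIB = 1024 ** 3


def weights_gib(params: float, bytes_per_param: int = BYTES_FP16) -> float:
    """Memory needed for the raw weights only, in GiB (no sharding)."""
    return params * bytes_per_param / GIB


per_model = weights_gib(PARAMS)
all_models = NUM_MODELS * per_model
print(f"one model: {per_model:.1f} GiB, four models: {all_models:.1f} GiB")
```

This counts unsharded weights only. Gradients, optimizer states, and activations for the models being trained come on top, and ZeRO stage 3 spreads parameters across however many GPUs are in use, so the real per-GPU figure depends on details not given here. It may still help explain why activation memory without gradient checkpointing can tip an 80 GB card into OOM.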