Bill-Orz
Facing the same issue.
I updated to the latest version, but when I use ZeRO stage 3, training hangs after the log line "[2023-07-11 16:01:28,870] [INFO] [partition_parameters.py:326:__exit__] finished initializing model with 6.74B parameters".  ...
I compared it with the original deepspeed-chat code, and it looks identical.
Thanks. Did you run step3 with four 7B llama models? I'm on 80G A100s, and without gradient checkpointing I keep hitting OOM.
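
For context on the OOM, here is a rough back-of-envelope sketch of the model-state memory, not a measurement. It assumes all four step3 models (actor, critic, reference, reward) have the 6.74B parameters from the log above, that the actor and critic train with mixed-precision Adam (about 16 bytes per parameter: fp16 weights and gradients plus fp32 weights, momentum, and variance), that the two frozen models hold fp16 weights only, and that ZeRO stage 3 splits everything evenly across GPUs. The GPU count of 8 and the function names are hypothetical, and activations, generation caches, and fragmentation are ignored.

```python
GIB = 1024 ** 3

def model_state_bytes(n_params, trainable):
    # Frozen model: fp16 weights only, 2 bytes per parameter.
    # Trainable model with mixed-precision Adam: 2 (fp16 weights)
    # + 2 (fp16 grads) + 12 (fp32 weights, momentum, variance) = 16 bytes.
    return n_params * (16 if trainable else 2)

def step3_state_gib(n_params=6.74e9, n_gpus=8):
    # Assumption: actor and critic are trained, reference and reward are frozen.
    total = (2 * model_state_bytes(n_params, trainable=True)
             + 2 * model_state_bytes(n_params, trainable=False))
    total_gib = total / GIB
    # Assumption: ZeRO stage 3 partitions all model states evenly across GPUs.
    return total_gib, total_gib / n_gpus

if __name__ == "__main__":
    total_gib, per_gpu_gib = step3_state_gib()
    print(f"model states: {total_gib:.1f} GiB total, {per_gpu_gib:.1f} GiB per GPU")
```

Whatever is left on each GPU after these states has to hold activations, which is the part gradient checkpointing reduces.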