Hiki
@JThh Hello, I got the same error when I tried to train a 34B model on 3 nodes (8 × 40 GB GPUs, 500 GB main memory). I have seen the CPU memory usage...
Also, I set tp = 8 on each node, so I suspect each of the 8 processes initializes a full copy of the model. A 34B model in fp32 is roughly 136 GB of parameters, so 8 CPU-side copies would far exceed the 500 GB of main memory.
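A workaround I'm considering is deferring weight materialization so that each process doesn't build a full CPU copy before sharding. This is just a rough sketch, assuming `colossalai.lazy.LazyInitContext` is compatible with the booster here; the model class, config, and parallel sizes are placeholders:

```python
# Rough sketch: build the model lazily so each of the 8 processes per node
# does not materialize the full 34B weights in CPU memory.
# Assumes LazyInitContext works with the plugin in use.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.lazy import LazyInitContext
from transformers import LlamaConfig, LlamaForCausalLM  # placeholder 34B-style model

colossalai.launch_from_torch(config={})

with LazyInitContext():
    model = LlamaForCausalLM(LlamaConfig())  # weights are not allocated yet

plugin = HybridParallelPlugin(tp_size=8, pp_size=3)  # placeholder layout for 3 nodes x 8 GPUs
booster = Booster(plugin=plugin)
# boost() should materialize only this rank's shard instead of the full model.
model, *_ = booster.boost(model)
```

Does this match how initialization is supposed to work here, or is there a recommended way to avoid the per-process copies?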
@flybird11111 Hi, I didn't find `enable_gradient_accumulation` or `no_sync()` in HybridParallelPlugin (https://github.com/hpcaitech/ColossalAI/blob/main/colossalai/booster/plugin/hybrid_parallel_plugin.py), so I'm wondering how to add gradient accumulation with HybridParallelPlugin following https://colossalai.org/docs/features/gradient_accumulation_with_booster. Could you provide more details? A sketch of what I have in mind is below.
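For context, here is a rough sketch of the manual accumulation I had in mind: scale the loss by the accumulation factor and step the optimizer only every N micro-batches, accepting the extra gradient sync per micro-batch since `no_sync()` isn't available. The accumulation factor, toy model, and parallel sizes are placeholders, not from my real setup:

```python
# Rough sketch: manual gradient accumulation without no_sync(),
# stepping the optimizer once every GRAD_ACCUM_STEPS micro-batches.
import colossalai
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

GRAD_ACCUM_STEPS = 4  # placeholder accumulation factor

colossalai.launch_from_torch(config={})
plugin = HybridParallelPlugin(tp_size=2, pp_size=1)  # placeholder parallel layout
booster = Booster(plugin=plugin)

model = nn.Linear(128, 10)  # toy model standing in for the real network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
dataset = TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,)))
dataloader = plugin.prepare_dataloader(dataset, batch_size=8)

model, optimizer, criterion, dataloader, _ = booster.boost(
    model, optimizer, criterion, dataloader
)

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(dataloader):
    # Scale the loss so the accumulated gradient averages over the micro-batches.
    loss = criterion(model(inputs), labels) / GRAD_ACCUM_STEPS
    booster.backward(loss, optimizer)
    # Apply the optimizer only once per GRAD_ACCUM_STEPS micro-batches.
    if (step + 1) % GRAD_ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Is this the intended way, and how should it change when pp_size > 1, where I believe the loop would go through `booster.execute_pipeline` instead?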