efsotr

Results: 13 comments by efsotr

Setting overlap_comm to False can avoid this problem.
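For reference, here is a minimal sketch of a DeepSpeed config with the workaround applied. Only the `overlap_comm` key comes from the comment above. The ZeRO stage (2, matching the stage_1_and_2.py code path discussed later) and the micro-batch size are placeholder assumptions, so adapt them to your own setup.

```python
import json

# Hypothetical minimal DeepSpeed config; only "overlap_comm": False is the
# workaround from the comment. Other values are placeholders (assumptions).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "zero_optimization": {
        "stage": 2,             # assumption: ZeRO stage 2 (stage_1_and_2.py path)
        "overlap_comm": False,  # do not overlap gradient reduction with backward
    },
}

# Serialize so the config can be saved as a JSON file for the launcher.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

The same dict can also be passed directly as the `config` argument of `deepspeed.initialize` instead of being written to a file.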

@darcula1993 I am curious what your DeepSpeed configuration looks like.

> > Setting overlap_comm to False can avoid this problem.
>
> This solved the issue for me, but what can we do to use overlap_comm? There is a...

3. The total number of tokens in the fine-tuning stage

@tjruwase In https://github.com/microsoft/DeepSpeed/blob/9b7fc5452471392b0f58844219fcfdd14a9cdc77/deepspeed/runtime/zero/stage_1_and_2.py#L1054C2-L1146C1, if reduce_scatter is True, the program enters allreduce_no_retain (L1141), since the default value of use_multi_rank_bucket_allreduce is False.

- allreduce_no_retain (L1549) calls allreduce_and_copy with rank not None.
- allreduce_and_copy (L1526) calls allreduce_bucket with rank not None.
- allreduce_bucket (L1488) calls dist.reduce when rank is not None...

@tjruwase With use_multi_rank_bucket_allreduce=True, the program calls allreduce_and_scatter:
- allreduce_and_scatter (L1016) calls allreduce_and_copy_with_multiple_ranks.
- allreduce_and_copy_with_multiple_ranks (L1004) calls allreduce_bucket with the default value of rank.
- allreduce_bucket (L1488) calls dist.all_reduce when rank is None.
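A toy Python model of the two call chains traced above may make the difference easier to see. The functions below are hypothetical, heavily simplified stand-ins: they reuse the DeepSpeed method names but none of the real signatures or bodies, and each one only returns the name of the collective that, per the trace, ends up being issued. The sketch assumes reduce_scatter is True and uses a hypothetical destination rank of 0.

```python
from typing import Optional

# Hypothetical stand-ins for the stage_1_and_2.py call chain; NOT DeepSpeed code.

def allreduce_bucket(rank: Optional[int] = None) -> str:
    # Per the trace: dist.reduce when rank is not None, dist.all_reduce when it is None.
    return "dist.reduce" if rank is not None else "dist.all_reduce"

def allreduce_and_copy(rank: Optional[int]) -> str:
    return allreduce_bucket(rank)

def allreduce_no_retain(rank: int) -> str:
    # Passes a concrete (not None) rank down the chain.
    return allreduce_and_copy(rank)

def allreduce_and_copy_with_multiple_ranks() -> str:
    # Calls allreduce_bucket with the default value of rank.
    return allreduce_bucket()

def allreduce_and_scatter() -> str:
    return allreduce_and_copy_with_multiple_ranks()

def collective_for(use_multi_rank_bucket_allreduce: bool, dst_rank: int = 0) -> str:
    # Assumes reduce_scatter is True; dst_rank is a hypothetical destination rank.
    if use_multi_rank_bucket_allreduce:
        return allreduce_and_scatter()
    return allreduce_no_retain(dst_rank)

print(collective_for(False))  # dist.reduce
print(collective_for(True))   # dist.all_reduce
```

The practical difference between the two endpoints is that dist.reduce leaves the summed result only on the destination rank, while dist.all_reduce leaves it on every rank.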

I found the following solution:
1. Downgrade to deepspeed==0.14.2.
2. export CPATH=$CPATH:[Your Path]/anaconda3/envs/LLM/lib/python3.11/site-packages/nvidia/cuda_runtime/include:[Your Path]/anaconda3/envs/LLM/targets/x86_64-linux/include

It seems that the build process for DeepSpeedCPUAdam can't properly find the path...
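If you would rather not fill in [Your Path] by hand, the sketch below derives the two include directories from the currently active Python environment and prints an export line. It assumes the same layout as the paths above: the CUDA runtime headers live under site-packages/nvidia/cuda_runtime/include, and the environment root contains targets/x86_64-linux/include. The helper names `extra_cpath_dirs` and `build_cpath` are hypothetical. Run it with the environment's own Python interpreter.

```python
import os
import sys
import sysconfig
from typing import List

def extra_cpath_dirs(prefix: str, purelib: str) -> List[str]:
    # Assumed layout, mirroring the two directories in the export command above.
    return [
        os.path.join(purelib, "nvidia", "cuda_runtime", "include"),
        os.path.join(prefix, "targets", "x86_64-linux", "include"),
    ]

def build_cpath(existing: str, dirs: List[str]) -> str:
    # Append each directory to the existing CPATH, skipping empty entries and duplicates.
    parts = [p for p in existing.split(os.pathsep) if p]
    for d in dirs:
        if d not in parts:
            parts.append(d)
    return os.pathsep.join(parts)

if __name__ == "__main__":
    # sys.prefix is the environment root; purelib is its site-packages directory.
    dirs = extra_cpath_dirs(sys.prefix, sysconfig.get_paths()["purelib"])
    for d in dirs:
        if not os.path.isdir(d):
            print(f"warning: {d} does not exist", file=sys.stderr)
    print(f"export CPATH={build_cpath(os.environ.get('CPATH', ''), dirs)}")
```

Copy the printed line into your shell before reinstalling or rebuilding DeepSpeed.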