efsotr
Setting overlap_comm to False can avoid this problem.
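For reference, a minimal sketch of where this flag lives in a DeepSpeed config. `overlap_comm` is a real ZeRO-optimization option; the surrounding values here are illustrative placeholders, not a recommended configuration:

```python
import json

# Illustrative ZeRO stage-2 config; only "overlap_comm" is the point here.
ds_config = {
    "train_batch_size": 8,  # placeholder value
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": False,  # disable comm/compute overlap to work around the issue
    },
}
print(json.dumps(ds_config, indent=2))
```

The same dict can be passed to `deepspeed.initialize(config=...)` or written out as a JSON config file.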
@darcula1993 I am curious what your DeepSpeed configuration looks like.
> Setting overlap_comm to False can avoid this problem.

This solved the issue for me -- but what can we do to still use overlap_comm? there is a...
3. the total number of tokens in the fine-tuning stage
@tjruwase https://github.com/microsoft/DeepSpeed/blob/9b7fc5452471392b0f58844219fcfdd14a9cdc77/deepspeed/runtime/zero/stage_1_and_2.py#L1054C2-L1146C1

If `reduce_scatter` is True, the program enters `allreduce_no_retain` (L1141); the default value of `use_multi_rank_bucket_allreduce` is False. Then:

- `allreduce_no_retain` (L1549) calls `allreduce_and_copy` with `rank` not None
- `allreduce_and_copy` (L1526) calls `allreduce_bucket` with `rank` not None
- `allreduce_bucket` (L1488) calls `dist.reduce` when `rank` is not None...
@tjruwase With `use_multi_rank_bucket_allreduce=True`, `allreduce_and_scatter` is called instead:

- `allreduce_and_scatter` (L1016) calls `allreduce_and_copy_with_multiple_ranks`
- `allreduce_and_copy_with_multiple_ranks` (L1004) calls `allreduce_bucket` with the default value of `rank`
- `allreduce_bucket` (L1488) calls `dist.all_reduce` when `rank` is None
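The two call chains above end in different collectives depending on whether a `rank` reaches `allreduce_bucket`. A simplified, self-contained sketch of that dispatch (using a stub in place of `torch.distributed`; this is not the actual DeepSpeed code, just the branching it describes):

```python
class _DistStub:
    """Stand-in for torch.distributed that records which collective is invoked."""
    def __init__(self):
        self.calls = []

    def all_reduce(self, tensor):
        self.calls.append("all_reduce")

    def reduce(self, tensor, dst):
        # Real dist.reduce deposits the result only on rank `dst`.
        self.calls.append(f"reduce(dst={dst})")


dist = _DistStub()


def allreduce_bucket(bucket, rank=None):
    # Mirrors the rank check described above: a concrete rank selects
    # dist.reduce; rank=None selects dist.all_reduce.
    if rank is None:
        dist.all_reduce(bucket)
    else:
        dist.reduce(bucket, dst=rank)


# Path taken when use_multi_rank_bucket_allreduce=False (rank is propagated):
allreduce_bucket([0.0], rank=1)
# Path taken when use_multi_rank_bucket_allreduce=True (default rank=None):
allreduce_bucket([0.0])
print(dist.calls)  # ['reduce(dst=1)', 'all_reduce']
```

This makes it easy to see why flipping `use_multi_rank_bucket_allreduce` changes which communication primitive runs during gradient reduction.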
I found a solution:

1. Downgrade to `deepspeed==0.14.2`.
2. `export CPATH=$CPATH:[Your Path]/anaconda3/envs/LLM/lib/python3.11/site-packages/nvidia/cuda_runtime/include:[Your Path]/anaconda3/envs/LLM/targets/x86_64-linux/include`

It seems that the build process for DeepSpeedCPUAdam can't properly find the path...
I met the same problem: `RuntimeError: operator torchvision::nms does not exist`