2 comments of bettyballin
Hi, I have the same issue in DeepSpeed version 0.8.0. I'm calling my Python script with `NCCL_DEBUG=INFO NCCL_BLOCKING_WAIT=1 deepspeed --num_gpus=4 --master_addr="myIP" --master_port=1234 --hostfile=job/hostfile myPythonScript.py`. I'm using the Hugging Face Trainer implementation...
Same here, I'm getting: `subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--', '-j288']' returned non-zero exit status 2.`
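A `subprocess.CalledProcessError` like the one above only reports the exit status; the actual compiler or linker error is in the output of the command that failed. Below is a minimal sketch of how that output can be captured, assuming the build step is launched through Python's `subprocess` module. `run_and_report` is a hypothetical helper, and the failing command is a stand-in child process, not the real cmake invocation.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd; on failure return (exit status, captured stderr) instead of raising."""
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # err.returncode is the "non-zero exit status" from the message;
        # err.stderr holds whatever the failing command printed to stderr.
        return err.returncode, err.stderr
    return 0, ""


# Stand-in for the failing build step (assumption): a child process that
# writes a message to stderr and exits with status 2.
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('build failed\\n'); sys.exit(2)",
]
status, stderr_text = run_and_report(failing_cmd)
print(status, stderr_text.strip())
```

Swapping `failing_cmd` for the command list shown in the traceback, and running it from the same build directory, should surface the first real error message rather than just the exit status.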