aditya-sanas

Results: 3 issues by aditya-sanas

**Describe the bug** I am getting an NCCL timeout while training the model. The code usually runs for 40k epochs and then fails with the error below: ``` [rank2]:[E513 13:25:57.714781669...

Labels: bug, ASR

### Bug description I am getting an NCCL timeout while training the model. The code usually runs for 40k epochs and then fails with the error below: ``` [rank2]:[E513 13:25:57.714781669...

Labels: bug, help wanted, distributed, repro needed, ver: 2.4.x

When I run any GPU process inside my Docker container, I can see that the GPU is being utilised, but the PIDs are not visible in the output of `nvidia-smi`. **Steps to...