Taebum Kim

Results: 67 comments by Taebum Kim

I am also curious about this issue

I also encountered the same problem with a non-MoE model. I tried to run a training job for a Llama 13B model on two DGX A100 nodes, but the time breakdown shows: ``` forward-backward...

My issue was resolved by passing `--device=/dev/infiniband` as a `docker run` argument.
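For anyone scripting the container launch, here is a minimal sketch of where that flag goes. The image name, the `--gpus all` flag, and the training command are placeholders I am assuming for illustration; only `--device=/dev/infiniband` comes from the fix above.

```python
import shlex


def build_docker_run(image, command, devices=("/dev/infiniband",)):
    """Build a `docker run` argument list that exposes host devices to the container.

    `image` and `command` are hypothetical placeholders; substitute your own.
    """
    argv = ["docker", "run", "--rm", "--gpus", "all"]  # `--gpus all` is an assumption
    for dev in devices:
        # Options must come before the image name, otherwise Docker treats
        # them as arguments to the container's command.
        argv.append(f"--device={dev}")
    argv.append(image)
    argv.extend(command)
    return argv


# Hypothetical image and entry point, for illustration only.
argv = build_docker_run("my-training-image:latest", ["python", "train.py"])
print(shlex.join(argv))
```

Building the command as a list avoids quoting mistakes, and it can be handed directly to `subprocess.run(argv)`.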

Sorry for the late reply :( If you want to use a second CUDA card, please check that ``` $ nvidia-smi ``` shows both cards correctly. If you want to use forced...
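If you would rather check the card count from a script than by eye, something like the following could work. It is only a sketch: it assumes `nvidia-smi -L` prints one line per GPU beginning with `GPU <index>:`, and the sample listing (including the UUIDs) is made up for illustration.

```python
import subprocess


def count_gpus(listing: str) -> int:
    """Count GPUs in `nvidia-smi -L` style output (one 'GPU N: ...' line per card)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def visible_gpu_count() -> int:
    """Run `nvidia-smi -L` on this machine and count the cards it lists."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return count_gpus(result.stdout)


# Made-up sample output, used so the parser can be tried without a GPU.
sample = (
    "GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-00000000-0000-0000-0000-000000000000)\n"
    "GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-11111111-1111-1111-1111-111111111111)\n"
)
print(count_gpus(sample))
```

On a machine with the NVIDIA driver installed, calling `visible_gpu_count()` should report the same number of cards that `nvidia-smi` shows.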

Yes, you must configure some command-line arguments; you can refer to `train.sh`.

Is that the entire nginx conf file? Could you send a link showing which Git branch the `django/setting.py` and `nginx.conf` currently set up on the server correspond to?

Ah, yes, I think solving it that way should work for now!

@Yoo-Youngjae I've added you. In your Azure account, use "Switch directory" to switch to the directory whose name starts with phyaktaebum~~~, then set the resource group to team9 when you configure the VM.

@ttoru96 You can use the team5 resource group.

@digdhg I've added the Team 21 group for you.