Konnase Lee
Add cudaDeviceReset() to p2pBandwidthLatencyTest to free GPU memory after the test
I created issue #284 about 4 months ago, suggesting that we replace tf.train.Supervisor with tf.train.MonitoredTrainingSession, as the latter will restart the session when facing an OS error (or communication...
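After linking google-hosts.sh to /usr/bin/google-hosts, typing google-hosts in the terminal gives: `google-hosts: command not found`. Adding sudo does not help, and switching to the root user does not find it either. When I list the files under /usr/bin, google-hosts is shown in red; does that mean it is a compressed file? Any advice would be appreciated!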
Service label is `app: pytorch-operator`, while selector is `name: pytorch-operator`. Deployment spec label and selector are both `name: pytorch-operator`.  In such a case, both the service and deployment have...
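The label and selector relationship described above can be illustrated with a minimal sketch of equality-based selector matching: a selector matches an object when every key/value pair in the selector is present in the object's labels. The names `Labels` and `selector_matches` are hypothetical and only serve this illustration; treating an empty selector as matching nothing is a simplification made here, not a statement about every Kubernetes resource type.

```cpp
#include <map>
#include <string>

// Hypothetical alias for a set of key/value labels (illustration only).
using Labels = std::map<std::string, std::string>;

// Returns true when every key/value pair in `selector` also appears in `labels`.
// Simplifying assumption: an empty selector matches nothing in this sketch.
bool selector_matches(const Labels& selector, const Labels& labels) {
    if (selector.empty()) {
        return false;
    }
    for (const auto& kv : selector) {
        auto it = labels.find(kv.first);
        if (it == labels.end() || it->second != kv.second) {
            return false;
        }
    }
    return true;
}
```

Under this sketch, a selector of `name: pytorch-operator` matches pods labeled `name: pytorch-operator`, while a selector of `app: pytorch-operator` would not match those pods. The Service's own `app` label plays no part in which pods it selects.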
1. add tcp store for rendezvous usage: ```c++ auto rank = getenv("RANK"); if (!rank) { rank = "0"; } auto world_size = getenv("WORLD_SIZE"); if (!world_size) { world_size = "1"; }...