Feng Zou

5 comments by Feng Zou

Could you check whether the shared memory folder /dev/shm exists on your nodes (291-294)? And what do you mean by running multiple processes on the same node? Something like "mpirun -n 4 -ppn 4 ..."?...

Could you also check whether you can create and delete a file in /dev/shm on these nodes?
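
One way to run the check asked for above is sketched here. It is only an illustration: the function name check_shm and the shmtest file prefix are made up for this example, and the script has to be run on each node in question.

```shell
# Hypothetical helper: verify that a directory exists and that a file
# can be created and then deleted in it. Defaults to /dev/shm.
check_shm() {
    dir="${1:-/dev/shm}"
    # The directory itself must be present.
    [ -d "$dir" ] || { echo "$dir: missing"; return 1; }
    # Create a uniquely named scratch file inside it.
    tmp=$(mktemp "$dir/shmtest.XXXXXX") || { echo "$dir: cannot create file"; return 1; }
    # Remove the scratch file again.
    rm -f "$tmp" || { echo "$dir: cannot delete file"; return 1; }
    echo "$dir: ok"
}
```

Calling check_shm with no argument tests /dev/shm; a non-zero exit status means one of the three steps failed on that node.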

Did you run the srun commands above on nodes 291-294, or on other nodes in the cluster? Please check the /dev/shm folder on nodes 291-294. What is the size of the /dev/shm folder? We...
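
The size question can be answered with df. The sketch below assumes the POSIX output format of df -P; the function name shm_size and the total_kib/avail_kib labels are invented for this example.

```shell
# Hypothetical helper: print total and available space, in KiB, of the
# filesystem that holds the given directory (default: /dev/shm).
shm_size() {
    # -P forces the one-line-per-filesystem POSIX layout, -k reports KiB.
    # Line 2 holds the data: field 2 is the total, field 4 the available space.
    df -Pk "${1:-/dev/shm}" | awk 'NR==2 {print "total_kib=" $2, "avail_kib=" $4}'
}
```

Running shm_size on each of the nodes shows whether /dev/shm is unusually small or nearly full there.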

Please make sure you can log in to the nodes over ssh without a password (e.g. ssh node119). This check is included in scripts/run_intelcaffe.sh. You can run your case with the script as: ./scripts/run_intelcaffe.sh --hostfile...
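
For a quick manual test outside the script, something like the following could be used. It is not taken from run_intelcaffe.sh; it is a sketch that assumes a hostfile with one hostname per line, and check_ssh_hosts is a made-up name. BatchMode=yes makes ssh fail instead of prompting for a password.

```shell
# Hypothetical helper: try a password-less ssh login to every host
# listed in a hostfile (assumed format: one hostname per line).
check_ssh_hosts() {
    hostfile="$1"
    status=0
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip empty lines.
        [ -n "$host" ] || continue
        # -n keeps ssh from consuming the hostfile on stdin;
        # BatchMode=yes disables password prompts, so a host that
        # needs a password counts as a failure.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            status=1
        fi
    done < "$hostfile"
    return "$status"
}
```

Any host reported as FAILED either needs a password, is unreachable, or did not answer within the 5 second timeout chosen here.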

I believe that if you remove "-f hostfile" from the command, you should get 4 processes running on one node. You need to check whether there is a caffe process on each node.
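
To see how many processes ended up on a node, a small counter like this one may help. It is a sketch: count_procs is a made-up name, and it assumes the executable is literally called caffe, which may not match your build.

```shell
# Hypothetical helper: count running processes whose executable name
# matches the argument exactly (default: caffe).
count_procs() {
    # ps prints one command name per line; some systems print a full
    # path, so compare only the last path component.
    ps -e -o comm= 2>/dev/null | awk -v n="${1:-caffe}" '
        { k = split($0, p, "/"); if (p[k] == n) c++ }
        END { print c + 0 }'
}
```

Running count_procs on each node while the job is active should print 4 on a single node in the case described above, and 0 on nodes that received no process.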