Carl Ponder
Note also that the labels "Private Caches", "Shared Caches", and "Persistent Caches" are placed above the chart rather than below the X-axis, unlike the posted example. I tried viewing my output...
Here's an example of a parameterized invocation line

    mpirun -n $((NODES*4)) -bind-to socket --cpu-list 0,6,12,18 --report-bindings

which, of course, doesn't work correctly:

    [dgx2-03:44870] MCW rank 0 is not bound (or...
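If memory serves, binding to an explicit CPU list takes a separate binding policy in OpenMPI 4.x, something along the lines of the sketch below. This is untested on 4.0.2, and the executable and its arguments are just stand-ins for whatever the run script launches:

    mpirun -n $((NODES*4)) --cpu-list 0,6,12,18 --bind-to cpu-list:ordered --report-bindings ./MGTranspose 2048 2048 100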
I can't tell if that should solve the problem or not; OpenMPI 4.0.2 fails out with the error below. The error doesn't happen if I omit the "--rank-by socket" flag. The...
In this case, with one node,

    mpirun -n 4 --rank-by node --report-bindings ./MGTranspose 2048 2048 100

it's still splitting across sockets; here's the mapping for rank 3, for example:...
If it were up to me, I ought to be able to specify --npersocket 4, and then the first 4 procs would go to socket 0 and then spill over...
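For what it's worth, my understanding is that --npersocket still exists in OpenMPI 4.x as a deprecated alias for the ppr mapping, so a hedged guess at the single-node invocation I'm describing would be (untested; program arguments copied from the run above):

    mpirun -n 4 --map-by ppr:4:socket --bind-to core --report-bindings ./MGTranspose 2048 2048 100

If the ppr semantics are what I think they are, socket 0 gets the first 4 ranks before anything spills over to socket 1.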
I'm still seeing the problem, here's 2 procs on 1 node:

    + mpirun -n 2 --map-by socket:span --report-bindings ./MGTranspose 256 32768 100
    [dgx2-03:126956] MCW rank 0 bound to socket 0[core ...
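Mapping by socket (span or not) round-robins the ranks across the sockets, which I assume is why rank 1 keeps landing on socket 1. A hedged alternative is to map by core instead, which packs ranks onto consecutive cores and, assuming the low-numbered cores all belong to socket 0, should keep both ranks there (same program and arguments as above, untested):

    mpirun -n 2 --map-by core --bind-to core --report-bindings ./MGTranspose 256 32768 100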
I believe with MVAPICH2 that either of these forms would solve the problem:

    mpirun_rsh -np $((PROCS*NODES)) MV2_CPU_BINDING_POLICY=bunch ...
    mpirun_rsh -np $((4*NODES)) MV2_CPU_MAPPING=0,1,2,3 ...

The second form uses a fixed number...
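Spelling those out a bit more (the host specification is elided the same way as above, the program and arguments are just the earlier example, and my recollection is that MV2_CPU_MAPPING separates the per-rank entries with ':' rather than ',', so check the MVAPICH2 user guide):

    mpirun_rsh -np $((4*NODES)) ... MV2_CPU_BINDING_POLICY=bunch ./MGTranspose 2048 2048 100
    mpirun_rsh -np $((4*NODES)) ... MV2_CPU_MAPPING=0:1:2:3 ./MGTranspose 2048 2048 100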
What are you closing here? There are 2 issues now: (1) my wanting a way to run one-socket-per-node without using a hostfile, and (2) the error message that Alex is reporting.
Using an existing hostfile means that I can't vary the number of nodes, right? Or, at the very least, I can't port my run-script to another cluster without having to...
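For what it's worth, one way to keep the node count variable would be to generate the hostfile at launch time. A sketch, assuming a SLURM allocation (the scontrol step would be different under another scheduler, and 4 slots per node plus the core mapping are just the settings from the earlier examples):

    scontrol show hostnames "$SLURM_JOB_NODELIST" | sed 's/$/ slots=4/' > hosts.$SLURM_JOB_ID
    mpirun -n $((NODES*4)) -hostfile hosts.$SLURM_JOB_ID --map-by core --bind-to core --report-bindings ./MGTranspose 2048 2048 100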
Can you give me a more specific command I can try running?