[Other] Binding GPUs when using ARM Perf-Report
Hello,
I'm trying to run GEOSX within ARM Perf-Report (20.0) on pangea3.
I'm able to launch it with one MPI rank using: perf-report mpirun -n 1 ./bin/geosx -i beamBending_benchmark.xml (I had to deactivate Caliper, which conflicts with perf-report).
When I try to run with 6 MPI ranks, all MPI processes are bound to GPU #0. I have a script that sets CUDA_VISIBLE_DEVICES according to the MPI rank, but perf-report does not let me use it.
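Since perf-report does not accept the wrapper script, one option might be to do the same thing inside the executable: set CUDA_VISIBLE_DEVICES at the very top of main(), before anything touches the CUDA runtime (the runtime reads that variable only when it initializes). The sketch below is hypothetical, not GEOSX code: `restrictToLocalGpu` is an illustrative name, and it assumes the launcher exports a node-local rank variable. OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI (and by Spectrum MPI, which is derived from it), MV2_COMM_WORLD_LOCAL_RANK by MVAPICH2 and SLURM_LOCALID by srun; which one exists on pangea3 is an assumption to check.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: restrict this process to one GPU by setting
// CUDA_VISIBLE_DEVICES from the launcher's node-local rank, i.e. what the
// wrapper script does, but from inside the executable.
// Must run before the first CUDA runtime call in the process, because the
// CUDA runtime only reads CUDA_VISIBLE_DEVICES when it initializes.
// Returns the device index made visible, or -1 if no local-rank variable
// was found (the environment is then left untouched).
int restrictToLocalGpu(int gpusPerNode) {
  assert(gpusPerNode > 0);
  // Assumed launcher variables: Open MPI / Spectrum MPI, MVAPICH2, Slurm srun.
  const char* names[] = {"OMPI_COMM_WORLD_LOCAL_RANK",
                         "MV2_COMM_WORLD_LOCAL_RANK", "SLURM_LOCALID"};
  for (const char* name : names) {
    if (const char* value = std::getenv(name)) {
      int device = std::atoi(value) % gpusPerNode;
      // setenv is POSIX (declared by <stdlib.h> on Linux), not ISO C++.
      setenv("CUDA_VISIBLE_DEVICES", std::to_string(device).c_str(), 1);
      return device;
    }
  }
  return -1;
}
```

Called as the first statement of main(), e.g. `restrictToLocalGpu(6)`, this would leave each rank with a single visible GPU, which it sees as device 0, so every thread of that rank defaults to the intended GPU. Whether it runs early enough in GEOSX depends on whether some library initializes CUDA before main() or inside MPI initialization, which would need checking.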
Does anyone have an idea how I could run GEOSX with 6 MPI ranks and 6 GPUs? Maybe I could change the code to use GPU #(rank modulo number of GPUs per node) instead of the first one? Where should I change that in the code?
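For the code route, the mapping itself is small; the subtle part is which rank to use. A sketch, assuming the node-local rank is available (with MPI-3 it can be obtained with `MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &nodeComm)` followed by `MPI_Comm_rank(nodeComm, &localRank)`) and that the GPU count comes from `cudaGetDeviceCount`. Both function names are hypothetical. The second helper only illustrates why the global rank is a risky substitute: with cyclic placement on 2 nodes of 6 GPUs, global ranks 0 and 6 both sit on node 0 and both pick GPU 0.

```cpp
#include <cassert>

// Sketch of "GPU = rank modulo number of GPUs per node".
// `localRank` is assumed to be the rank *within the node*.
// If a node runs more ranks than it has GPUs, several ranks share a GPU.
int gpuForLocalRank(int localRank, int gpusPerNode) {
  assert(localRank >= 0);
  assert(gpusPerNode > 0);
  return localRank % gpusPerNode;
}

// Illustration of the pitfall with cyclic (round-robin) rank placement over
// `numNodes` nodes: node n hosts global ranks n, n + numNodes, n + 2*numNodes...
// Returns the GPU that "global rank modulo GPUs per node" would pick for the
// k-th rank hosted on `node`.
int gpuFromGlobalRankCyclic(int node, int k, int numNodes, int gpusPerNode) {
  assert(numNodes > 0 && gpusPerNode > 0);
  int globalRank = node + k * numNodes;
  return globalRank % gpusPerNode;
}
```

With block placement and at most one rank per GPU, the global-rank version happens to give distinct GPUs per node, but the local rank avoids depending on the launcher's placement policy.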
EDIT:
I tried adding a setupCUDA() just after setupMPI(). In that setupCUDA() I call cudaSetDevice(rank % cudaDeviceNumber). It appears that some allocations are still made on device 0 after the cudaSetDevice() call. My guess is that these allocations are made on other threads. Any idea how I should proceed?
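On the threads hypothesis: in the CUDA runtime the current device is a per-host-thread setting, so a cudaSetDevice() in setupCUDA() does not affect threads created elsewhere, which keep device 0 as their default. Another possible cause, not verified here, is allocations made before setupCUDA() runs (static initializers, or a CUDA-aware MPI initializing on device 0). The sketch shows the per-thread pattern; `useDeviceOnThisThread` and `runWorkers` are hypothetical names, and the CUDA calls compile only under nvcc (`__CUDACC__`), so a plain C++ build just reports the device each thread would select.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// cudaSetDevice() changes the current device only for the calling host
// thread; other host threads keep their own current device (0 by default).
// Hypothetical helper: call it first thing in every thread that touches the GPU.
int useDeviceOnThisThread(int device) {
#ifdef __CUDACC__
  if (cudaSetDevice(device) != cudaSuccess) return -1;
  int current = -1;
  cudaGetDevice(&current);  // confirm the selection for this thread
  return current;
#else
  return device;  // plain C++ build: report the intended device only
#endif
}

// Starts `numThreads` workers that each select `device` before doing any
// (hypothetical) device work, and returns what each thread selected.
std::vector<int> runWorkers(int numThreads, int device) {
  assert(numThreads >= 0);
  std::vector<int> selected(numThreads, -1);
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&selected, t, device] {
      selected[t] = useDeviceOnThisThread(device);
      // ... device allocations for this thread would go here ...
    });
  }
  for (std::thread& w : workers) w.join();
  return selected;
}
```

The design choice is to select the device at the start of each thread rather than once globally, since there is no process-wide "current device" in the CUDA runtime.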
Regards,
XL
More precisely, does anyone know where I could add the equivalent of cudaSetDevice()?