[Other] Binding GPUs when using ARM Perf-Report

Open · XL64 opened this issue 3 years ago • 1 comment

Hello,

I'm trying to run GEOSX under ARM Perf-Report (20.0) on pangea3. I can launch it with one MPI process using: perf-report mpirun -n 1 ./bin/geosx -i beamBending_benchmark.xml (I had to deactivate Caliper, which conflicts with perf-report). When I run with 6 MPI processes, all of them are bound to GPU #0. I have a script that sets CUDA_VISIBLE_DEVICES according to the MPI rank, but perf-report does not let me use it.

Does anyone have an idea how I could run GEOSX with 6 MPI processes on 6 GPUs? Maybe I could change the code to use GPU #(rank modulo number of GPUs per node) instead of the first one? Where in the code should I make that change?
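A minimal sketch of the "rank modulo GPUs per node" idea, assuming the launcher exports a node-local rank in the environment. The variable names below (OMPI_COMM_WORLD_LOCAL_RANK, MV2_COMM_WORLD_LOCAL_RANK, SLURM_LOCALID) are what Open MPI-based mpirun, MVAPICH2 and Slurm typically set; which one exists on pangea3 is an assumption to check. The helper names are hypothetical, not GEOSX functions, and the resulting index would still have to be passed to cudaSetDevice() before the first device allocation.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: rank of this process among the processes on the same
// node, read from launcher-provided environment variables. These are set
// before the program starts, so this works even before MPI_Init.
// Returns -1 if none of the assumed variables is present.
int nodeLocalRankFromEnv()
{
  const char* candidates[] = {
    "OMPI_COMM_WORLD_LOCAL_RANK",  // Open MPI-based mpirun (assumption)
    "MV2_COMM_WORLD_LOCAL_RANK",   // MVAPICH2 (assumption)
    "SLURM_LOCALID",               // Slurm srun (assumption)
  };
  for (const char* name : candidates) {
    if (const char* value = std::getenv(name)) {
      return std::stoi(value);
    }
  }
  return -1;  // unknown: the caller chooses a fallback
}

// Hypothetical helper: map a node-local rank to a GPU index, wrapping around
// when a node runs more ranks than it has GPUs.
int gpuForLocalRank(int localRank, int gpusPerNode)
{
  assert(localRank >= 0 && gpusPerNode > 0);
  return localRank % gpusPerNode;
}
```

Using the node-local rank rather than the global rank keeps the mapping correct if the launcher places ranks round-robin across nodes; with a block placement of exactly 6 ranks per node, the global rank modulo 6 gives the same result.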

EDIT:

I tried adding a setupCUDA() call right after setupMPI(). In setupCUDA() I call cudaSetDevice(rank % cudaDeviceNumber). Some allocations are still made on device 0 after the cudaSetDevice() call. My guess is that these allocations are made on other threads. Any idea how I should proceed?
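The CUDA runtime's cudaSetDevice() sets the current device only for the calling host thread, so any thread that never calls it keeps using device 0, which matches the guess above. One workaround, sketched here under assumptions, is to do in-process what the wrapper script does: set CUDA_VISIBLE_DEVICES from inside the program before CUDA initializes, so only one GPU is visible and it becomes device 0 for every thread. The helper name is hypothetical, and this only works if nothing (a static initializer, a CUDA-aware MPI_Init, or possibly the profiler itself) has already made a CUDA call.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: make exactly one physical GPU visible to this process.
// The CUDA runtime reads CUDA_VISIBLE_DEVICES when it initializes, so this
// only has an effect if it runs before the first CUDA call in the process.
// Once in effect, the selected GPU is the only visible device and is numbered
// 0, which is also the default device of every host thread.
// Returns true if the environment variable was set.
bool restrictToSingleGpu(int physicalGpu)
{
  assert(physicalGpu >= 0);
  const std::string value = std::to_string(physicalGpu);
  // POSIX setenv(); the last argument 1 overwrites any existing value.
  return setenv("CUDA_VISIBLE_DEVICES", value.c_str(), 1) == 0;
}
```

This would be called at the very top of main(), with a GPU index derived from the node-local rank. Any later cudaSetDevice() call in that process would then have to use index 0, since the selected GPU is renumbered.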

Regards,

Sep 14 '22 21:09 XL64

More precisely, does anyone know where I could add the equivalent of cudaSetDevice()?

Sep 23 '22 15:09 XL64