
Problem with the resources create and solver create time

kvmkrao opened this issue 4 years ago · 5 comments

Hi, I am interested in the amgx_mpi_possion5pt tutorial. I compared its solve time and total time with those of a solver in PETSc.

Here is the output from the amgx solver:

mpirun -np 1 ./amgx_mpi_possion5pt.exe -mode dDDI -p 5000 5000 -c PBICGSTAB.json

Invalid MIT-MAGIC-COOKIE-1 key
Process 0 selecting device 0
AMGX version 2.1.0.131-opensource
Built on Mar 5 2021, 16:22:34
Compiled with CUDA Runtime 10.1, using CUDA driver 11.2
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: exception_handling=1 ;
Using Normal MPI (Hostbuffer) communicator...
   iter   Mem Usage (GB)   residual        rate
--------------------------------------------------------------
   Ini    13.6803          5.000000e+03
     0    13.6803          1.863238e+02    0.0373
     6    13.6920          5.114025e-07    0.0404
--------------------------------------------------------------
Total Iterations: 7
Avg Convergence Rate: 0.0374
Final Residual: 5.114025e-07
Total Reduction in Residual: 1.022805e-10
Maximum Memory Usage: 13.692 GB
--------------------------------------------------------------
Total Time: 1.74159
    setup: 1.28347 s
    solve: 0.458115 s
    solve(per iteration): 0.0654451 s
resources create    373.340718
create matrix         0.000109
create vectors        0.000013
solver create       104.717192
solver setup          3.145772
solver time           0.458062
solver status         0.000003
Total time          481.661869

Here is the output from the hypre solver in PETSc with 32 processing elements:

nx 5000 ny 5000
setup time           0.066932
fill matrix: time    0.238816
set up vectors: time 0.009836
setup ksp            0.02139
Solve                4.44923
Total                4.78621

The total time to solve the linear system with the AMGX solver is very high compared to PETSc, even though the linear solve time of the AMGX solver is lower than that of the PETSc solver. In AMGX, why are the resources creation and solver creation outrageously expensive compared to the solver setup or solve time, and how can I fix it?

Thank you.

kvmkrao avatar Mar 23 '21 23:03 kvmkrao

Hey @kvmkrao,

Are those seconds? 373 and 104 seconds are incredibly high for any possible system. Just out of interest - is the primary device context initialized prior to the AMGX calls? Can you try doing that (something like cudaSetDevice(your_device_id); cudaFree(0);)? Can you also try modifying the example to run the same code twice (in the same process) to see if the cost is amortized in any way? A rough sketch of both ideas is below.
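A minimal single-GPU sketch of both suggestions, assuming device 0 and the AMGX_resources_create_simple path (the MPI example instead passes a communicator to AMGX_resources_create); error checking is omitted for brevity and this is a hypothetical test harness, not the shipped example:

/* context_warmup.c - hypothetical harness to test the two suggestions */
#include <stdio.h>
#include <time.h>
#include <cuda_runtime.h>
#include <amgx_c.h>

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    /* 1. Force primary CUDA context creation before any AMGX call. */
    cudaSetDevice(0);   /* device id 0 is an assumption */
    cudaFree(0);        /* harmless call that initializes the context */

    AMGX_initialize();
    AMGX_initialize_plugins();

    AMGX_config_handle cfg;
    AMGX_config_create(&cfg, "exception_handling=1");

    /* 2. Create and destroy resources twice in the same process to see
     *    whether the second creation is any cheaper. */
    for (int pass = 0; pass < 2; ++pass) {
        AMGX_resources_handle rsrc;
        double t0 = seconds();
        AMGX_resources_create_simple(&rsrc, cfg);
        printf("pass %d: resources create %.6f s\n", pass, seconds() - t0);
        AMGX_resources_destroy(rsrc);
    }

    AMGX_config_destroy(cfg);
    AMGX_finalize_plugins();
    AMGX_finalize();
    return 0;
}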

Thanks,

marsaev avatar Mar 24 '21 13:03 marsaev

P.S. I see you are using CUDA 10.1. I would always suggest using the latest runtime possible, but that shouldn't be the reason behind what you see.

marsaev avatar Mar 24 '21 13:03 marsaev

Hi @marsaev, yes, they are in seconds.

I added the cudaSetDevice and cudaFree calls to poisson5pt.c and ran the modified example on a GeForce RTX 3090. Here is its output:

solve(per iteration): 0.0658232 s
resources create    372.362349
create matrix         0.000104
create vectors        0.000016
solver create       104.612388
solver setup          3.368089
solver time           0.450474
solver status         0.000003
Total time          480.793423

The resources creation and solver creation steps are still consuming a lot of time; these calls did not improve the time spent in either step.

As per your suggestion, I ran the same code twice (in the same process) to check the consistency of the timings. Convergence and timing information from run 1:

--------------------------------------------------------------
Total Iterations: 7
Avg Convergence Rate: 0.0374
Final Residual: 5.114025e-07
Total Reduction in Residual: 1.022805e-10
Maximum Memory Usage: 13.639 GB
--------------------------------------------------------------
solve(per iteration): 0.0703025 s
resources create    372.686374
create matrix         0.000109
create vectors        0.000015
solver create       104.449227
solver setup          3.208526
solver time           0.492066
solver status         0.000003
Total time          480.836320

Convergence and timing information from run 2:

Invalid MIT-MAGIC-COOKIE-1 key
Process 0 selecting device 0
AMGX version 2.1.0.131-opensource
Built on Mar 5 2021, 16:22:34
Compiled with CUDA Runtime 10.1, using CUDA driver 11.2
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: exception_handling=1 ;
Using Normal MPI (Hostbuffer) communicator...
--------------------------------------------------------------
Total Iterations: 7
Avg Convergence Rate: 0.0374
Final Residual: 5.114025e-07
Total Reduction in Residual: 1.022805e-10
Maximum Memory Usage: 13.528 GB
--------------------------------------------------------------
solve(per iteration): 0.0621781 s
resources create    372.325135
create matrix         0.000109
create vectors        0.000013
solver create       104.636123
solver setup          3.179911
solver time           0.435191
solver status         0.000003
Total time          480.576485

The total times of run 1 and run 2 are approximately the same. I used gcc 8.4.0 and OpenMPI 4.1.0 to compile the example code.

Thank you.

kvmkrao avatar Mar 24 '21 18:03 kvmkrao

Hey @kvmkrao, did you resolve the sources of the slow AMGX API calls? If not, can you run your example under a profiler and see where the time is spent? See the example invocation below.
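For example, something like the following Nsight Systems invocation could show where the time goes (assuming nsys is installed; wrapping the rank executable rather than mpirun is a suggestion, not the only way; nvprof would work similarly on CUDA 10.x):

mpirun -np 1 nsys profile --stats=true ./amgx_mpi_possion5pt.exe -mode dDDI -p 5000 5000 -c PBICGSTAB.json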

marsaev avatar Apr 07 '21 18:04 marsaev

Hello Rao

Did you figure out why you are getting a message like the following in your output?

Invalid MIT-MAGIC-COOKIE-1 key

Hope it is harmless.

snsmssss avatar Aug 12 '22 05:08 snsmssss