What are the properties of the GPU used in the Running examples?
If I'm not mistaken, this was done on a GV100 with CUDA 9.0. Here is the output for GV100 and CUDA 12.0 with the current code revision, i.e. a single GPU:
# examples/amgx_capi -m ../examples/matrix.mtx -c ../src/configs/FGMRES_AGGREGATION.json
AMGX version 2.4.0
Built on Apr 11 2023, 20:30:29
Compiled with CUDA Runtime 12.0, using CUDA driver 12.0
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
AMG Grid:
Number of Levels: 1
LVL ROWS NNZ PARTS SPRSTY Mem (GB)
----------------------------------------------------------------------
0(D) 12 61 1 0.424 8.75e-07
----------------------------------------------------------------------
Grid Complexity: 1
Operator Complexity: 1
Total Memory Usage: 8.75443e-07 GB
----------------------------------------------------------------------
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 1.74603 3.464102e+00
0 1.74603 9.112230e-15 0.0000
----------------------------------------------------------------------
Total Iterations: 1
Avg Convergence Rate: 0.0000
Final Residual: 9.112230e-15
Total Reduction in Residual: 2.630474e-15
Maximum Memory Usage: 1.746 GB
----------------------------------------------------------------------
Total Time: 0.00117027
setup: 0.000565248 s
solve: 0.000605024 s
solve(per iteration): 0.000605024 s
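For context, what examples/amgx_capi does under the hood is roughly the following sequence of AmgX C API calls. This is a condensed sketch of the example, not the full source: error checking, command-line parsing, and the result-retrieval calls are omitted, and paths are hardcoded for illustration.

#include <amgx_c.h>

int main(void)
{
    AMGX_initialize();

    /* Load the solver configuration from the JSON file. */
    AMGX_config_handle cfg;
    AMGX_config_create_from_file(&cfg, "../src/configs/FGMRES_AGGREGATION.json");

    /* Single-GPU resources, bound to the config. */
    AMGX_resources_handle rsrc;
    AMGX_resources_create_simple(&rsrc, cfg);

    /* dDDI mode: device matrix/vectors, double precision, 32-bit indices
       (the default the "No mode specified" warning above refers to). */
    AMGX_matrix_handle A;
    AMGX_vector_handle b, x;
    AMGX_solver_handle solver;
    AMGX_matrix_create(&A, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&b, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&x, rsrc, AMGX_mode_dDDI);
    AMGX_solver_create(&solver, rsrc, AMGX_mode_dDDI, cfg);

    /* Read the MatrixMarket system; a missing RHS/solution falls back to
       b=[1,...,1]^T and x=[0,...,0]^T, exactly as the log shows. */
    AMGX_read_system(A, b, x, "../examples/matrix.mtx");

    AMGX_solver_setup(solver, A);    /* reported as "setup" time */
    AMGX_solver_solve(solver, b, x); /* reported as "solve" time */

    AMGX_solver_destroy(solver);
    AMGX_vector_destroy(x);
    AMGX_vector_destroy(b);
    AMGX_matrix_destroy(A);
    AMGX_resources_destroy(rsrc);
    AMGX_config_destroy(cfg);
    AMGX_finalize();
    return 0;
}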
And here is the MPI example:
# mpirun -n 2 examples/amgx_mpi_capi -m ../examples/matrix.mtx -c ../src/configs/FGMRES_AGGREGATION.json
Authorization required, but no authorization protocol specified
Process 1 selecting device 0
Process 0 selecting device 0
AMGX version 2.4.0
Built on Apr 11 2023, 20:30:29
Compiled with CUDA Runtime 12.0, using CUDA driver 12.0
Warning: No mode specified, using dDDI by default.
Warning: No mode specified, using dDDI by default.
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: exception_handling=1 ;
Using Normal MPI (Hostbuffer) communicator...
Reading matrix dimensions in file: ../examples/matrix.mtx
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Using Normal MPI (Hostbuffer) communicator...
Using Normal MPI (Hostbuffer) communicator...
Using Normal MPI (Hostbuffer) communicator...
AMG Grid:
Number of Levels: 1
LVL ROWS NNZ PARTS SPRSTY Mem (GB)
----------------------------------------------------------------------
0(D) 12 61 2 0.424 1.1e-06
----------------------------------------------------------------------
Grid Complexity: 1
Operator Complexity: 1
Total Memory Usage: 1.09896e-06 GB
----------------------------------------------------------------------
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 2.60217 3.464102e+00
0 2.60217 3.166381e+00 0.9141
1 2.6022 3.046277e+00 0.9621
2 2.6022 2.804132e+00 0.9205
3 2.6022 2.596292e+00 0.9259
4 2.6022 2.593806e+00 0.9990
5 2.6022 3.124839e-01 0.1205
6 2.6022 5.373423e-02 0.1720
7 2.6022 9.795357e-04 0.0182
8 2.6022 4.081205e-13 0.0000
----------------------------------------------------------------------
Total Iterations: 9
Avg Convergence Rate: 0.0366
Final Residual: 4.081205e-13
Total Reduction in Residual: 1.178142e-13
Maximum Memory Usage: 2.602 GB
----------------------------------------------------------------------
Total Time: 0.0502149
setup: 0.00784179 s
solve: 0.0423731 s
solve(per iteration): 0.00470812 s
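For the MPI variant, the main differences are that each rank binds itself to a GPU and that the resources object is created on the MPI communicator instead of with AMGX_resources_create_simple. Below is a condensed sketch of that part of examples/amgx_mpi_capi, assuming a round-robin rank-to-device mapping and omitting error checks; on a single-GPU node both ranks land on device 0, which is why the log prints "Process N selecting device 0" twice.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <amgx_c.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Map this rank onto one of the visible GPUs. */
    int ndevices = 0;
    cudaGetDeviceCount(&ndevices);
    int device = rank % ndevices;
    cudaSetDevice(device);
    printf("Process %d selecting device %d\n", rank, device);

    AMGX_initialize();

    AMGX_config_handle cfg;
    AMGX_config_create_from_file(&cfg, "../src/configs/FGMRES_AGGREGATION.json");

    /* Resources tied to the MPI communicator and this rank's device. */
    MPI_Comm comm = MPI_COMM_WORLD;
    AMGX_resources_handle rsrc;
    AMGX_resources_create(&rsrc, cfg, &comm, 1, &device);

    /* ... matrix/vector/solver creation, the distributed system read,
       setup and solve then follow the same pattern as the single-GPU
       example above ... */

    AMGX_resources_destroy(rsrc);
    AMGX_config_destroy(cfg);
    AMGX_finalize();
    MPI_Finalize();
    return 0;
}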
Thanks a lot for your reply! I'm new to this. Could you please help me with some questions: 1) Where did you run this test, Colab or some cluster? I ran it on my lab's cluster using a V100, and my solve (per iteration) time is 10x your result. Is it possible to run AmgX on Colab? 2) I need this to solve an Ax=b problem. PETSc vs. AmgX, which one is better? Thanks!
The sample matrix is more of a sanity check: 12 rows is not enough for a stable performance comparison. You can instead run the generated Poisson example:
# mpirun -n 1 examples/amgx_mpi_poisson7 -mode dDDI -p 300 300 300 1 1 1 -c ../src/configs/FGMRES_AGGREGATION.json
...
solve(per iteration): 0.526054 s
Regarding PETSc: you can integrate AMGX there via AmgXWrapper (https://github.com/barbagroup/AmgXWrapper) and compare the built-in solvers with AMGX through the same interface :)
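If your system is already assembled in PETSc, the wrapper keeps the usual Mat/Vec workflow and only swaps in AmgX as the solver backend. A minimal sketch following the wrapper's README (AmgXSolver and its methods are the wrapper's API; the config file name and the PETSc objects A, x, b are assumed to come from your own setup, and the PetscErrorCode return values are ignored here for brevity):

#include <petscksp.h>
#include <AmgXSolver.hpp>

// Assumes A (Mat), x and b (Vec) have already been assembled with PETSc.
void solveWithAmgX(MPI_Comm comm, Mat A, Vec x, Vec b)
{
    AmgXSolver solver;

    // Same mode string and JSON config as the standalone examples above.
    solver.initialize(comm, "dDDI", "FGMRES_AGGREGATION.json");

    solver.setA(A);      // convert and upload the PETSc matrix once
    solver.solve(x, b);  // can be called repeatedly while A is unchanged

    solver.finalize();
}

Because the interface mirrors PETSc's KSP usage, you can time the same Ax=b solve with PETSc's own solvers and with AmgX and compare them on equal footing.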