
What are the properties of the GPU used in the Running examples?

Open hanfluid opened this issue 3 years ago • 4 comments

hanfluid avatar Apr 11 '23 14:04 hanfluid

If I'm not mistaken, this was done on a GV100 with CUDA 9.0. Here is the output for a GV100 and CUDA 12.0 with the current code revision, i.e. single GPU:

# examples/amgx_capi -m ../examples/matrix.mtx -c ../src/configs/FGMRES_AGGREGATION.json
AMGX version 2.4.0
Built on Apr 11 2023, 20:30:29
Compiled with CUDA Runtime 12.0, using CUDA driver 12.0
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
AMG Grid:
         Number of Levels: 1
            LVL         ROWS               NNZ  PARTS    SPRSTY       Mem (GB)
        ----------------------------------------------------------------------
           0(D)           12                61      1     0.424       8.75e-07
         ----------------------------------------------------------------------
         Grid Complexity: 1
         Operator Complexity: 1
         Total Memory Usage: 8.75443e-07 GB
         ----------------------------------------------------------------------
           iter      Mem Usage (GB)       residual           rate
         ----------------------------------------------------------------------
            Ini             1.74603   3.464102e+00
              0             1.74603   9.112230e-15         0.0000
         ----------------------------------------------------------------------
         Total Iterations: 1
         Avg Convergence Rate: 		         0.0000
         Final Residual: 		   9.112230e-15
         Total Reduction in Residual: 	   2.630474e-15
         Maximum Memory Usage: 		          1.746 GB
         ----------------------------------------------------------------------
Total Time: 0.00117027
    setup: 0.000565248 s
    solve: 0.000605024 s
    solve(per iteration): 0.000605024 s

or MPI example:

# mpirun -n 2 examples/amgx_mpi_capi -m ../examples/matrix.mtx -c ../src/configs/FGMRES_AGGREGATION.json
Authorization required, but no authorization protocol specified
Process 1 selecting device 0
Process 0 selecting device 0
AMGX version 2.4.0
Built on Apr 11 2023, 20:30:29
Compiled with CUDA Runtime 12.0, using CUDA driver 12.0
Warning: No mode specified, using dDDI by default.
Warning: No mode specified, using dDDI by default.
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: exception_handling=1 ; 
Using Normal MPI (Hostbuffer) communicator...
Reading matrix dimensions in file: ../examples/matrix.mtx
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Using Normal MPI (Hostbuffer) communicator...
Using Normal MPI (Hostbuffer) communicator...
Using Normal MPI (Hostbuffer) communicator...
AMG Grid:
         Number of Levels: 1
            LVL         ROWS               NNZ  PARTS    SPRSTY       Mem (GB)
        ----------------------------------------------------------------------
           0(D)           12                61      2     0.424        1.1e-06
         ----------------------------------------------------------------------
         Grid Complexity: 1
         Operator Complexity: 1
         Total Memory Usage: 1.09896e-06 GB
         ----------------------------------------------------------------------
           iter      Mem Usage (GB)       residual           rate
         ----------------------------------------------------------------------
            Ini             2.60217   3.464102e+00
              0             2.60217   3.166381e+00         0.9141
              1              2.6022   3.046277e+00         0.9621
              2              2.6022   2.804132e+00         0.9205
              3              2.6022   2.596292e+00         0.9259
              4              2.6022   2.593806e+00         0.9990
              5              2.6022   3.124839e-01         0.1205
              6              2.6022   5.373423e-02         0.1720
              7              2.6022   9.795357e-04         0.0182
              8              2.6022   4.081205e-13         0.0000
         ----------------------------------------------------------------------
         Total Iterations: 9
         Avg Convergence Rate: 		         0.0366
         Final Residual: 		   4.081205e-13
         Total Reduction in Residual: 	   1.178142e-13
         Maximum Memory Usage: 		          2.602 GB
         ----------------------------------------------------------------------
Total Time: 0.0502149
    setup: 0.00784179 s
    solve: 0.0423731 s
    solve(per iteration): 0.00470812 s
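As a sanity check on the log above: the `rate` column is the ratio of consecutive residual norms, and the reported average convergence rate is the geometric mean of those ratios, i.e. `(final_residual / initial_residual) ** (1 / iterations)`. A quick sketch reproducing the printed statistics from the residuals in the 2-rank run (my reading of the output, not AMGX's actual code):

```python
# Residual norms copied from the 2-rank FGMRES_AGGREGATION log above
# ("Ini" plus iterations 0..8).
residuals = [3.464102e+00, 3.166381e+00, 3.046277e+00, 2.804132e+00,
             2.596292e+00, 2.593806e+00, 3.124839e-01, 5.373423e-02,
             9.795357e-04, 4.081205e-13]

# Per-iteration rate: ratio of consecutive residual norms.
rates = [residuals[i] / residuals[i - 1] for i in range(1, len(residuals))]

n_iter = len(residuals) - 1                    # 9 iterations
total_reduction = residuals[-1] / residuals[0]
avg_rate = total_reduction ** (1.0 / n_iter)   # geometric mean of the rates

print([round(r, 4) for r in rates])   # 0.9141, 0.9621, 0.9205, ...
print(round(avg_rate, 4))             # 0.0366
print(f"{total_reduction:.6e}")       # 1.178142e-13
```

The recomputed values match the "Avg Convergence Rate", "Final Residual" and "Total Reduction in Residual" lines in the log.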

marsaev avatar Apr 11 '23 21:04 marsaev

Thanks a lot for your reply! I'm new to this. Could you please help me with some questions: 1) Where did you run this test, Colab or some cluster? I ran it on my lab's cluster using a V100, and the solve (per iteration) time is about 10x yours. Is it possible to run AmgX on Colab? 2) I need this to solve an Ax=b problem. PETSc vs. AmgX, which one is better? Thanks!

hanfluid avatar Apr 11 '23 21:04 hanfluid

The sample matrix is more of a sanity check - 12 rows is not enough for a stable performance comparison. You can run, for example, the generated Poisson example:

# mpirun -n 1 examples/amgx_mpi_poisson7 -mode dDDI -p 300 300 300 1 1 1 -c ../src/configs/FGMRES_AGGREGATION.json
...
solve(per iteration): 0.526054 s
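For a sense of scale, here is a rough estimate of the problem size that command generates, assuming `poisson7` builds a 7-point Poisson stencil and `-p 300 300 300` sets the grid dimensions (the trailing `1 1 1` looks like the process grid; this is my reading of the flags, not from the docs):

```python
# Estimated size of the generated 7-point Poisson problem on a
# 300 x 300 x 300 grid.
nx = ny = nz = 300
rows = nx * ny * nz   # one unknown per grid point

# 7 entries per row, minus the stencil neighbours that fall off
# each of the six boundary faces of the grid.
nnz = 7 * rows - 2 * (nx * ny + ny * nz + nx * nz)

print(rows)   # 27000000
print(nnz)    # 188460000
```

At ~27M rows and ~188M nonzeros this is large enough that per-iteration timings actually reflect GPU throughput, unlike the 12-row sample.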

marsaev avatar Apr 11 '23 22:04 marsaev

Regarding PETSc: you can integrate AMGX there (https://github.com/barbagroup/AmgXWrapper) and compare the built-in solvers with AMGX through the same interface :)

marsaev avatar Apr 27 '23 23:04 marsaev