AmgX not deterministic on large matrices even with determinism_flag=1
I have a large matrix (approximately 8 million rows) for which AmgX doesn't produce deterministic results. Running the exact same programme twice with the exact same input produces slight differences in the solution vector. These differences are on the order of 1e-14. Nevertheless, with determinism_flag=1 the solution should be entirely deterministic and give exactly the same results, right? So far I've only done detailed testing on a single GPU, but some other tests I did suggest this also occurs for multi-GPU runs.
It looks like the non-determinism occurs when constructing the multigrid hierarchy: when I print the grid stats, I can see slight differences in the number of rows and non-zero entries in some of the grid levels between runs.
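For anyone wanting to check the same thing, a minimal sketch of a bitwise comparison between two solution vectors might look like the following. It assumes the two vectors have already been read back from AmgX into plain Python lists; the names `bits`, `compare_solutions`, `run_a` and `run_b` are made up for illustration, and the values are placeholders rather than real solver output.

```python
import struct


def bits(x: float) -> int:
    """Return the raw 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def compare_solutions(a, b):
    """Return (bitwise_identical, max_abs_diff) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("solution vectors differ in length")
    identical = all(bits(x) == bits(y) for x, y in zip(a, b))
    max_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return identical, max_diff


# Placeholder vectors standing in for the solutions of two runs (not real data).
run_a = [1.0, 0.5, 0.25]
run_b = [1.0, 0.5, 0.25 + 2.0**-50]

print(compare_solutions(run_a, run_a))
print(compare_solutions(run_a, run_b))
```

A tolerance-based check would call these two runs equal, which is why a bit-pattern comparison is the more useful test when the question is reproducibility rather than accuracy.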
I'm happy to provide a minimum working example and the matrix I'm using if you want me to. Just let me know.
Hi @joconnor22,
Can you share the config you are using? Providing the matrix would help identify the issue too.
Thanks!
Hi @marsaev, thanks for your reply.
The solver config I'm using is:
{
    "config_version": 2,
    "verbosity_level": 0,
    "determinism_flag": 1,
    "communicator": "MPI",
    "solver": {
        "solver": "GMRES",
        "print_solve_stats": 1,
        "obtain_timings": 0,
        "monitor_residual": 1,
        "convergence": "RELATIVE_INI_CORE",
        "tolerance": 1e-12,
        "max_iters": 100,
        "preconditioner": {
            "solver": "AMG",
            "algorithm": "CLASSICAL",
            "print_grid_stats": 1,
            "cycle": "V",
            "selector": "PMIS",
            "interpolator": "D2",
            "smoother": "BLOCK_JACOBI",
            "coarse_solver": "DENSE_LU_SOLVER",
            "dense_lu_num_rows": 4,
            "presweeps": 2,
            "postsweeps": 2,
            "max_iters": 1
        }
    }
}
What's the best way to provide the matrix? I can only reproduce this problem on large matrices (e.g. 8 million rows) so even after compression it's still too big to upload here.
Just to add some additional insight here: the determinism_flag applies to aggregators and matrix coloring (so it is specific to aggregation-based AMG rather than classical). Still, I think we should be observing run-to-run reproducibility.
Am I correct in understanding that you observe run-to-run variability in the level structure of the preconditioner? I definitely would not expect this to happen, so it is possibly a bug somewhere.
Out of interest, did you try the v2.1.x branch? We are trialing a significant number of optimisations and fixes in this development branch.
Hi, yes with print_grid_stats=1 I get a slightly different output between runs (e.g. some levels have different numbers of rows/entries between runs). Actually, it was more like there were two/three possible scenarios and the output of each run would always be one of those two/three outcomes. This only seemed to occur when the matrix was relatively large (towards a million rows or so). For smaller matrices it seemed to always be reproducible between runs.
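One way to quantify the "two/three possible outcomes" observation is to capture the output of many runs and group them by a digest of the text. The sketch below assumes each run's grid stats have been saved as a string; `outcome_counts` is a hypothetical helper, and the strings in `logs` are made-up placeholders, not the actual AmgX print format.

```python
import hashlib
from collections import Counter


def outcome_counts(logs):
    """Count how many runs produced each distinct output text."""
    digests = [
        hashlib.sha256(text.encode("utf-8")).hexdigest()[:12] for text in logs
    ]
    return Counter(digests)


# Placeholder strings standing in for captured grid stats from three runs.
logs = [
    "level 1: rows=100",
    "level 1: rows=101",
    "level 1: rows=100",
]

counts = outcome_counts(logs)
print(f"{len(counts)} distinct outcomes over {len(logs)} runs")
for digest, n in counts.most_common():
    print(digest, n)
```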
It's been a while now since I looked at this so I can't remember exactly but I'm pretty sure I did test the v2.1 branch as well.
@joconnor22, I'm looking into the system you uploaded right now. I'll try it with the dev branch and let you know if it reproduces.
Great, thanks.
@joconnor22 I found the reason for the non-determinism in your case: the ordering of atomics when a large number of them hit the same memory address, combined with floating-point addition, affected the weights for the classical selector. I will submit a deterministic version soon.
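The underlying effect is that floating-point addition is not associative, so the order in which atomic adds reach an address can change the last bits of the accumulated value. The sketch below is a CPU-side illustration in Python, not the AmgX kernel or the eventual fix: `accumulate` is a hypothetical helper that mimics adds arriving in a given order, and the final comparison is a toy stand-in for a weight-based selection, not the PMIS rule itself.

```python
import math


def accumulate(values, order):
    """Sum values in the given arrival order, as unordered atomic adds might."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]

forward = accumulate(values, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = accumulate(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

print(forward == backward)  # the two orders disagree in the last bit
print(forward, backward)

# Toy illustration: a strict comparison against a neighbour's weight can flip
# depending on which arrival order produced this node's weight.
neighbour_weight = backward
print(forward > neighbour_weight, backward > neighbour_weight)

# One order-independent alternative (an assumption, not what AmgX does):
# math.fsum returns the correctly rounded sum regardless of input order.
print(math.fsum(values) == math.fsum(reversed(values)))
```

With millions of rows, many more adds land on the same address, so a last-bit difference in a weight becomes more likely, which fits the problem only showing up on large matrices.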
OK great, thanks a lot for taking the time to look at it!
Tracking internally: AMGX-45