Add comprehensive checks for problem cells
Description
comp_debug enables comprehensive error checking. At each Runge-Kutta sub-step, all conservative variables are checked for NaNs, volume fractions are checked to lie in the range [0, 1], and densities are checked for negative values. If any check fails, the file comp_debug.txt is written to the case directory with a description of the problems found, and the simulation state is saved for visualization.
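For reviewers who want the logic at a glance, here is a minimal Python sketch of the three checks described above. It is not the code in this PR: the function names, the flat per-cell lists, and the report format are all assumptions made for illustration, and only the file name comp_debug.txt comes from the description.

```python
import math


def find_problem_cells(cons_vars, alpha, rho):
    """Return a list of (cell_index, message) for every failed check.

    Hypothetical data layout, chosen only for this sketch:
      cons_vars: dict mapping a variable name to a list of per-cell values
      alpha:     list of per-cell volume fractions
      rho:       list of per-cell densities
    """
    problems = []

    # Check 1: NaNs in any conservative variable.
    for name, values in cons_vars.items():
        for i, v in enumerate(values):
            if math.isnan(v):
                problems.append((i, f"NaN in {name}"))

    # Check 2: volume fractions must lie in [0, 1].
    # A NaN fails both comparisons, so it is reported here as well.
    for i, a in enumerate(alpha):
        if not (0.0 <= a <= 1.0):
            problems.append((i, f"volume fraction {a} outside [0, 1]"))

    # Check 3: negative densities.
    for i, r in enumerate(rho):
        if r < 0.0:
            problems.append((i, f"negative density {r}"))

    return problems


def write_report(problems, path="comp_debug.txt"):
    """Write one line per problem; the line format is an assumption."""
    with open(path, "w") as f:
        for i, msg in problems:
            f.write(f"cell {i}: {msg}\n")
```

In this sketch the checks only read data that is local to the caller, and an empty list means nothing needs to be written.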
Type of change
- [x] New feature (non-breaking change which adds functionality)
Scope
- [x] This PR comprises a set of related changes with a common goal
How Has This Been Tested?
I tested this by inserting problem cells after a certain number of time steps and confirming that the problems were identified and a save file was dumped. I then verified that post_process finds this additional dump file and adds it to the silo database. I performed these tests on both CPUs and MI250x GPUs.
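The fault-injection approach can be illustrated with a toy Python sketch. Everything in it is hypothetical (the state, the stand-in time step, and the function names); it only shows the shape of the test: corrupt one cell at a chosen step and confirm the check fires at that step.

```python
import math


def has_nan(values):
    """True if any entry of the list is NaN."""
    return any(math.isnan(v) for v in values)


def run_with_injection(n_steps, inject_at):
    """Advance a toy state; return the step at which a NaN is detected, or None."""
    state = [1.0, 2.0, 3.0]
    for step in range(1, n_steps + 1):
        state = [0.5 * v for v in state]  # stand-in for a real time step
        if step == inject_at:
            state[1] = float("nan")       # deliberately corrupt one cell
        if has_nan(state):
            return step                   # a real run would dump the state here
    return None
```

For example, `run_with_injection(10, 4)` reports step 4, while an injection step beyond the end of the run yields `None`.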
This seems obviously useful; my concern is the cost. What does the change in cost look like for a typical 3D problem on (a) 1 CPU and (b) 1 GPU?
Do you perform any allreduce calls? Like for getting maxs/mins over the entire domain? If so, that would be prohibitively expensive for a large simulation.
Codecov Report
Attention: Patch coverage is 1.38889% with 71 lines in your changes missing coverage. Please review.
Project coverage is 54.10%. Comparing base (75f5e3b) to head (7c48428). Report is 90 commits behind head on master.
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #631      +/-   ##
==========================================
- Coverage   54.38%   54.10%   -0.28%
==========================================
  Files          61       61
  Lines       13751    13821      +70
  Branches     1720     1731      +11
==========================================
  Hits         7478     7478
- Misses       5817     5887      +70
  Partials      456      456
I moved the comprehensive debug routine to m_sim_helpers. As a result, I also had to move the runtime info file subroutines to m_sim_helpers, because m_sim_helpers now needs s_write_data to dump the problematic state.
Useful feature but needs more attention when time is available.