CPU and GPU producing different results in some cases
CPU and GPU builds sometimes produce meaningfully different results. So far this has been observed for:
- Immersed boundary method, Karman vortex street (via @anshgupta1234)
- Cases with body forces (via @wilfonba), including
  - Falling droplet
  - Rayleigh--Taylor instability
Two of these three cases (the Karman vortex street and the Rayleigh--Taylor instability) involve physical instabilities, so the falling droplet case seems the easiest to analyze.
Some previous results noted that -O2 (instead of -O3) builds on GPU matched CPU results, but this does not seem to be universally true, as noted for the IBM case via @anshgupta1234.
It is not clear to me where the difference arises, though the reconstruction procedure and the Riemann solve are the most obvious places to look first.
@wilfonba, is there an example case on your fork that can be used to test this?
@anandrdbz I just pushed two new example cases, 2D_rayleigh_taylor and 2D_rising_bubble.
FYI if you look at @wilfonba's most recent slides you'll see that GCC and NVHPC give different results even on CPU, and NVHPC gives different results for different optimization levels (none matching the GCC result).
@sbryngelson, but I presume this difference is only present in the body-force problems and not in the regular cases like shock droplet or bubble screen?
@anandrdbz Unsure at this point. The problems that made this issue apparent were body-force and immersed boundary problems, but I haven't looked into whether other features have similar issues.
Okay, I guess this is the STRevert branch, right? Also, if other cases had this issue, they would most likely fail the test suite.
Yeah, it's the STRevert branch. Other cases might fail the test suite if they had this problem, but the test suite only runs 50 time steps, which may not be enough for the differences to grow large enough to fail a test. I'll look into running the test suite for more time steps and see what I find.
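To illustrate why 50 time steps may be too few, here is a toy sketch. It is not the solver or any case from this thread: the logistic map with `r = 3.9` is only an assumed stand-in for an unstable flow, and `logistic_trajectory_gap` is a hypothetical helper. Two runs start a round-off-sized distance apart, and the gap between them is recorded after every step.

```python
def logistic_trajectory_gap(x0, perturbation, steps, r=3.9):
    """Return |x_a - x_b| after each step for two runs started `perturbation` apart.

    The logistic map x -> r * x * (1 - x) is an assumed toy model of an
    unstable system; it is not taken from the solver discussed here.
    """
    xa, xb = x0, x0 + perturbation
    gaps = []
    for _ in range(steps):
        xa = r * xa * (1.0 - xa)
        xb = r * xb * (1.0 - xb)
        gaps.append(abs(xa - xb))
    return gaps


# Start two runs that differ only at round-off level.
gaps = logistic_trajectory_gap(x0=0.2, perturbation=1.0e-15, steps=200)

# Print the gap at a few checkpoints to see how it evolves with step count.
for n in (1, 50, 100, 200):
    print(f"step {n:3d}: gap = {gaps[n - 1]:.3e}")
```

If the real cases behave anything like this, a short run could pass a tolerance check that a longer run of the same case would fail.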
Okay sounds good, I'll take a look at the rising bubble problem tonight
Update: Different optimization levels give different results for GCC on CPUs, NVHPC on CPUs, and NVHPC on GPUs. On GPUs, NVHPC at a fixed optimization level yields the same results across different compiler versions on the same hardware, and across different hardware with the same compiler.
The immersed boundary problem, at least, could be related to the bug I observed with Cray compilers (round-off being significant).
Changing this from "bug" to "invalid". I think the real cause is just floating-point problems. This happens across the board (different CPUs and compilers show it as well). The real issue is identifying how to avoid this when designing cases and writing code. Leaving this open for further discussion of the particular cases that trigger it, so we can find the root causes and avoid them later.
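For anyone landing here later, a minimal sketch of the kind of floating-point effect meant above. Floating-point addition is not associative, so anything that reorders a sum (a different optimization level, vectorization, or a parallel reduction on a GPU) can change the result. The values below are chosen for illustration only and are not from any of the cases in this thread; whether reordering is the actual mechanism in those cases has not been confirmed.

```python
import math

# Three values whose double-precision sum depends on evaluation order.
a, b, c = 1.0e16, -1.0e16, 1.0

# Left to right: a + b cancels exactly, then c is added.
left_to_right = (a + b) + c

# Reassociated: b + c is not representable and rounds back to -1.0e16,
# so c is lost before a is added.
reassociated = a + (b + c)

print(left_to_right, reassociated)

# math.fsum tracks the rounding error and returns the correctly rounded sum.
print(math.fsum([a, b, c]))
```

Both orderings are valid evaluations of the same mathematical expression, which is why two builds can disagree without either being wrong.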
No activity on this. Closing.