MFC CPU and GPU producing different results in some cases

CPU and GPU builds sometimes produce meaningfully different results. So far this has been observed for:

- Immersed boundary method (IBM), Karman vortex street (via @anshgupta1234)
- Cases with body forces (via @wilfonba), including
  - Falling droplet
  - Rayleigh--Taylor instability

Two of these three cases (the Karman vortex street and the Rayleigh--Taylor instability) involve a physical instability, so the falling droplet case seems the easiest to analyze.

Some previous results noted that -O2 (instead of -O3) builds on GPU matched CPU results, but this does not seem to be universally true, as noted for the IBM case via @anshgupta1234.

It is not clear to me where the difference arises, though the reconstruction procedure and the Riemann solve are the most obvious places to look first.

Sep 08 '23 02:09 sbryngelson

@wilfonba, is there an example case on your fork that can be used to test this?

Sep 17 '23 22:09 anandrdbz

@anandrdbz I just pushed two new example cases, 2D_rayleigh_taylor and 2D_rising_bubble.

Sep 18 '23 00:09 wilfonba

> @wilfonba, is there an example case on your fork that can be used to test this?

FYI if you look at @wilfonba's most recent slides you'll see that GCC and NVHPC give different results even on CPU, and NVHPC gives different results for different optimization levels (none matching the GCC result).

Sep 18 '23 01:09 sbryngelson

@sbryngelson, but I presume this difference is only present in the body-force problems and not in the regular cases like shock droplet or bubble screen?

Sep 18 '23 01:09 anandrdbz

@anandrdbz Unsure at this point. The problems that made this issue apparent were body-force and immersed boundary problems, but I haven't looked into whether other features have similar issues.

Sep 18 '23 01:09 wilfonba

> @anandrdbz I just pushed two new example cases, 2D_rayleigh_taylor and 2D_rising_bubble.

Okay, I guess this is the STRevert branch, right? Also, if other cases had this issue, they would most likely fail the test suite.

Sep 18 '23 01:09 anandrdbz

Yeah, it's the STRevert branch. Other problem cases might fail the test suite, but the suite only runs 50 time steps, which may not be enough for the differences to grow large enough to fail a test. I'll look into running the test suite for more time steps and see what I find.
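
To illustrate why a short run can hide this kind of discrepancy, here is a minimal toy sketch. It is not MFC code: the logistic map stands in for an unstable flow, and the `1e-15` perturbation stands in for a round-off-level difference between two builds. The function names and parameters are hypothetical, chosen only for this illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), a toy stand-in for an unstable problem."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def first_step_exceeding(a, b, tol):
    """Return the first step at which two trajectories differ by more than tol, or None."""
    for step, (xa, xb) in enumerate(zip(a, b)):
        if abs(xa - xb) > tol:
            return step
    return None


# Two "builds" that agree to roughly round-off precision at step 0 (assumed perturbation).
reference = logistic_trajectory(0.3, 200)
perturbed = logistic_trajectory(0.3 + 1e-15, 200)

step = first_step_exceeding(reference, perturbed, tol=1e-3)
print("difference after 10 steps:", abs(reference[10] - perturbed[10]))
print("first step with difference > 1e-3:", step)
```

In this toy model the two trajectories stay within a tight tolerance for the first steps and only separate visibly later, so a comparison that stops early would report a match.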

Sep 18 '23 01:09 wilfonba

Okay sounds good, I'll take a look at the rising bubble problem tonight

Sep 18 '23 01:09 anandrdbz

Update: Different optimization levels give different results for GCC on CPUs, NVHPC on CPUs, and NVHPC on GPUs. On GPUs, NVHPC at a fixed optimization level gives the same results across different compiler versions on the same hardware, and across different hardware with the same compiler.
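
One well-known way an optimization level alone can change results is contraction of `a*b + c` into a fused multiply-add, which rounds once instead of twice. Nothing in this thread confirms that this is what happens in MFC; the sketch below only shows the effect in isolation, emulating the single rounding with exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction


def multiply_add_two_roundings(a, b, c):
    """Ordinary evaluation: a*b is rounded to double, then the sum is rounded again."""
    return a * b + c


def multiply_add_one_rounding(a, b, c):
    """Emulated fused multiply-add: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so the exact product 1 - 2**-54 rounds to 1.0 in double precision.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

separate = multiply_add_two_roundings(a, b, c)
fused = multiply_add_one_rounding(a, b, c)
print("two roundings:", separate)  # 0.0
print("one rounding: ", fused)     # -2**-54, about -5.55e-17
```

Both answers are valid floating-point evaluations of the same expression, which is why two builds of identical source can legitimately disagree in the last bits.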

Dec 25 '23 17:12 wilfonba

The immersed boundary problem, at least, could be related to the bug I observed with Cray compilers, where round-off turned out to be significant.

Apr 05 '24 01:04 anandrdbz

Changing this from "bug" to "invalid". I think the real reason is just floating-point problems. This happens across the board (different CPUs and compilers show it as well). The real issue is identifying how to avoid this when designing cases and writing code. Leaving this open for further discussion of the particular cases that 'trigger' it, so we can find the root causes and avoid them later.
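
As a minimal sketch of the floating-point behavior referred to here: addition is not associative, so the same values summed in a different order (as a vectorized loop or a parallel reduction might do) can give a different answer. This is a generic illustration, not MFC code, and the input values are contrived to make the effect obvious.

```python
import math


def sum_forward(values):
    """Left-to-right accumulation, as in a simple serial loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Tree-style accumulation, similar in spirit to a parallel reduction."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


# Contrived data: large terms cancel, small terms are near the rounding threshold.
values = [1e16, 1.0, -1e16, 1.0]

print("forward: ", sum_forward(values))   # 1.0
print("pairwise:", sum_pairwise(values))  # 0.0
print("exact:   ", math.fsum(values))     # 2.0
```

Neither ordering is "wrong" in the sense of a compiler bug; each performs correctly rounded additions, just in a different order.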

May 01 '24 16:05 sbryngelson

No activity on this. Closing.

Jul 03 '24 17:07 sbryngelson