sleef Certain tests fail on gcc 12.0.0 pre-release

Fedora Linux just introduced a pre-release build of gcc 12.0.0 to the development version of Fedora (Rawhide). (Fedora 36 will be released with GCC 12 in mid-2022.) This has resulted in a few new test failures in the sleef package on several architectures.

The release notes for GCC 12 are here. It’s not obvious whether these new failures are a Sleef bug or a GCC bug. Disabling LTO makes no difference.

armv7hl

OK (no failures)

i686

OK (no failures)

x86_64

The following tests FAILED:
	 28 - iutpurecfma_scalar (Failed)
	 29 - iutypurecfma_scalar (Failed)

aarch64

The following tests FAILED:
	 14 - iutpurec_scalar (Failed)
	 15 - iutypurec_scalar (Failed)
	 17 - iutpurecfma_scalar (Failed)
	 18 - iutypurecfma_scalar (Failed)

ppc64le

The following tests FAILED:
	  8 - iutpurec_scalar (Failed)
	  9 - iutypurec_scalar (Failed)
	 11 - iutpurecfma_scalar (Failed)
	 12 - iutypurecfma_scalar (Failed)

s390x

The following tests FAILED:
	  3 - iutyzvector2 (Failed)
	  6 - iutyzvector2nofma (Failed)
	  8 - iutpurec_scalar (Failed)
	  9 - iutypurec_scalar (Failed)
	 11 - iutpurecfma_scalar (Failed)
	 12 - iutypurecfma_scalar (Failed)

Jan 14 '22 20:01 musicinmybrain

Updated with ppc64le results.

Jan 15 '22 18:01 musicinmybrain

I re-tested this with Sleef 3.5.1 and GCC 14.0.1. I see some regressions in Sleef 3.6 (bug report to follow), and I wanted to make sure I still had a correct baseline to compare against. Note that I am not building the DFT library due to https://github.com/shibatch/sleef/issues/214, and I am not building the quad-precision library since it is still marked experimental in 3.5.1. (If possible, I plan to enable it for 3.6.)

Fedora no longer supports armv7hl, and I’ve elected to drop i686 support from the sleef package under https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval, so this is just tested on the four 64-bit primary architectures: x86_64, aarch64, ppc64le, and s390x. I am still building with LTO; disabling LTO didn’t help when I made the original report, so I didn’t try it again.

All of the test failures on each architecture are still exactly the same as in the original report.

I don’t really know how to debug these, but please let me know if there’s anything I can do to help.

Feb 15 '24 15:02 musicinmybrain

Again thanks for the detailed report! AFAICT your baseline sounds sensible, these bugs have not been fixed yet unfortunately. I'm going to look into these failures right now.

I don’t really know how to debug these, but please let me know if there’s anything I can do to help.

Thanks, I'll let you know if I need help reproducing, but IIRC these were easy to reproduce.

Fedora no longer supports armv7hl, and I’ve elected to drop i686 support from the sleef package under https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval,

Thanks for sharing. What about i386? This is currently missing in our CI, but wondered how important that is.

Mar 05 '24 16:03 blapie

Thanks for sharing. What about i386? This is currently missing in our CI, but wondered how important that is.

We don’t build for true i386. For historical reasons, we sometimes use i386/i686 interchangeably in referring to our 32-bit x86 architecture, which is really i686. As mentioned in https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval, there is no longer a 32-bit Fedora kernel or 32-bit release image, and the i686 packages are only used for multilib support on x86_64. (So, to be even more pedantic, they must be running on 64-bit capable hardware, and can assume SSE2 support.)

Anyway, i686 is our only 32-bit primary architecture, and it’s multilib-only as described above. The primary architectures that you can actually install on are x86_64, aarch64, pp64le, and s390x. It seems likely that riscv64 will be added at some point, mostly waiting on the availability of appropriate server-class hardware, but that’s not here yet, and I don’t have access to RISC-V hardware.

Mar 06 '24 13:03 musicinmybrain

So the issue with the iut(y)purec(fma)_scalar tests seem to be related to the -fno-trapping-math option.

The issue appears at -O1 and with gcc12 -fno-trapping-math, and can be reproduced with the following C-code (appearing in first branch of tanf_purec(fma)_scalar).

int q = (int)rintf(d);
float u = (float)q;

Expected result with gcc12 -O1 or gcc11 -O1 -fno-trapping-math or simply at -O0. Codegen on AArch64

 400684:      1e274000       frintx s0, s0
 400688:      5ea1b800       fcvtzs s0, s0
 40068c:      5e21d800       scvtf  s0, s0

Unexpected result with gcc12 -O1 -fno-trapping-math. The conversion have been optimised out and the sign of the output may change if x=-0.0f.

400684:      1e274000       frintx s0, s0

If input is -0.0f, then output is 0.0f in first case, -0.0f in second. Does this sound like a compiler bug or rather does it look like the most recent compiler is doing the right thing? I'm leaning towards compiler bug since the result is different, but I could be missing a point.

It is unclear to me why this option was introduced, removing the option fixes all failures with gcc12 unfortunately it might have an impact on performance. Would it affect vector code though?

Mar 06 '24 18:03 blapie

It is unclear to me why this option was introduced, removing the option fixes all failures with gcc12 unfortunately it might have an impact on performance. Would it affect vector code though?

It was introduced in 33ffaff872511c1f46244533480dc641b0c0b144 via https://github.com/shibatch/sleef/pull/169. Unfortunately, the rationale was never documented. Normally -fno-trapping-math and -fno-math-errno are purely performance optimizations that sacrifice strict IEEE 754 and/or C conformance in exchange for a sometimes significant performance benefit. At least -fno-trapping-math should affect SIMD codegen – usually for the better, since 99.9% of the time nobody is trying to use floating point exceptions and trap handlers.

I’m leaning toward the idea that this is a compiler bug, at least assuming C guarantees (float) 0 is 0.0 and not -0.0, because it seems to violate the “as-if rule” for optimization (given we did not also pass -fno-signed-zeros).

Mar 07 '24 17:03 musicinmybrain

https://gcc.gnu.org/wiki/FloatingPointMath

Mar 07 '24 17:03 musicinmybrain

It was introduced in https://github.com/shibatch/sleef/commit/33ffaff872511c1f46244533480dc641b0c0b144 via https://github.com/shibatch/sleef/pull/169. Unfortunately, the rationale was never documented.

The comment seems to indicate that the aim was to prevent compilers from emitting calls to libm, e.g. sqrt. But only the flag -fno-math-errno is necessary in this case, see https://godbolt.org/z/54zs59hTr .

At least -fno-trapping-math should affect SIMD codegen

You mean NOT affect, right?

I’m leaning toward the idea that this is a compiler bug, at least assuming C guarantees (float) 0 is 0.0 and not -0.0, because it seems to violate the “as-if rule” for optimization (given we did not also pass -fno-signed-zeros).

After discussing with gcc people we have reached a similar conclusion, it looks like a violation of the default -fsigned-zeros.

So we have 2 options here, we could wait for the bug to be fixed (might take some time but safer) or simply remove the option (and deal with minor consequences on scalar performance).

Mar 07 '24 19:03 blapie

Hi! Good news, this was filed as a definite gcc bug and has been taken care of by @joeramsay and people in our GNU team. It is now merged in gcc HEAD and will be backported soon. https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7dd3b2b09cbeb6712ec680a0445cb0ad41070423

These signed-zero cases often get forgotten, for obvious reasons.

Will close this and #483 once fix is backported and tests pass. Unless there are some leftover tests failing.

Mar 15 '24 09:03 blapie

Thank you! I will keep an eye on this. The GCC package in Fedora tracks upstream pretty closely, so I should be able to test the fix quickly once it’s released. If I have time, I might build my own GCC package with that commit backported just to validate the fix.

Mar 15 '24 12:03 musicinmybrain

Ok, it took more than a day to build the patched GCC in COPR, but it looks like—once it’s released and packaged—I can expect that the GCC change will indeed fix all of the test failures reported in this issue.

Next I’ll double-check Sleef 3.6 and see if all of the test failures from https://github.com/shibatch/sleef/issues/518 are still present with the patched GCC.

Mar 16 '24 23:03 musicinmybrain

Hi! Thanks, we appreciate the effort. Let me know how it goes, if new failures are still present in 3.6 we will investigate each of them to better characterise the failures and open new issues.

Mar 18 '24 14:03 blapie

Using -ftrapping-math does seem to be a usable workaround in 3.5.1 (not sure about 3.6 yet), so I’ll add it to the CXXFLAGS in Fedora and build bugfix updates. I’ll drop -ftrapping-math again once the GCC fix is released.

Mar 18 '24 15:03 musicinmybrain

Indeed it is a workaround, but we were worried about the implications on performance (although scalar performance should not matter). We might drop the -fno-trapping-math in the future as we believe it was introduced for the wrong reasons (only -fno-math-errno was necessary to make sure no call to libm was generate for fsqrt).

Mar 18 '24 16:03 blapie

Indeed it is a workaround, but we were worried about the implications on performance (although scalar performance should not matter).

For now, I’d rather favor strict correctness over maximum scalar performance. In any case, once the GCC fix is released, I’ll drop -ftrapping-math and go back to respecting whatever Sleef chooses upstream.

Mar 18 '24 16:03 musicinmybrain

Plot twist: either -ftrapping-math wasn’t a usable workaround after all, or perhaps it wasn’t correctly placed to override -fno-trapping-math. (I didn’t go back and double-check.) Instead, the test builds with -ftrapping-math worked because the maintainer of the gcc package in Fedora backported the fix yesterday, and I didn’t notice. That was fast!

Mar 18 '24 17:03 musicinmybrain