Certain tests fail on gcc 12.0.0 pre-release
Fedora Linux just introduced a pre-release build of gcc 12.0.0 to the development version of Fedora (Rawhide). (Fedora 36 will be released with GCC 12 in mid-2022.) This has resulted in a few new test failures in the sleef package on several architectures.
The release notes for GCC 12 are here. It’s not obvious whether these new failures are a Sleef bug or a GCC bug. Disabling LTO makes no difference.
armv7hl
OK (no failures)
i686
OK (no failures)
x86_64
The following tests FAILED:
28 - iutpurecfma_scalar (Failed)
29 - iutypurecfma_scalar (Failed)
aarch64
The following tests FAILED:
14 - iutpurec_scalar (Failed)
15 - iutypurec_scalar (Failed)
17 - iutpurecfma_scalar (Failed)
18 - iutypurecfma_scalar (Failed)
ppc64le
The following tests FAILED:
8 - iutpurec_scalar (Failed)
9 - iutypurec_scalar (Failed)
11 - iutpurecfma_scalar (Failed)
12 - iutypurecfma_scalar (Failed)
s390x
The following tests FAILED:
3 - iutyzvector2 (Failed)
6 - iutyzvector2nofma (Failed)
8 - iutpurec_scalar (Failed)
9 - iutypurec_scalar (Failed)
11 - iutpurecfma_scalar (Failed)
12 - iutypurecfma_scalar (Failed)
Updated with ppc64le results.
I re-tested this with Sleef 3.5.1 and GCC 14.0.1. I see some regressions in Sleef 3.6 (bug report to follow), and I wanted to make sure I still had a correct baseline to compare against. Note that I am not building the DFT library due to https://github.com/shibatch/sleef/issues/214, and I am not building the quad-precision library since it is still marked experimental in 3.5.1. (If possible, I plan to enable it for 3.6.)
Fedora no longer supports armv7hl, and I’ve elected to drop i686 support from the sleef package under https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval, so this is just tested on the four 64-bit primary architectures: x86_64, aarch64, ppc64le, and s390x. I am still building with LTO; disabling LTO didn’t help when I made the original report, so I didn’t try it again.
All of the test failures on each architecture are still exactly the same as in the original report.
I don’t really know how to debug these, but please let me know if there’s anything I can do to help.
Again thanks for the detailed report! AFAICT your baseline sounds sensible, these bugs have not been fixed yet unfortunately. I'm going to look into these failures right now.
I don’t really know how to debug these, but please let me know if there’s anything I can do to help.
Thanks, I'll let you know if I need help reproducing, but IIRC these were easy to reproduce.
Fedora no longer supports armv7hl, and I’ve elected to drop i686 support from the sleef package under https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval,
Thanks for sharing. What about i386? This is currently missing in our CI, but wondered how important that is.
Thanks for sharing. What about i386? This is currently missing in our CI, but wondered how important that is.
We don’t build for true i386. For historical reasons, we sometimes use i386/i686 interchangeably in referring to our 32-bit x86 architecture, which is really i686. As mentioned in https://fedoraproject.org/wiki/Changes/EncourageI686LeafRemoval, there is no longer a 32-bit Fedora kernel or 32-bit release image, and the i686 packages are only used for multilib support on x86_64. (So, to be even more pedantic, they must be running on 64-bit capable hardware, and can assume SSE2 support.)
Anyway, i686 is our only 32-bit primary architecture, and it’s multilib-only as described above. The primary architectures that you can actually install on are x86_64, aarch64, pp64le, and s390x. It seems likely that riscv64 will be added at some point, mostly waiting on the availability of appropriate server-class hardware, but that’s not here yet, and I don’t have access to RISC-V hardware.
So the issue with the iut(y)purec(fma)_scalar tests seem to be related to the -fno-trapping-math option.
The issue appears at -O1 and with gcc12 -fno-trapping-math, and can be reproduced with the following C-code (appearing in first branch of tanf_purec(fma)_scalar).
int q = (int)rintf(d);
float u = (float)q;
Expected result with gcc12 -O1 or gcc11 -O1 -fno-trapping-math or simply at -O0. Codegen on AArch64
400684: 1e274000 frintx s0, s0
400688: 5ea1b800 fcvtzs s0, s0
40068c: 5e21d800 scvtf s0, s0
Unexpected result with gcc12 -O1 -fno-trapping-math. The conversion have been optimised out and the sign of the output may change if x=-0.0f.
400684: 1e274000 frintx s0, s0
If input is -0.0f, then output is 0.0f in first case, -0.0f in second. Does this sound like a compiler bug or rather does it look like the most recent compiler is doing the right thing? I'm leaning towards compiler bug since the result is different, but I could be missing a point.
It is unclear to me why this option was introduced, removing the option fixes all failures with gcc12 unfortunately it might have an impact on performance. Would it affect vector code though?
It is unclear to me why this option was introduced, removing the option fixes all failures with gcc12 unfortunately it might have an impact on performance. Would it affect vector code though?
It was introduced in 33ffaff872511c1f46244533480dc641b0c0b144 via https://github.com/shibatch/sleef/pull/169. Unfortunately, the rationale was never documented. Normally -fno-trapping-math and -fno-math-errno are purely performance optimizations that sacrifice strict IEEE 754 and/or C conformance in exchange for a sometimes significant performance benefit. At least -fno-trapping-math should affect SIMD codegen – usually for the better, since 99.9% of the time nobody is trying to use floating point exceptions and trap handlers.
I’m leaning toward the idea that this is a compiler bug, at least assuming C guarantees (float) 0 is 0.0 and not -0.0, because it seems to violate the “as-if rule” for optimization (given we did not also pass -fno-signed-zeros).
https://gcc.gnu.org/wiki/FloatingPointMath
It was introduced in https://github.com/shibatch/sleef/commit/33ffaff872511c1f46244533480dc641b0c0b144 via https://github.com/shibatch/sleef/pull/169. Unfortunately, the rationale was never documented.
The comment seems to indicate that the aim was to prevent compilers from emitting calls to libm, e.g. sqrt. But only the flag -fno-math-errno is necessary in this case, see https://godbolt.org/z/54zs59hTr .
At least -fno-trapping-math should affect SIMD codegen
You mean NOT affect, right?
I’m leaning toward the idea that this is a compiler bug, at least assuming C guarantees (float) 0 is 0.0 and not -0.0, because it seems to violate the “as-if rule” for optimization (given we did not also pass -fno-signed-zeros).
After discussing with gcc people we have reached a similar conclusion, it looks like a violation of the default -fsigned-zeros.
So we have 2 options here, we could wait for the bug to be fixed (might take some time but safer) or simply remove the option (and deal with minor consequences on scalar performance).
Hi! Good news, this was filed as a definite gcc bug and has been taken care of by @joeramsay and people in our GNU team. It is now merged in gcc HEAD and will be backported soon. https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7dd3b2b09cbeb6712ec680a0445cb0ad41070423
These signed-zero cases often get forgotten, for obvious reasons.
Will close this and #483 once fix is backported and tests pass. Unless there are some leftover tests failing.
Thank you! I will keep an eye on this. The GCC package in Fedora tracks upstream pretty closely, so I should be able to test the fix quickly once it’s released. If I have time, I might build my own GCC package with that commit backported just to validate the fix.
Ok, it took more than a day to build the patched GCC in COPR, but it looks like—once it’s released and packaged—I can expect that the GCC change will indeed fix all of the test failures reported in this issue.
Next I’ll double-check Sleef 3.6 and see if all of the test failures from https://github.com/shibatch/sleef/issues/518 are still present with the patched GCC.
Hi! Thanks, we appreciate the effort. Let me know how it goes, if new failures are still present in 3.6 we will investigate each of them to better characterise the failures and open new issues.
Using -ftrapping-math does seem to be a usable workaround in 3.5.1 (not sure about 3.6 yet), so I’ll add it to the CXXFLAGS in Fedora and build bugfix updates. I’ll drop -ftrapping-math again once the GCC fix is released.
Indeed it is a workaround, but we were worried about the implications on performance (although scalar performance should not matter).
We might drop the -fno-trapping-math in the future as we believe it was introduced for the wrong reasons (only -fno-math-errno was necessary to make sure no call to libm was generate for fsqrt).
Indeed it is a workaround, but we were worried about the implications on performance (although scalar performance should not matter).
For now, I’d rather favor strict correctness over maximum scalar performance. In any case, once the GCC fix is released, I’ll drop -ftrapping-math and go back to respecting whatever Sleef chooses upstream.
Plot twist: either -ftrapping-math wasn’t a usable workaround after all, or perhaps it wasn’t correctly placed to override -fno-trapping-math. (I didn’t go back and double-check.) Instead, the test builds with -ftrapping-math worked because the maintainer of the gcc package in Fedora backported the fix yesterday, and I didn’t notice. That was fast!