Different test results using Clang with and without Debug on target RVV
With clang@d70267fb and highway@e9a2799, configuring CMake with -DCMAKE_BUILD_TYPE=Debug gives different test results than configuring without it.
Test results without Debug option:
99% tests passed, 4 tests failed out of 684
Total Test time (real) = 7.99 sec
The following tests FAILED:
249 - HwyDemoteTestGroup/HwyDemoteTest.TestAllDemoteToFloat/RVV # GetParam() = 137438953472 (Failed)
283 - HwyFloatTestGroup/HwyFloatTest.TestAllCeil/RVV # GetParam() = 137438953472 (Failed)
285 - HwyFloatTestGroup/HwyFloatTest.TestAllFloor/RVV # GetParam() = 137438953472 (Failed)
651 - SortTestGroup/SortTest.TestAllPartition/RVV # GetParam() = 137438953472 (Failed)
Errors while running CTest
Test results with Debug option:
98% tests passed, 12 tests failed out of 684
Total Test time (real) = 25.01 sec
The following tests FAILED:
249 - HwyDemoteTestGroup/HwyDemoteTest.TestAllDemoteToFloat/RVV # GetParam() = 137438953472 (Failed)
571 - MatVecTestGroup/MatVecTest.TestAllMatVecBF16/RVV # GetParam() = 137438953472 (Failed)
645 - SortTestGroup/SortTest.TestAllFloatInf/RVV # GetParam() = 137438953472 (Failed)
651 - SortTestGroup/SortTest.TestAllPartition/RVV # GetParam() = 137438953472 (Failed)
655 - SortTestGroup/SortTest.TestAllSort/RVV # GetParam() = 137438953472 (Failed)
656 - SortTestGroup/SortTest.TestAllSort/EMU128 # GetParam() = 2305843009213693952 (Failed)
657 - SortTestGroup/SortTest.TestAllSelect/RVV # GetParam() = 137438953472 (Failed)
658 - SortTestGroup/SortTest.TestAllSelect/EMU128 # GetParam() = 2305843009213693952 (Failed)
659 - SortTestGroup/SortTest.TestAllPartialSort/RVV # GetParam() = 137438953472 (Failed)
660 - SortTestGroup/SortTest.TestAllPartialSort/EMU128 # GetParam() = 2305843009213693952 (Failed)
663 - BenchSortGroup/BenchSort.BenchAllSort/RVV # GetParam() = 137438953472 (Failed)
664 - BenchSortGroup/BenchSort.BenchAllSort/EMU128 # GetParam() = 2305843009213693952 (Failed)
Errors while running CTest
Digging into one specific test, MatVecTest.TestAllMatVecBF16/RVV, the failure is at these lines:
https://github.com/google/highway/blob/4852c6f356fb678a0e6af11151b25981278fa1c6/hwy/contrib/matvec/matvec_test.cc#L171-L174
With Debug, the actual value is -1.993652, resulting in a negative tolerance. Without Debug, all data are positive, so the test passes.
i16/f32 6 x 8, with add: mismatch at 4 -1.993652 -1.993652; tol -0.311508
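To illustrate why a negative tolerance makes the check fail even when the two values are identical, here is a minimal sketch. It assumes the tolerance is computed as the expected value times a fixed relative factor; the names `kRelTol`, `MismatchSigned` and `MismatchAbs` are hypothetical, and the factor 0.15625 is only chosen because it matches the ratio tol/actual in the log line above. The real code in matvec_test.cc may differ.

```cpp
#include <cmath>

// Hypothetical relative factor (assumption): 0.311508 / 1.993652 from the log.
constexpr double kRelTol = 0.15625;

// Signed tolerance: negative whenever `expected` is negative, so the
// comparison reports a mismatch even for identical values.
inline bool MismatchSigned(double expected, double actual) {
  const double tol = expected * kRelTol;
  return std::abs(actual - expected) > tol;
}

// Magnitude-based tolerance: non-negative regardless of the sign of
// `expected`, so identical values always compare as equal.
inline bool MismatchAbs(double expected, double actual) {
  const double tol = std::abs(expected) * kRelTol;
  return std::abs(actual - expected) > tol;
}
```

With expected and actual both at -1.993652, `MismatchSigned` reports a mismatch because 0 > -0.311508, while `MismatchAbs` does not. This only explains the symptom; it does not explain why negative values appear in the first place.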
In SortTest, num is 24 while Constants::SampleLanes<T>() is 32, which triggers this assert:
Abort at vqsort-inl.h:1208: Assert num >= Constants::SampleLanes<T>()
Thanks for reporting. We have also seen issues with rounding mode on QEMU - is that how you are running the tests, or is it on real HW?
"With Debug, the actual would be -1.993652"
Interesting. GenerateMod does, or should, generate numbers 0..15. Can you help us understand where the negative numbers come from? Would be good to also add an assert that inputs and outputs are non-negative.
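The suggested non-negativity check could look roughly like the sketch below. `MakeModInputs` is a hypothetical stand-in that fills a buffer with i mod 16; it is not Highway's GenerateMod, whose exact signature is not shown in this thread. `AllNonNegative` is likewise an illustrative helper.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical input generator (assumption): element i holds i mod 16,
// so every value lies in [0, 15].
inline std::vector<float> MakeModInputs(std::size_t count) {
  std::vector<float> values(count);
  for (std::size_t i = 0; i < count; ++i) {
    values[i] = static_cast<float>(i % 16);
  }
  return values;
}

// Returns true if no element is negative. A test could assert this on both
// the inputs and the computed outputs before comparing against a tolerance.
inline bool AllNonNegative(const std::vector<float>& values) {
  for (float v : values) {
    if (v < 0.0f) return false;
  }
  return true;
}
```

Running such a check on the inputs first and on the outputs second would narrow down whether the negative value comes from input generation or from the computation itself.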
For SortTest, the comment there says: "We have at least 2 chunks (x 64 bytes) because the base case handles anything up to 8 vectors (x 16 bytes)." It seems possible that this is breaking with LMUL<1. This is only 'breaking' in debug mode because it's a DASSERT which is only active in debug builds. Can you print N and d.Pow2() at the failing DASSERT?
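As a rough illustration of why the abort only shows up with -DCMAKE_BUILD_TYPE=Debug, the sketch below models a debug-only precondition. It is not Highway's HWY_DASSERT; `kDebugChecks` and `WouldAbort` are hypothetical names, and keying the check on NDEBUG is an assumption about how debug builds are detected.

```cpp
#include <cstddef>

// Assumption: Release-style builds define NDEBUG and Debug builds do not.
#ifdef NDEBUG
constexpr bool kDebugChecks = false;
#else
constexpr bool kDebugChecks = true;
#endif

// Returns true if the precondition num >= sample_lanes is violated AND
// debug checks are enabled, i.e. the build would abort at this point.
// When debug checks are off, the violation passes silently.
inline bool WouldAbort(bool debug_checks, std::size_t num,
                       std::size_t sample_lanes) {
  return debug_checks && !(num >= sample_lanes);
}
```

With the values reported above (num = 24, sample lanes = 32), `WouldAbort(true, 24, 32)` is true and `WouldAbort(false, 24, 32)` is false. The precondition is violated in both build types, but only the debug build notices.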
There were bugs in RVV F64->F32 and F32->F16 DemoteTo, which are fixed in pull request #2164.
RVV Ceil and Floor have also been reimplemented in pull request #2164 so that they no longer change the floating-point rounding mode via inline assembly, which fixes issues with Ceil and Floor on RVV with Clang 16 and later.
I think the issue is solved, thanks @johnplatts :)