Relax the required accuracy of tan(half)?
Since the working group has recently reduced the required accuracy of 1 / half, I'd like to request relaxing the required accuracy of tan(half).
My implementation fails bruteforce on tan(half) like this:
ERROR: tan: 2.248892 ulp error at 0x1.2e8p+14 (half 0x74ba)
Expected: 0x1.edcp+3 (half 0x4bb7)
Actual: 0x1.ee4p+3 (half 0x4bb9)
This 2.25 ULP error is the largest my implementation produces over all half inputs.
Consider these steps (0x1.2e8p+14 == 19360):
// Reduce argument
19360 - 12325 * pi / 2 = -0.064727747100832...
// Compute result
tan(-0.064727747100832...) = -0.064818295060042395...
-1 / -0.064818295060042395... = 15.4277430326373342225222...
// Exact result
tan(19360) = 15.4277430326373342225222...
// Rounding to half
round(-0.064727747100832...) = -0x1.09p-4
tan(-0x1.09p-4) = -0.0647876855794585216773... which rounds to -0x1.094p-4
-1 / -0x1.094p-4 = 15.4420358152686145146... which rounds to 0x1.ee4p+3 (see actual)
So, even with a correctly rounded reduced argument, a correctly rounded tan(), and a correctly rounded reciprocal, we end up with an error of about 2.25 ULP relative to the exact result.
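For anyone who wants to reproduce the numbers above without a device, here is a minimal sketch in Python using only the standard library. It is not the implementation under test: `to_half` and `half_ulp` are hypothetical helpers written for this illustration, each step is modeled as "compute in double, then round to half", and the ULP size is assumed to be the half spacing in the binade of the exact result (which holds for normal values like this one).

```python
import math
import struct

def to_half(x: float) -> float:
    """Round a double to the nearest binary16 value (round-to-nearest-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_bits(x: float) -> int:
    """Return the binary16 bit pattern of x as an integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def half_ulp(x: float) -> float:
    """Spacing of binary16 values in the binade of x (normal range only)."""
    _, e = math.frexp(abs(x))        # abs(x) = m * 2**e with 0.5 <= m < 1
    return math.ldexp(1.0, e - 11)   # 10 fraction bits plus the implicit bit

x = 19360.0   # 0x1.2e8p+14, half 0x74ba
k = 12325     # odd multiple of pi/2, so tan(x) = -1 / tan(r)

# Reference path: everything in double precision.
r = x - k * (math.pi / 2)
exact = -1.0 / math.tan(r)
expected = to_half(exact)

# Half path: round after every step.
r_h = to_half(r)                  # reduced argument rounded to half
t_h = to_half(math.tan(r_h))      # tan() correctly rounded to half
actual = to_half(-1.0 / t_h)      # reciprocal correctly rounded to half

err = abs(actual - exact) / half_ulp(exact)

print("reduced  :", r_h.hex())
print("tan      :", t_h.hex())
print("expected :", expected.hex(), hex(half_bits(expected)))
print("actual   :", actual.hex(), hex(half_bits(actual)))
print("ulp error: %.6f" % err)
```

The expected and actual values differ by two half steps, and the remaining quarter ULP comes from the distance between the exact result and its correctly rounded half value.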
Since we would like to carry out as much of the tan(half) implementation in half precision as possible, we are requesting that the required accuracy of tan(half) be relaxed to 2.25 ULP.
Independent verification of the algorithm: https://www.wolframalpha.com/input?i=tan%2819360%29+%2B+1%2Ftan%2819360+-+12325*Pi%2F2%29
We are still discussing this internally and should have some feedback in a couple of weeks.
Qualcomm is okay with the accuracy for tan(half) being relaxed to 2.25 ULP.
Is anyone opposed to making this change?
I added the agenda label hoping to get final approval of this change.
Linking with #1373, which is at least slightly related since it has to do with the required accuracy for the fp16 sqrt.
I created https://github.com/KhronosGroup/OpenCL-Docs/pull/1387, which makes this relaxation in the spec and might help get the discussion moving.
Note that we would still need a CTS change as well.