Vectorization of tanh
I recently submitted a PR to the PyTorch repo for a vectorized single-precision tanh implementation. It is a vectorized version of the Cephes math library's single-precision tanhf function. In my PyTorch benchmarks the implementation appeared to be faster than Sleef_tanhf8_u10 (I have posted some benchmark numbers in the PR here).
Are there any Sleef benchmarks that I can run to compare the two implementations? If mine turns out to be faster, would you be open to a PR? Thanks!
Hello @vedanuj, Thank you for considering a contribution. I'm open to a PR if your implementation is good enough. However, I cannot yet confirm whether it is good enough to adopt. Please consider checking the following points.
- Is it an alternative to Sleef_tanhf8_u10? If so, please make sure that its error is less than 1 ULP. It seems that you checked the correctness of your subroutine using a utility included in PyTorch, and that the check took less than 1 second? That is not enough to confirm that the maximum error is below the specified bound. Please use tester2, included in the libm-tester directory. Of course, you can also use your own utility to check the maximum error.
- Do you also have a double-precision implementation?
- You also need to write the code using helper functions, like other functions in SLEEF.
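To illustrate the first point, here is a minimal sketch of what measuring the maximum error in ULP over a range of float32 inputs can look like. It is not tester2 and not SLEEF code: the names `f32_ulp`, `ulp_error`, `max_ulp_error` and `naive_tanhf` are hypothetical, Python is used only for brevity, and the reference value is taken from double-precision `math.tanh`, which is an assumption that a serious check should replace with a higher-precision reference. It also only sweeps positive inputs, assuming the candidate is odd-symmetric.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_bits(x):
    """Bit pattern of x as a float32, returned as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """float32 value for a 32-bit pattern, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def f32_ulp(r):
    """Spacing of float32 numbers in the binade that contains r."""
    if r == 0.0:
        return 2.0 ** -149  # smallest subnormal float32
    _, exp = math.frexp(abs(r))  # abs(r) = m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (max(exp - 1, -126) - 23)


def ulp_error(candidate, reference):
    """Error of candidate, in float32 ULPs of the reference value."""
    return abs(candidate - reference) / f32_ulp(reference)


def max_ulp_error(func, lo, hi, stride=1):
    """Sweep positive float32 inputs in [lo, hi] by bit pattern.

    stride=1 visits every float32 in the range; larger strides subsample.
    Returns (worst error in ULP, input where it occurred).
    """
    worst, worst_x = 0.0, lo
    for b in range(f32_bits(lo), f32_bits(hi) + 1, stride):
        x = bits_f32(b)
        err = ulp_error(func(x), math.tanh(x))
        if err > worst:
            worst, worst_x = err, x
    return worst, worst_x


def naive_tanhf(x):
    """Hypothetical candidate: (e^2x - 1) / (e^2x + 1), rounded to float32
    after every step. It loses accuracy to cancellation for small x."""
    e = to_f32(math.exp(to_f32(2.0 * x)))
    return to_f32(to_f32(e - 1.0) / to_f32(e + 1.0))


if __name__ == "__main__":
    worst, where = max_ulp_error(naive_tanhf, 2.0 ** -10, 1.0, stride=4099)
    print(f"max error {worst:.1f} ULP at x = {where!r}")
```

A handful of spot checks at "nice" inputs can easily miss the region where a candidate like `naive_tanhf` goes wrong, which is why a dense or exhaustive sweep matters. A full stride-1 sweep over all float32 inputs is too slow in pure Python, so in practice this kind of loop belongs in C or in an existing tester.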
I am now trying to implement 3.5-ULP versions of hyperbolic functions.