Speed up pow by 4x or more
Coverage decreased (-0.4%) to 96.667% when pulling 93687808798a062c5800d4d141419a5eb3c35132 on fastpowers into e19764302e41c1b619b2a50727586be4ff90b6c7 on master.
Codecov Report
:exclamation: No coverage uploaded for pull request base (
master@766bef9). Click here to learn what that means. The diff coverage is79.16%.
@@ Coverage Diff @@
## master #210 +/- ##
=========================================
Coverage ? 96.66%
=========================================
Files ? 23
Lines ? 990
Branches ? 0
=========================================
Hits ? 957
Misses ? 33
Partials ? 0
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/IntervalArithmetic.jl | 100% <ø> (ø) |
|
| src/intervals/intervals.jl | 94.73% <ø> (ø) |
|
| src/multidim/intervalbox.jl | 100% <100%> (ø) |
|
| src/intervals/functions.jl | 100% <100%> (ø) |
|
| src/intervals/powers.jl | 70.58% <70.58%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 766bef9...49c8ee4. Read the comment docs.
Tests will fail until pow and ^ are swapped over.
I've taken a look on this PR, and must begin saying that I understand the motivation, but I am not yet convinced that we should sacrifice tightness for speed; see #14.
As it was discussed in #103, current ^ allocates since we must play with big53. The question is if intercepting small powers can help us here?
Regarding the infimum and supremum of an empty interval, see this comment; we are currently following the standard on this, so this shouldn't be changed. Reverting the changes (here and here) in arithmetic.jl allows some tests to pass.
Very nice tricks for fast_sqrt. The question is how does the diameter of Interval(fast_sqrt(a.lo , RoundDown)), fast_sqrt(a.hi , RoundUp)) compare with the results for ^(a, 0.5) and pow(a, 0.5)?
The fast version of sqrt is now in FastRounding and will be used automatically once https://github.com/JuliaIntervals/IntervalArithmetic.jl/pull/230 is merged. The fast version is (supposed to) give exactly the same result as the slow version.
The fast powers in this PR give a result that is "almost" as tight as the terribly slow correctly-rounded version - for higher powers they will be less tight, since more squarings are required. In particular, squares (x^2) will give exactly the same result as now (but repeated squares may start losing a few ulps of accuracy).
I really think it's urgent that we make this change. Tight powers will still be available via pow. We could provide a way for automatically using tight powers instead, but I'm not sure what the API should look like (there are early open issues for this, e.g. #14 that you cited).
I have been using fast powers for a few months now. The difference in speed and allocations is like night and day. There is no question in my mind that we must make fast powers the default (but provide a mechanism to change to tight-but-slow powers).
Tests are failing; I guess those that fail are related to ^, and they should be using pow, right?
Yes, exactly.
That will require changes to itf1788. I think I have those in a branch somewhere.
I guess the same holds; instead of using ^ they should use pow...
Is there a chance of getting this function in master but with another name, eg. fast_pow(a, n)?
It is already there with the name pow, I believe?
I thought this branch gives a faster pow than master but it hasn't been resolved yet because it sacrifices tightness for speed. On master i measure ~1us for Interval(0, 1)^2, so i was assuming that it's using the "slow" version (?).
.. so if that's the case i was just proposing to export the "fast" version, possibly with a different name, keeping the slow one as default.
I get this on master:
julia> using BenchmarkTools, IntervalArithmetic
[ Info: Precompiling IntervalArithmetic [d1acc4aa-44c8-5952-acd4-ba5d80a2a253]
julia> @btime $(Ref(X))[]^2^C
julia> X = 0..1
[0, 1]
julia> @btime $(Ref(X))[]^2
673.264 ns (29 allocations: 1.25 KiB)
[0, 1]
julia> @btime pow($(Ref(X))[], 2)
30.811 ns (0 allocations: 0 bytes)
[0, 1]
But yes you're right, the nonrecursive_powers branch speeds this up by 3x.
Oh, i see what you mean know. I was missing that there's already a difference between pow(a, n) and a^n, and it's in the docs...
But yes you're right, the nonrecursive_powers branch speeds this up by 3x.
Alright, i will keep it in mind. Thanks!
We should definitely update pow to use the fast version, and then provide a way to use that instead of ^.
I agree with @mforets that this should be in master with another name (fast_pow), without touching the current pow for compatibility with the standard.
@dpsanders Do you have time to address the conflicts with the files currently in master?
Yes we should definitely incorporate the latest version into base in the pow function.
Solved in PR #593.