Forward difference gradient has the same number of function calls as central difference

Open benide opened this issue 3 years ago • 1 comments

Changes to gradients.jl made in fc5a08e removed any way to reduce the number of function calls made when using forward difference for gradients as compared to central difference. The finite_difference_gradient! functions assume that f(x) will be provided in the GradientCache for forward differences, but the only constructor for GradientCache forces fx = nothing.

Here are the lines returning the only cache the constructor will give you: https://github.com/JuliaDiff/FiniteDiff.jl/blob/9945e7d7f567dc2eb6a95a714b1de9782d0ecea4/src/gradients.jl#L52-L53

I can think of three ways to fix it:

1. There could be a constructor that accepts fx as an input and returns a cache containing it, although fc5a08e was trying to get rid of extra constructors.
2. Instead of using the type Nothing for the first parameter of the returned GradientCache, use Union{Nothing,returntype} so the user can manually update it later.
3. Or, don't bother relying on the cache for fx: change the following line https://github.com/JuliaDiff/FiniteDiff.jl/blob/9945e7d7f567dc2eb6a95a714b1de9782d0ecea4/src/gradients.jl#L138 to _fx, c1, c2, c3 = cache.fx, cache.c1, cache.c2, cache.c3 and then, before this for loop: https://github.com/JuliaDiff/FiniteDiff.jl/blob/9945e7d7f567dc2eb6a95a714b1de9782d0ecea4/src/gradients.jl#L144-L145 add something along the lines of fx = _fx === nothing ? f(x) : _fx
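To make the cost difference concrete, here is a minimal standalone sketch, not FiniteDiff.jl's actual code. The names forward_gradient!, central_gradient!, counted and ncalls are hypothetical helpers introduced only for illustration. It assumes a scalar-valued f and a Float64 vector x, and counts how often f is evaluated when f(x) is supplied, when it is not, and for central differences.

```julia
# Hypothetical helper (not part of FiniteDiff.jl): forward-difference gradient
# that reuses a precomputed f(x) when the caller supplies one.
function forward_gradient!(df, f, x; fx = nothing, h = sqrt(eps(Float64)))
    # Evaluate f(x) only when no value was passed in.
    f0 = fx === nothing ? f(x) : fx
    for i in eachindex(x)
        xi = x[i]
        x[i] = xi + h            # perturb one coordinate in place
        df[i] = (f(x) - f0) / h
        x[i] = xi                # restore the coordinate
    end
    return df
end

# Hypothetical helper: central-difference gradient, two evaluations per coordinate.
function central_gradient!(df, f, x; h = cbrt(eps(Float64)))
    for i in eachindex(x)
        xi = x[i]
        x[i] = xi + h
        fplus = f(x)
        x[i] = xi - h
        fminus = f(x)
        x[i] = xi
        df[i] = (fplus - fminus) / (2h)
    end
    return df
end

# Test function that counts its own evaluations.
const ncalls = Ref(0)
counted(x) = (ncalls[] += 1; sum(abs2, x))

x = [1.0, 2.0, 3.0]
df = similar(x)

fx = counted(x)                  # f(x) the caller already has
ncalls[] = 0
forward_gradient!(df, counted, x; fx = fx)
calls_forward_cached = ncalls[]      # n evaluations

ncalls[] = 0
forward_gradient!(df, counted, x)
calls_forward_uncached = ncalls[]    # n + 1 evaluations

ncalls[] = 0
central_gradient!(df, counted, x)
calls_central = ncalls[]             # 2n evaluations
```

With n = length(x), the forward-difference path needs n evaluations when f(x) is passed in and n + 1 when it is not, versus 2n for central differences. This saving is what is lost if fx can never be stored in the cache.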

Option 2 makes the most sense to me, but I'm not sure what the performance implications are.

Mar 14 '22 17:03 benide

I would prefer not to do 2. Union splitting isn't too bad for runtime performance, but it can quietly increase compile times. I think we need to do (1) and add fx to the cache.
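As a rough illustration of what (1) could look like, here is a sketch under stated assumptions. SketchGradientCache is a hypothetical type invented for this example; FiniteDiff.jl's real GradientCache has a different layout and more fields. The point is only that a constructor taking fx keeps the field concretely typed, so no Union is involved.

```julia
# Hypothetical cache type for illustration only; this is NOT FiniteDiff.jl's
# GradientCache, which has a different layout and more fields.
struct SketchGradientCache{F, C}
    fx::F    # either `nothing` or a precomputed f(x)
    c1::C    # scratch buffer with the same shape as x
end

# A single constructor with an optional keyword: omitting fx gives F == Nothing,
# passing fx gives a concretely typed field (for example Float64).
function SketchGradientCache(x::AbstractVector; fx = nothing)
    c1 = similar(x)
    return SketchGradientCache{typeof(fx), typeof(c1)}(fx, c1)
end

x = [1.0, 2.0, 3.0]
cache_without_fx = SketchGradientCache(x)
cache_with_fx = SketchGradientCache(x; fx = sum(abs2, x))
```

Because the type parameter F is fixed when the cache is built, code that reads the fx field can specialize on Nothing or on the concrete value type at compile time, rather than branching on a Union at runtime.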

Mar 15 '22 01:03 ChrisRackauckas