Forward difference gradient has the same number of function calls as central difference
Changes to `gradients.jl` made in fc5a08e removed any way to reduce the number of function calls made when using forward difference for gradients as compared to central difference. The `finite_difference_gradient!` functions assume that `f(x)` will be provided in the `GradientCache` for forward differences, but the only constructor for `GradientCache` forces `fx = nothing`.
Here are the lines returning the only cache the constructor will give you: https://github.com/JuliaDiff/FiniteDiff.jl/blob/9945e7d7f567dc2eb6a95a714b1de9782d0ecea4/src/gradients.jl#L52-L53
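To make the cost difference concrete, here is a minimal sketch that counts function evaluations. It uses hand-rolled finite differences, not FiniteDiff.jl's implementation; the names `forward_gradient!`, `central_gradient!`, and `counted` are made up for illustration.

```julia
# Hand-rolled finite differences (NOT FiniteDiff.jl's code),
# used only to count function evaluations.

# Forward difference: reuses a precomputed fx, so it needs n extra calls.
function forward_gradient!(df, f, x, fx; h = sqrt(eps(eltype(x))))
    for i in eachindex(x)
        xi = x[i]
        x[i] = xi + h
        df[i] = (f(x) - fx) / h
        x[i] = xi              # restore the original component
    end
    return df
end

# Central difference: two calls per component, 2n in total.
function central_gradient!(df, f, x; h = cbrt(eps(eltype(x))))
    for i in eachindex(x)
        xi = x[i]
        x[i] = xi + h
        fplus = f(x)
        x[i] = xi - h
        fminus = f(x)
        x[i] = xi              # restore the original component
        df[i] = (fplus - fminus) / (2h)
    end
    return df
end

# Wrap a function so that every evaluation bumps a counter.
function counted(f)
    calls = Ref(0)
    g = x -> (calls[] += 1; f(x))
    return g, calls
end

f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

g, calls = counted(f)
fx = g(x)                      # one call; the caller often has this already
forward_df = forward_gradient!(similar(x), g, x, fx)
forward_calls = calls[]        # n + 1 evaluations

g2, calls2 = counted(f)
central_df = central_gradient!(similar(x), g2, x)
central_calls = calls2[]       # 2n evaluations
```

With `n = 3` this gives 4 evaluations for forward difference against 6 for central difference. If `f(x)` has to be recomputed inside the gradient routine instead of being passed in, that saving is lost whenever the caller already has the value.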
I can think of three ways to fix it:
1. There could be a constructor that accepts `fx` as an input and returns a cache with `fx`, although fc5a08e was trying to get rid of extra constructors.
2. Instead of using the type `Nothing` for the first parameter of the returned `GradientCache`, use `Union{Nothing,returntype}` so the user can manually update it later.
3. Or, don't bother using the cache and change the following line
   https://github.com/JuliaDiff/FiniteDiff.jl/blob/9945e7d7f567dc2eb6a95a714b1de9782d0ecea4/src/gradients.jl#L138
   to `_fx, c1, c2, c3 = cache.fx, cache.c1, cache.c2, cache.c3`, and then before this for loop: https://github.com/JuliaDiff/FiniteDiff.jl/blob/9945e7d7f567dc2eb6a95a714b1de9782d0ecea4/src/gradients.jl#L144-L145 add something along the lines of `fx = _fx === nothing ? f(x) : _fx`.
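As a rough sketch of how options 1 and 3 could fit together, consider a toy cache that is a stand-in for, not a copy of, FiniteDiff.jl's `GradientCache`. The struct `ToyGradientCache`, its `fx` keyword, and `toy_forward_gradient!` are hypothetical names chosen for this example.

```julia
# Toy stand-in for a gradient cache; NOT FiniteDiff.jl's GradientCache.
struct ToyGradientCache{Fx, C}
    fx::Fx      # either `nothing` or a precomputed f(x)
    c1::C       # scratch buffer for the perturbed point
end

# Option 1 (hypothetical signature): a constructor that accepts fx.
ToyGradientCache(x::AbstractVector; fx = nothing) =
    ToyGradientCache(fx, similar(x))

function toy_forward_gradient!(df, f, x, cache::ToyGradientCache;
                               h = sqrt(eps(eltype(x))))
    # Option 3: evaluate f(x) only when the cache does not carry it.
    fx = cache.fx === nothing ? f(x) : cache.fx
    c1 = cache.c1
    copyto!(c1, x)
    for i in eachindex(x)
        c1[i] = x[i] + h
        df[i] = (f(c1) - fx) / h
        c1[i] = x[i]           # restore the scratch buffer
    end
    return df
end

# Count evaluations of a test function.
calls = Ref(0)
g = x -> (calls[] += 1; sum(abs2, x))
x = [1.0, 2.0, 3.0]

# The caller already knows f(x); computed here outside the counter.
fx0 = sum(abs2, x)
df_with = toy_forward_gradient!(similar(x), g, x, ToyGradientCache(x; fx = fx0))
calls_with_fx = calls[]        # n evaluations

calls[] = 0
df_without = toy_forward_gradient!(similar(x), g, x, ToyGradientCache(x))
calls_without_fx = calls[]     # n + 1 evaluations
```

Because `fx` is a type parameter of the toy struct rather than a `Union` field, the `=== nothing` check depends only on the cache's concrete type. Whether the same layout suits the real `GradientCache` is an assumption of this sketch.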
2 makes the most sense to me, but I'm not sure what the performance implications are.
I would prefer not to do 2. Union splitting isn't too bad for runtime performance, but can sneakily lead to longer compile times. I think we need to do (1) and add `fx` to the cache.