FiniteDifferences.jl icon indicating copy to clipboard operation
FiniteDifferences.jl copied to clipboard

jvp incorrect result rather than erroring

Open MasonProtter opened this issue 5 years ago • 10 comments

There appears to be a problem with jvp when the v supplied is real and the primal is complex:

using FiniteDifferences
jvp(central_fdm(5,1), abs2, (3.0 + im, 0.25))

#+RESULTS:
: 1.9999999999995874
jvp(central_fdm(5,1), abs2, (3.0 + im, 0.25 + 0im))

#+RESULTS:
: 1.4999999999999445

The correct, answer here is 1.5. This should produce an error.

MasonProtter avatar Jun 24 '20 17:06 MasonProtter

Continuing the discussion from https://github.com/JuliaDiff/ChainRules.jl/pull/196#issuecomment-648746715:

Whether or not it plays nicely with Unitfuls out of the box is an implementation detail.

Well it's part of the interface, so not really an implementation detail.

The current design requires that any type can be mapped to a plain Vector{<:Real}, because we can always add them and scale them.

Isn't converting Complex to Vector{Real} a real drag on performance?

The alternative approach would be to assume that any tangent vector can be scaled and added to the corresponding vector.

I don't understand. Why do we need to add tangent vectors to vectors / primals? The definition

lim_{h -> 0} ( f(x + Δx*h) - f(x) ) / h

requires only scaling and addition in the domain and range. Furthermore, I don't see how we could reasonably define derivatives without these properties.

Long story short, I would prefer seeing this issue fixed by making

jvp(central_fdm(5,1), abs2, (3.0 + im, 0.25))

return 1.5 rather than throwing an error. That would make the interface consistent with https://github.com/JuliaDiff/ChainRules.jl/pull/196.

Of course, I'm just a random guy on the internet who has not contributed any code to this package yet, so please apologise if my opinion is unsolicited.

ettersi avatar Jun 26 '20 01:06 ettersi

Of course, I'm just a random guy on the internet who has not contributed any code to this package yet, so please apologise if my opinion is unsolicited.

Your opinion is very much valued. Please keep pointing out where you think that things could be improved :)

requires only scaling and addition in the domain and range. Furthermore, I don't see how we could reasonably define derivatives without these properties.

I agree. The way that to_vec implements this is by constructing an isomorphism between vectors / tangents and Vectors. FiniteDifferences then operates directly on Vectors, and applies the inverse mapping at the end to get back to the original types.

I would love to see FiniteDifferences do the correct thing here -- that we have the to_vec function is something of a legacy feature that could probably do with being updated in light of the types in ChainRules. As you point out, all that we need to be able to do to compute jvps / pushforwards is

  1. add tangents to primals
  2. scale tangents by scalars
  3. subtract one primal from another, divide through by a scalar, and spit out a tangent.

If we use ChainRules types to represent said tangents, the first two items are free. The third would require some work though.

Pullbacks / gradients / Jacobians are a bit slightly trickier, and we don't currently have the code to implement this properly. To do this we would need some mechanism for constructing the standard basis for any type (inc. structs etc), which requires some thought but is definitely doable. We would also need a way to apply cotangents to tangents -- this is arguably a missing feature of ChainRules that we could either resolve by introducing a collection of cotangent types / wrapper types, or by defining the inner product between tangents and failing to distinguish between tangents and cotangents. There are probably some other details that I'm brushing over.

Either way I would like to see this resolved because more and more I'm finding that to_vec is becoming burdensome and limiting the kinds of things that we can easily test with FiniteDifferences e.g. frules and rrules of higher-order functions.

willtebbutt avatar Jun 26 '20 09:06 willtebbutt

If we use ChainRules types to represent said tangents, the first two items are free. The third would require some work though.

Can you provide an example where the third item is indeed tricky to implement? As far as I can tell, finite differences only makes sense if both the domain and range are affine spaces, and in this case there is no problem with Item 3.

ettersi avatar Jun 26 '20 10:06 ettersi

Can you provide an example where the third item is indeed tricky to implement? As far as I can tell, finite differences only makes sense if both the domain and range are affine spaces, and in this case there is no problem with Item 3.

No conceptual difficulty, just tricky in the sense that you would need to implement stuff to actually do this. Not all of the types that we would be working with placing nicely with scaling / subtraction (e.g. arbitrary structs) so we would need to roll our own functionality to do that. Were we to do this I imagine we would implement a function estimate_tangent(f, h, x, dx) that rolls all of these things into a single operation, so we can explicitly provide methods for this for individual types that don't naturally support the operations that we need and we're unable to define them for due to type-piracy restrictions.

willtebbutt avatar Jun 26 '20 10:06 willtebbutt

Not all of the types that we would be working with placing nicely with scaling / subtraction (e.g. arbitrary structs) so we would need to roll our own functionality to do that.

Or FiniteDifferences could just refuse to work with such types on the grounds that if they do not implement the interface of an affine vector space, then there is no sensible definition of finite differences.

ettersi avatar Jun 26 '20 10:06 ettersi

Pullbacks / gradients / Jacobians are a bit slightly trickier, and we don't currently have the code to implement this properly. To do this we would need some mechanism for constructing the standard basis for any type (inc. structs etc).

One way to resolve this would be to 1) fix a generic interface for dual vectors / cotangents and 2) require domain types to implement their own methods for vjp.

Examples for 1), where v denotes a vector and f a dual vector:

  • f(v) (this emphasises the linear functional aspect of dual vectors)
  • f*v (this emulates the f'*v notation from R^n)
  • dot(f,v) (this emulates the notation for inner product spaces)

Out of these three, I would choose dot(f,v) since it is already defined and does the right thing for most types.

Examples for 2), assuming the dot(f,v) notation for applying duals to vectors:

vjp(fdm,f,ȳ,x::Real) = dot(ȳ,jvp(fdm,f,x,1))
vjp(fdm,f,ȳ,x::Complex) = dot(ȳ,jvp(fdm,f,x,1)) + im*dot(ȳ,jvp(fdm,f,x,im))
vjp(fdm,f,ȳ,x::Vector{<:Real}) = [dot(ȳ,jvp(fdm,f,unit(x,i))) for i = 1:length(x)]
unit(x,i) = (e = zeros(length(x)); e[i] = 1; e)

ettersi avatar Jun 26 '20 11:06 ettersi

Or FiniteDifferences could just refuse to work with such types on the grounds that if they do not implement the interface of an affine vector space, then there is no sensible definition of finite differences.

I don't think that this would be a good solution. For example, it would preclude the use of Tuples.

Out of these three, I would choose dot(f,v) since it is already defined and does the right thing for most types.

I agree.

Examples for 2), assuming the dot(f,v) notation for applying duals to vectors:

I like this proposal a lot.

This would just leave the question of how to represent Jacobians. Perhaps a collection of tangents, one for each input, would make most sense? And perhaps this could be specialised to a matrix in the Vector case? To be honest I'm really not sure whether people are actually using this package for that kind of functionality.

I do wonder whether we really need to be able to compute vjps / gradients at all though. If one accepts that the sole purpose of FiniteDifferences is testing AD, then you really just need the ability to compute jvps (to test forwards-mode) and to verify that

_, pullback = rrule(f, x)
dot(ȳ, jvp(fdm, f, x, ẋ)) ≈ dot(pullback(ȳ), ẋ)

to test reverse-mode. Perhaps this is a step too far though. It's entirely possible that people are using this package to compute gradients, so losing that functionality would make for a bad user experience.

willtebbutt avatar Jun 26 '20 11:06 willtebbutt

It's entirely possible that people are using this package to compute gradients, so losing that functionality would make for a bad user experience.

Manifolds.jl's default "AD" backend is FiniteDifferences, which it uses in internal functions to compute the gradient. It's placeholder functionality though and currently not used for any higher order functions. 🤷‍♂️

sethaxen avatar Jun 26 '20 18:06 sethaxen

Or FiniteDifferences could just refuse to work with such types on the grounds that if they do not implement the interface of an affine vector space, then there is no sensible definition of finite differences.

I don't think that this would be a good solution. For example, it would preclude the use of Tuples.

Tuples could be fixed either by wrapping them in an SVector-like struct or by implementing the vector arithmetic using .+ and .* instead of + and *. The latter would be more efficient even for AbstractArrays.

ettersi avatar Jun 28 '20 15:06 ettersi

Certainly wrapper types would work as alternative to implementing a special function.

.+ and .* etc would still be type piracy though I believe.

willtebbutt avatar Jun 28 '20 20:06 willtebbutt