Issues with differentiating with respect to polynomial coefficients
I am interested in taking the derivative of a polynomial with respect to its coefficients. When the coefficients are such that every term of order $n \ge k$ is zero for some $k$, automatic differentiation packages return 0 as the derivative with respect to those coefficients. Of course, the true derivative of $\beta_0 + \beta_1 x + \cdots + \beta_K x^K$ with respect to $\beta_n$ should be $x^n$.
Here is an example:
using Polynomials
using ForwardDiff
f(x) = Polynomial(x)(0.5)
∇f(x) = ForwardDiff.gradient(f, x)
true_∇f(x) = 0.5 .^ (0:length(x)-1)
all_params = [[1.0, 2.0, 3.0], [1.0, 0.0, 3.0], [1.0, 0.0, 0.0]]
for param in all_params
    println("Params are $(param).")
    println("Gradient from ForwardDiff: $(∇f(param))")
    println("True gradient: $(true_∇f(param))")
    println()
end
The output from this example is below; note the discrepancy in the final case.
Params are [1.0, 2.0, 3.0].
Gradient from ForwardDiff: [1.0, 0.5, 0.25]
True gradient: [1.0, 0.5, 0.25]
Params are [1.0, 0.0, 3.0].
Gradient from ForwardDiff: [1.0, 0.5, 0.25]
True gradient: [1.0, 0.5, 0.25]
Params are [1.0, 0.0, 0.0].
Gradient from ForwardDiff: [1.0, 0.0, 0.0]
True gradient: [1.0, 0.5, 0.25]
Is there a way to avoid this issue?
This issue can cause hard-to-find silent bugs (as I just painfully discovered). In my case, I have a nonlinear function $f(X)$ that looks roughly like
function f(X)
    Ŷ, S = ComplicatedStuff(X)
    Z = ChebyshevT(Ŷ).(S)
    return MoreComplicatedStuff(X, Z)
end
and I'm looking at the Jacobian at an equilibrium where $f(X) = 0$, which in some cases means that $\hat Y$ is exactly zero. The truncation-on-construction behavior then yields an incorrect Jacobian. Once found, the bug was easy enough to fix, but it was quite a headache to track down.
For the simple case in the above post (and for my use case), lightly modifying the constructor of MutableDensePolynomial is sufficient to give correct results:
@inline my_iszero(x) = iszero(x)
@inline my_iszero(x::ForwardDiff.Dual) = false  # never treat a dual as zero

struct MutableDensePolynomial{B,T,X} <: AbstractDenseUnivariatePolynomial{B,T,X}
    coeffs::Vector{T}
    function MutableDensePolynomial{B,T,X}(::Val{true}, cs::AbstractVector{S}, order::Int=0) where {B,T,X,S}
        if Base.has_offset_axes(cs)
            @warn "ignoring the axis offset of the coefficient vector"
            cs = parent(cs)
        end
        i = findlast(!my_iszero, cs)  # was: findlast(!iszero, cs)
        if isnothing(i)
            xs = T[]
        else
            xs = T[cs[j] for j in 1:i]  # make a copy
        end
        if order > 0
            prepend!(xs, zeros(T, order))
        end
        new{B,T,Symbol(X)}(xs)
    end
    function MutableDensePolynomial{B,T,X}(check::Val{false}, cs::AbstractVector{T}, order::Int=0) where {B,T,X}
        new{B,T,Symbol(X)}(cs)
    end
end
The only change here is replacing the line `i = findlast(!iszero, cs)` with `i = findlast(!my_iszero, cs)`, plus the definition of `my_iszero`, which basically says "don't truncate if you're differentiating". If truncation is implemented this way everywhere (and I don't know the code for this package at all), this would be a relatively easy fix, but it makes ForwardDiff a dependency.
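For intuition, the failure mode and the `my_iszero`-style fix can both be reproduced with a toy forward-mode dual number, sketched here in Python. All names are hypothetical; this is not ForwardDiff's or Polynomials.jl's actual implementation, just the mechanism as I understand it: equality on a dual is decided by the value part alone, so trailing coefficients that carry a nonzero seed still get truncated.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """Toy forward-mode dual number: `val` is the value, `der` the derivative."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        # Product rule on the derivative part.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def iszero_coeff(c):
    """Mimics iszero on a dual: only the value part is compared."""
    return c.val == 0.0

def my_iszero_coeff(c):
    """Mimics the my_iszero fix: a coefficient carrying a seed is never 'zero'."""
    return c.val == 0.0 and c.der == 0.0

def truncate_trailing(coeffs, is_zero):
    """Mimics the constructor: drop trailing coefficients the predicate calls zero."""
    i = len(coeffs)
    while i > 0 and is_zero(coeffs[i - 1]):
        i -= 1
    return coeffs[:i]

def eval_poly(coeffs, x):
    """Evaluate sum_n coeffs[n] * x^n at a plain-number x."""
    acc = Dual(0.0)
    for n, c in enumerate(coeffs):
        acc = acc + c * Dual(x ** n)
    return acc

def gradient(params, x, is_zero):
    """Forward-mode gradient of the polynomial value w.r.t. each coefficient."""
    grads = []
    for k in range(len(params)):
        # Seed the k-th coefficient with derivative 1, then construct-with-truncation.
        coeffs = [Dual(p, 1.0 if j == k else 0.0) for j, p in enumerate(params)]
        grads.append(eval_poly(truncate_trailing(coeffs, is_zero), x).der)
    return grads

print(gradient([1.0, 0.0, 0.0], 0.5, iszero_coeff))     # [1.0, 0.0, 0.0] — seeds truncated away
print(gradient([1.0, 0.0, 0.0], 0.5, my_iszero_coeff))  # [1.0, 0.5, 0.25] — correct
```

With value-only zero testing, the seeds on the trailing coefficients are discarded before evaluation ever happens, which is exactly why the ForwardDiff gradient above comes back as `[1.0, 0.0, 0.0]`.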