Polynomials.jl

Issues with differentiating with respect to polynomial coefficients

Open · vivekbhattacharya opened this issue 1 year ago · 1 comment

I am interested in taking the derivative of a polynomial with respect to its coefficients. When the coefficients are such that all terms of order $n \ge k$ are zero for some $k$, automatic differentiation packages return 0 as the derivative with respect to those coefficients. Of course, the true derivative of $\beta_0 + \beta_1 x + \cdots + \beta_K x^K$ with respect to $\beta_n$ should be $x^n$.

Here is an example:

using Polynomials
using ForwardDiff

f(x) = Polynomial(x)(0.5)               # build a polynomial with coefficients x, evaluate at 0.5
∇f(x) = ForwardDiff.gradient(f, x)
true_∇f(x) = 0.5 .^ (0:length(x)-1)     # ∂f/∂βₙ = xⁿ at x = 0.5

all_params = [[1.0, 2.0, 3.0], [1.0, 0.0, 3.0], [1.0, 0.0, 0.0]]
for param in all_params
    println("Params are $(param).") 
    println("Gradient from ForwardDiff: $(∇f(param))") 
    println("True gradient: $(true_∇f(param))")
    println("")
end

The output from this example is below. Note the issue in the final example.

Params are [1.0, 2.0, 3.0].
Gradient from ForwardDiff: [1.0, 0.5, 0.25]
True gradient: [1.0, 0.5, 0.25]

Params are [1.0, 0.0, 3.0].
Gradient from ForwardDiff: [1.0, 0.5, 0.25]
True gradient: [1.0, 0.5, 0.25]

Params are [1.0, 0.0, 0.0].
Gradient from ForwardDiff: [1.0, 0.0, 0.0]
True gradient: [1.0, 0.5, 0.25]
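What appears to happen (my sketch of the mechanism, not stated explicitly in the report) is that the Polynomial constructor strips trailing zero coefficients, so in the final case the dual numbers carrying the partials for those coefficients never enter the evaluation at all:

```julia
using Polynomials

# Trailing zeros are dropped when the polynomial is constructed,
# so the last two coefficients vanish before evaluation.
coeffs(Polynomial([1.0, 0.0, 3.0]))  # [1.0, 0.0, 3.0]
coeffs(Polynomial([1.0, 0.0, 0.0]))  # [1.0]
```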

Is there a way to avoid this issue?
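For the simple monomial-basis case above, one workaround (my suggestion, not from the thread; whether it fits a given use case is an assumption) is to evaluate the coefficient vector directly with Julia's Base.evalpoly, which never constructs a Polynomial and therefore never truncates:

```julia
using ForwardDiff

# evalpoly computes c[1] + c[2]*x + c[3]*x^2 + ... without dropping
# trailing zeros, so every coefficient keeps its partial derivative.
g(c) = evalpoly(0.5, c)
ForwardDiff.gradient(g, [1.0, 0.0, 0.0])  # [1.0, 0.5, 0.25]
```

This sidesteps the truncation entirely, at the cost of losing the Polynomials.jl API (other bases, composition, etc.).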

vivekbhattacharya · Jun 27 '24

This issue can cause hard-to-find silent bugs (as I just painfully discovered). In my case, I have a nonlinear function $f(X)$ that looks roughly like

function f(X)
    Ŷ, S = ComplicatedStuff(X)
    Z = ChebyshevT(Ŷ).(S)
    return MoreComplicatedStuff(X, Z)
end

and I'm looking at the Jacobian at an equilibrium where $f(X)=0$, which in some cases means that $\hat Y$ is exactly zero. The truncation-on-construction behavior then leads to an incorrect Jacobian. Once found, the bug was easy enough to fix, but it was quite a headache to track down.

For the simple case in the above post (and for my use case), lightly modifying the constructor of MutableDensePolynomial is sufficient to give correct results:

@inline my_iszero(x) = iszero(x)
@inline my_iszero(x::ForwardDiff.Dual) = false

struct MutableDensePolynomial{B,T,X} <: AbstractDenseUnivariatePolynomial{B,T,X}
    coeffs::Vector{T}
    function MutableDensePolynomial{B,T,X}(::Val{true}, cs::AbstractVector{S}, order::Int=0) where {B,T,X,S}
        if Base.has_offset_axes(cs)
            @warn "ignoring the axis offset of the coefficient vector"
            cs = parent(cs)
        end
        i = findlast(!my_iszero, cs)
        if isnothing(i)
            xs = T[]
        else
            xs = T[cs[j] for j ∈ 1:i] # copy the first i coefficients
        end
        if order > 0
            prepend!(xs, zeros(T, order))
        end
        new{B,T,Symbol(X)}(xs)
    end
    function MutableDensePolynomial{B,T,X}(check::Val{false}, cs::AbstractVector{T}, order::Int=0) where {B,T,X}
        new{B,T,Symbol(X)}(cs)
    end
end

The only change here is replacing the line i = findlast(!iszero, cs) with i = findlast(!my_iszero, cs), together with the definition of my_iszero, which basically says "don't truncate if you're differentiating". If truncation is implemented this way everywhere (and I don't know the code for this package at all), this would be a relatively easy fix for ForwardDiff, but it requires taking ForwardDiff as a dependency.
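To illustrate why this works (my own sketch, not part of the original comment): equality on ForwardDiff duals compares only the primal value, so a zero-valued dual with a nonzero partial is reported as zero by iszero, and my_iszero overrides exactly that:

```julia
using ForwardDiff

my_iszero(x) = iszero(x)
my_iszero(x::ForwardDiff.Dual) = false  # never drop a coefficient while differentiating

d = ForwardDiff.Dual(0.0, 1.0)  # value 0, partial derivative 1
iszero(d)     # true: iszero on a dual only inspects the value
my_iszero(d)  # false: the partial is preserved
```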

dbstein · Dec 18 '24