SparseDiffTools.jl

Hessian Vector Product Operator

RS-Coop opened this issue 4 years ago

Using the existing HVP operator and the forward-over-back AD hvp function in this repository, I have put together the following modified HVP operator:

using LinearAlgebra
using ForwardDiff: Dual, partials
using Zygote: gradient

mutable struct HvpOperator{F, T, I}
	f::F
	x::Vector{T}
	dual_cache1::Vector{Dual{Nothing, T, 1}}
	dual_cache2::Vector{Dual{Nothing, T, 1}}
	size::I
	nprod::UInt16
end

function HvpOperator(f, x::AbstractVector)
	# seed the dual caches with x as the value part; the partials are
	# overwritten on every product, so similar(x) is fine as a placeholder
	cache1 = Dual.(x, similar(x))
	cache2 = Dual.(x, similar(x))
	return HvpOperator(f, x, cache1, cache2, size(x, 1), UInt16(0))
end

Base.eltype(op::HvpOperator{F, T, I}) where {F, T, I} = T
Base.size(op::HvpOperator) = (op.size, op.size)

function LinearAlgebra.mul!(result::AbstractVector, op::HvpOperator, v::AbstractVector)
	op.nprod += 1

	# forward-over-back: push ForwardDiff duals seeded with v through the
	# Zygote gradient, then extract the directional derivative H*v
	g = (∇, x) -> ∇ .= gradient(op.f, x)[1]
	op.dual_cache1 .= Dual.(op.x, v)
	result .= partials.(g(op.dual_cache2, op.dual_cache1), 1)
end

I have a couple of questions related to this:

  1. It seems that it would be possible to further optimize this by dropping the field for x and instead just updating the partials field of dual_cache1. I can't seem to figure out how to do this; is it possible?
  2. Are we saving anything by using dual_cache2 when the Zygote gradient operation cannot be performed in-place?
  3. Can further specification of types improve performance? For instance, I know that the function type F could be further specified as taking a vector as input and returning a scalar.
  4. Slightly off topic from the other questions, is the symmetry of the Hessian automatically being exploited by the AD tools, or does that even make sense?

Thanks for the help!

RS-Coop commented on Jan 07 '22

It seems that it would be possible to further optimize this by dropping the field for x and instead just updating the partials field of dual_cache1. I can't seem to figure out how to do this; is it possible?

It's possible. See some of the tricks in the Jacobian-vector product code.
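
For concreteness, here is a minimal sketch of that trick: the values stored in dual_cache1 already hold x, so the x field can be dropped and only the partials refreshed on each product. The helper name update_partials! is made up for illustration.

using ForwardDiff: Dual, Partials, value

# Hypothetical sketch: rebuild the duals from their own cached values so that
# a separate x field is unnecessary; only the partials change per product.
function update_partials!(cache::Vector{Dual{T, V, 1}}, v::AbstractVector) where {T, V}
	cache .= Dual{T, V, 1}.(value.(cache), Partials.(tuple.(v)))
	return cache
end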

Are we saving anything by using dual_cache2 when the Zygote gradient operation cannot be performed in-place?

Nope. That's more for when we get around to doing this with ReverseDiff.jl or Enzyme.jl.
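
For a rough picture of where dual_cache2 would start to matter, here is a sketch of an in-place gradient with Enzyme.jl, which accumulates into a preallocated buffer; the function f is just a toy example.

using Enzyme

# Sketch: Enzyme writes the gradient into the shadow buffer dx in-place, the
# situation where a preallocated cache like dual_cache2 actually saves work.
f(x) = sum(abs2, x) / 2
x  = rand(10)
dx = zeros(10)  # plays the role of dual_cache2; must be zeroed beforehand
Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active, Enzyme.Duplicated(x, dx))
# dx now holds ∇f(x) == x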

Can further specification of types improve performance? For instance, I know that the function type F could be further specified as taking a vector as input and returning a scalar.

Nope. But fixing the tag types would be good; it's not good to tag with Nothing.
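
A minimal sketch of what a fixed tag could look like, assuming a dedicated tag type along the lines of the one SparseDiffTools uses in its jacvec code; the HvpTag name is made up, and the struct's field types would need to widen from Dual{Nothing, T, 1} to match.

using ForwardDiff: Dual, Partials, Tag

# Hypothetical sketch: tag the duals with a dedicated type instead of Nothing
# so ForwardDiff's perturbation-confusion checks can distinguish this operator.
struct HvpTag end

function HvpOperator(f, x::AbstractVector{T}) where {T}
	TagType = typeof(Tag(HvpTag(), T))
	cache1 = Dual{TagType, T, 1}.(x, Partials.(tuple.(similar(x))))
	cache2 = Dual{TagType, T, 1}.(x, Partials.(tuple.(similar(x))))
	return HvpOperator(f, x, cache1, cache2, size(x, 1), UInt16(0))
end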

Slightly off topic from the other questions, is the symmetry of the Hessian automatically being exploited by the AD tools, or does that even make sense?

It is. There's a step at the end of a Hessian calculation which uses it.

ChrisRackauckas commented on Jan 10 '22

Thanks for the responses! I went looking for the tricks you mentioned in the JVP code, but I can't seem to find anything useful. Could you point me in a more specific direction?

RS-Coop commented on Jan 21 '22