Ethan Wu
I expect it to behave exactly like a pure function:

```julia
julia> grad_f(x) = gradient(x -> sum(x.^3), x)[1]
grad_f (generic function with 1 method)

julia> Zygote.jacobian(grad_f, rand(3))[1]
3×3 Matrix{Float64}:
 0.0456902  0.0  0.0
 ...
```
We have more than one bug here 🥲, see #1264.
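Until that's sorted out, forward-over-reverse gives the mathematically expected second derivatives for this example; `Zygote.hessian` already uses forward mode over the reverse-mode gradient internally, so it sidesteps the reverse-over-reverse path. A minimal sketch, using the analytic Hessian `Diagonal(6x)` of `sum(x.^3)` as the reference:

```julia
using Zygote, LinearAlgebra

# For f(x) = sum(x.^3), the gradient is 3 .* x.^2 and the Hessian is
# Diagonal(6 .* x). Zygote.hessian differentiates the reverse-mode
# gradient with forward mode, avoiding nested reverse mode.
x = rand(3)
H = Zygote.hessian(x -> sum(x .^ 3), x)
H ≈ Diagonal(6 .* x)  # matches the analytic Hessian
```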
Just copy things from there:

```julia
julia> function f(x, bias)
           jac = Zygote.jacobian(x -> x.^3, x)[1]
           return jac * x .+ bias
       end
f (generic function with 1 method)

julia> x, bias = ...
```
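As a sanity check on the primal values of that snippet (before any differentiation): the Jacobian of `x .^ 3` is `Diagonal(3 .* x .^ 2)`, so `f(x, bias)` should equal `3 .* x .^ 3 .+ bias`. A self-contained sketch:

```julia
using Zygote

# Same function as above; the inner jacobian call is the part that
# later gets differentiated through.
f(x, bias) = Zygote.jacobian(x -> x .^ 3, x)[1] * x .+ bias

x, bias = [1.0, 2.0], [0.5, 0.5]
f(x, bias) ≈ 3 .* x .^ 3 .+ bias  # Diagonal(3x.^2) * x == 3x.^3
```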
This might be a better example, since layers in Lux should be treated exactly like pure functions:

```julia
using Lux, Zygote, Random

model = Dense(3, 1)
ps, st = Lux.setup(Random.default_rng(), model)
grad_f(x) = ...
```
I tried your method and it didn't work on Lux. How come? @ToucheSir

```julia
using Lux, Zygote, Random

model = Dense(3, 1)
ps, st = Lux.setup(Random.default_rng(), model)
grad_f(m, p, s) = x -> gradient(y -> sum(m(y, p, s)[1]), x)[1]
...
```
> It's probably calling much the same code under the hood.

But one works and one doesn't? Maybe I don't understand how Zygote differentiates a functor.
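For what it's worth, Zygote treats a callable struct like any other argument: differentiating with respect to the functor itself returns a `NamedTuple` of gradients mirroring its fields, just as it would for a closure's captured variables. A minimal sketch with a hypothetical `Affine` type standing in for a layer:

```julia
using Zygote

# Hypothetical callable struct standing in for a Lux/Flux layer.
struct Affine{M,V}
    W::M
    b::V
end
(m::Affine)(x) = m.W * x .+ m.b

m = Affine([2.0 0.0; 0.0 3.0], [1.0, 1.0])
x = [1.0, 1.0]

# Gradient w.r.t. the functor: a NamedTuple matching the fields.
g = Zygote.gradient(m -> sum(m(x)), m)[1]
g.W  # == ones(2) * x'
g.b  # == ones(2)
```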
Reverse mode over the Hessian:

```julia
julia> function f1(x, ps) # [edit: renamed not to clash]
           hess = Zygote.hessian(x -> sum(x.^3), x)
           return hess * x .+ ps.bias
       end
f1 (generic function ...
```
```julia
using ForwardDiff, Zygote, BenchmarkTools

f(x, W) = sum((W.^3) * x)
x = randn(30); W = randn(128, 30);

@benchmark Zygote.hessian(W -> f($x, W), $W)

BenchmarkTools.Trial: 13 samples with 1 evaluation.
 Range (min … max):  210.955 ms … ...
```
Great acceleration!
Related to #1070. Edit: don't close it, since `hessian_inverse` is also an issue there.

```julia
julia> using CUDA

julia> CUDA.allowscalar(false)

julia> hessian(x -> sum(tanh.(x)), cu([1,2,3.4]))
ERROR: Scalar indexing is disallowed.
...
```
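A hedged workaround sketch until the GPU path is fixed: since the failure comes from scalar indexing inside `Zygote.hessian`, one can compute the Hessian on the CPU and copy it back, e.g. `cu(Zygote.hessian(f, Array(xgpu)))`, assuming the problem fits in host memory. The CPU side of that round-trip:

```julia
using Zygote

# CPU fallback for the failing GPU call above; not a fix for the
# scalar-indexing issue itself, just a way around it.
H = Zygote.hessian(x -> sum(tanh.(x)), [1.0, 2.0, 3.4])
# For sum(tanh.(x)) the Hessian is diagonal, with entries
# -2 * tanh(x) * sech(x)^2.
```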