
Adam optimizer

Open CorySimon opened this issue 9 years ago • 1 comments

This package is really useful for learning-rate updaters. I'm using a variant of the Adam scheme from here for SGD.

I think it is unnecessary to store \rho_i^t as vectors; shouldn't these be Float64s? Also, a pedantic note: I'm not sure why they are called \rho instead of \beta, as in the paper. https://github.com/JuliaML/StochasticOptimization.jl/blob/master/src/paramupdaters.jl#L123-L124
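To illustrate the point: the running decay products β₁ᵗ and β₂ᵗ depend only on the step count, not on any individual parameter, so one scalar pair per optimizer state suffices. A minimal sketch (in Python for illustration; the variable names are my own, not the package's):

```python
# The decay products beta1^t and beta2^t are per-optimizer scalars,
# not per-parameter vectors: they change once per step, identically
# for every parameter being updated.
beta1, beta2 = 0.9, 0.999      # typical Adam decay rates
beta1_t, beta2_t = 1.0, 1.0    # running products, one float each

for t in range(1, 4):          # three optimizer steps
    beta1_t *= beta1           # beta1_t == beta1 ** t
    beta2_t *= beta2           # beta2_t == beta2 ** t

print(beta1_t, beta2_t)
```

No per-element storage is needed regardless of how many parameters the model has.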

CorySimon avatar Mar 21 '17 16:03 CorySimon

Also, comparing with the paper (https://arxiv.org/pdf/1412.6980.pdf), the update of \theta is not correct for the Adam optimizer. Shouldn't it be:

θ[i] -= α * m[i] / (1.0 - β₁ᵗ) * sqrt(1.0 - β₂ᵗ) / (sqrt(v[i]) + ϵ * sqrt(1.0 - β₂ᵗ))
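This single-expression form is an algebraic rearrangement of the bias-corrected update in Algorithm 1 of the paper, θ ← θ − α·m̂ / (√v̂ + ε) with m̂ = m/(1 − β₁ᵗ) and v̂ = v/(1 − β₂ᵗ): multiplying numerator and denominator by √(1 − β₂ᵗ) gives the line above. A quick numerical check (in Python for illustration; the values are arbitrary examples):

```python
import math

# Hypothetical example values: learning rate, epsilon, decay products
# after t = 10 steps, and moment estimates for one parameter.
alpha, eps = 0.001, 1e-8
beta1_t, beta2_t = 0.9**10, 0.999**10
m, v = 0.05, 0.002

# Bias-corrected update as written in the paper (Algorithm 1).
m_hat = m / (1.0 - beta1_t)
v_hat = v / (1.0 - beta2_t)
step_paper = alpha * m_hat / (math.sqrt(v_hat) + eps)

# Rearranged single-expression form proposed above.
step_proposed = (alpha * m / (1.0 - beta1_t) * math.sqrt(1.0 - beta2_t)
                 / (math.sqrt(v) + eps * math.sqrt(1.0 - beta2_t)))

# The two forms agree to floating-point precision.
assert math.isclose(step_paper, step_proposed, rel_tol=1e-12)
```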

Please confirm that I am correct, and I will make a pull request. Thanks.

CorySimon avatar Mar 21 '17 17:03 CorySimon