GLMNet.jl

Update glmnet source

Open jolars opened this issue 5 years ago • 5 comments

The glmnet source in this repository is outdated: it dates back to 2015, and the glmnet Fortran backbone has been updated several times since then. Please consider updating to the latest version.

The source files can be found at https://github.com/cran/glmnet/tree/master/src

jolars, Jul 06 '20 11:07

I updated the binary builder repo to the latest source: https://github.com/JuliaPackaging/Yggdrasil/pull/2028

However, the new JLL version doesn't seem to work when I try to use it, so help with debugging that may be needed.

I looked at the diff between the source we are using and the latest copy from the glmnet repo. The good news is that the changes seem to be largely cosmetic, the biggest being the introduction of a progress meter integrated with R. From a quick look through, I couldn't find any significant changes to the actual algorithm: https://gist.github.com/JackDunnNZ/b04d15fc48fb33db9cff248582c6bc46
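For anyone who wants to repeat this kind of comparison, here is a minimal sketch of one way to separate cosmetic changes from substantive ones. It is not how the gist above was produced. The helper names `normalize` and `substantive_changes` are hypothetical, and the normalization (lowercasing and collapsing whitespace) is an assumption that also lowercases string literals, so treat the result as a first pass only.

```python
import difflib


def normalize(line):
    # Fortran is case-insensitive, so fold case and collapse whitespace.
    # Assumption: string literals do not matter for this comparison.
    return " ".join(line.lower().split())


def substantive_changes(old_src, new_src):
    """Return the non-equal opcodes between two sources after normalization."""
    old = [normalize(l) for l in old_src.splitlines() if l.strip()]
    new = [normalize(l) for l in new_src.splitlines() if l.strip()]
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


# Toy inputs (not real glmnet code): only the added call should be reported.
old_src = "      SUBROUTINE FOO(X)\n      X = X + 1\n      END\n"
new_src = "subroutine foo(x)\n  x = x + 1\n  call progress()\nend\n"
changes = substantive_changes(old_src, new_src)
```

Reindentation and case changes drop out, so what remains is closer to the set of lines worth reading by hand.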

JackDunnNZ, Dec 03 '20 17:12

It seems a major difference is that glmnet 4.0 can fit any GLM family; see, e.g., https://statisticaloddsandends.wordpress.com/2020/05/14/glmnet-v4-0-generalizing-the-family-parameter/ and https://cran.r-project.org/web/packages/glmnet/vignettes/glmnetFamily.pdf.

devmotion, Dec 29 '20 15:12

Sorry, my comment referred to changes in the underlying glmnet Fortran code, which, based on the diff above, seems to be largely unchanged. It seems that all of the changes in the recent releases are in the R code instead, and could be ported to Julia without having to update the underlying libglmnet.

JackDunnNZ, Dec 30 '20 06:12

From the blog post I got the impression that this generalization was only possible by generalizing the Fortran code as well:

Before v4.0, glmnet() could only optimize the penalized likelihood for special GLM families (e.g. ordinary least squares, logistic regression, Poisson regression). For each family, which we specified via a character string for the family parameter, we had custom FORTRAN code that ran the modified IRLS algorithm above. While this was computationally efficient, it did not allow us to fit any penalized GLM of our choosing.

From v4.0 onwards, we can do the above for any GLM family in practice. [...] Underneath the hood, instead of having custom FORTRAN code for each family, we have a FORTRAN subroutine that solves (2) efficiently.
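To make the quoted idea concrete, here is a minimal sketch of an IRLS outer loop that works for any family and delegates each step to one generic penalized weighted least squares solver. It assumes that "(2)" in the blog post refers to that elastic-net weighted least squares subproblem. This is not glmnet's Fortran routine: the names `irls_glm` and `penalized_wls`, the plain coordinate descent solver, and the fixed iteration counts are all illustrative assumptions.

```python
import math


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def penalized_wls(X, z, w, lam, alpha, b0, b, sweeps=1000, tol=1e-12):
    """Coordinate descent for
    min 1/(2n) sum_i w_i (z_i - b0 - x_i.b)^2
        + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2)
    with an unpenalized intercept b0, warm-started from (b0, b)."""
    n, p = len(X), len(X[0])
    b = list(b)
    for _ in range(sweeps):
        # Intercept update: weighted mean of the residuals without intercept.
        r = [z[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        new_b0 = sum(w[i] * r[i] for i in range(n)) / sum(w)
        biggest = abs(new_b0 - b0)
        b0 = new_b0
        r = [ri - b0 for ri in r]  # full residuals z - b0 - X b
        for j in range(p):
            # The partial residual for coordinate j is r_i + x_ij * b_j.
            num = sum(w[i] * X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            den = sum(w[i] * X[i][j] ** 2 for i in range(n)) / n + lam * (1.0 - alpha)
            new_bj = soft_threshold(num, lam * alpha) / den if den > 0 else 0.0
            delta = new_bj - b[j]
            if delta != 0.0:
                r = [r[i] - X[i][j] * delta for i in range(n)]
                b[j] = new_bj
            biggest = max(biggest, abs(delta))
        if biggest < tol:
            break
    return b0, b


def irls_glm(X, y, linkinv, mu_eta, variance, lam, alpha=1.0, iters=50):
    """IRLS outer loop: the family enters only through three functions."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        eta = [b0 + sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        mu = [linkinv(e) for e in eta]
        dmu = [mu_eta(e) for e in eta]  # d mu / d eta
        z = [eta[i] + (y[i] - mu[i]) / dmu[i] for i in range(n)]  # working response
        w = [dmu[i] ** 2 / variance(mu[i]) for i in range(n)]  # working weights
        b0, b = penalized_wls(X, z, w, lam, alpha, b0, b)
    return b0, b


# Toy data (made up for illustration): Poisson family with log link.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 2.0, 3.0, 5.0]
b0, b = irls_glm(X, y, math.exp, math.exp, lambda mu: mu, lam=0.0)
```

The point of the design is that only the working response and weights depend on the family, so supporting a new family means supplying `linkinv`, `mu_eta`, and `variance` rather than writing a new solver.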

devmotion, Dec 30 '20 09:12

Well, they probably know better than I do 😅

JackDunnNZ, Dec 30 '20 15:12