Ben Murrell

17 comments by Ben Murrell

This should be easy, convenient, and visible (so that the user is aware of the possible footgun). That rules out 5.

> `Optimisers.adjust!(os, SignDecay(0.3))`, passing a whole new rule. Will...

> Obviously just the rule, surely? If you have setup Momentum, and do adjust!(... AdamW...) in any form, it can't replace the whole rule because it can't change the saved...

This was my fix: https://github.com/MurrellGroup/Jjama3.jl/blob/main/ext/MetalExt.jl

Similarly, what if I wanted to use SignDecay with AdamW, so I set AdamW's lambda to 0? Would trying to adjust the SignDecay lambda cause AdamW's lambda to then be...
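To make the concern concrete, a quick sketch (the keyword/field names here assume the current ASCII `eta` / `lambda` fields; older releases used the Unicode names, so treat this as illustrative rather than exact):

```julia
using Optimisers

model = (w = rand(Float32, 3, 3),)

# SignDecay for the L1 penalty, AdamW with its own decoupled decay turned off:
opt   = OptimiserChain(SignDecay(0.0), AdamW(eta = 1f-3, lambda = 0.0))
state = Optimisers.setup(opt, model)

# Intending only to turn on SignDecay's penalty...
Optimisers.adjust!(state; lambda = 0.05)

# ...but adjust! matches any rule with a `lambda` field, so (if I read it
# right) AdamW's weight decay would now be 0.05 as well.
```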

How about adding an adjust method that lets you specify the type? Something like `adjust!(state, (SignDecay, lambda = 0.05))`, which would only adjust rules that match the given type.

If I were to pick one number, it would be `std`, but `std` is `NaN` when taken over a single value, and if you aren't aware of this you might...
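For example (`safe_std` below is just an illustrative guard, not something from any package):

```julia
using Statistics

std([1.0])       # NaN, since the corrected variance divides by n - 1 = 0
std([1.0, 2.0])  # 0.7071...

# one possible guard if std is the number being reported:
safe_std(x) = length(x) > 1 ? std(x) : zero(float(eltype(x)))
```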

FYI I've made this: https://github.com/MurrellGroup/CannotWaitForTheseOptimisers.jl which can house new optimisers until you folks are happy to have them included in Optimisers.jl (mostly because I really just need to be able...

I've now added an attempt at a "gradient norm growth limiter", because this paper uses one in conjunction with Apollo. I think they apply it over the whole model, but...
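Roughly the idea, as a per-tensor sketch against the documented custom-rule interface (the name `NormGrowthCap` and the exact capping rule here are mine for illustration, not necessarily what's in the package):

```julia
using Optimisers, LinearAlgebra

struct NormGrowthCap <: Optimisers.AbstractRule
  gamma::Float64    # max allowed growth factor of the gradient norm per step
end

# state is just the previous step's gradient norm
Optimisers.init(o::NormGrowthCap, x::AbstractArray) = 0.0

function Optimisers.apply!(o::NormGrowthCap, prev, x, dx)
  n = norm(dx)
  if prev > 0 && n > o.gamma * prev
    cap = o.gamma * prev
    return cap, dx .* (cap / n)   # rescale so the norm grows by at most gamma
  else
    return n, dx
  end
end

# placed first in a chain so it sees the raw per-tensor gradient:
# opt = OptimiserChain(NormGrowthCap(1.01), AdamW(1f-3))
```

Doing it over the whole model, as the paper seems to, would need a norm taken across all parameters, which doesn't fit the per-leaf `apply!` interface as neatly.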

The authors of the method haven't yet posted code, but they now link to this implementation on their GitHub: https://github.com/zhuhanqing/APOLLO/tree/main ~~I think we could consider merging this? @mcabbott?~~ Edit:...