Miller Wilt
I ran into this problem when I had set `weight_decay` > 0. Once I removed that setting, memory usage stayed constant.
This would be a great change, if possible. I'm currently dealing with an issue where a transitive dependency ([linear-operator](https://pypi.org/project/linear-operator/), which is based on PyTorch) uses jaxtyping but another package I...
I'm using poetry as my package manager, which, for better or worse, is quite the stickler about version ranges.
I'll explore it. Reading the documentation, I see it supports overrides. Super handy!