Jeremy Bernstein

6 comments by Jeremy Bernstein

Hi @manishadubey91, sorry this is unclear. You have to pass in the optimiser as a command line argument. For example: `python train_resnet.py --optim signum --lr 0.0001 --wd 0.00001` This works...
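A minimal sketch of how a training script might accept such flags, assuming `argparse` with hypothetical argument names matching the command above (the real `train_resnet.py` may differ):

```python
import argparse

# Hypothetical flag parsing; choices and defaults are assumptions,
# not the actual train_resnet.py code.
parser = argparse.ArgumentParser()
parser.add_argument("--optim", choices=["sgd", "adam", "signum"], default="sgd")
parser.add_argument("--lr", type=float, default=0.0001)
parser.add_argument("--wd", type=float, default=0.00001)

# Simulate: python train_resnet.py --optim signum --lr 0.0001 --wd 0.00001
args = parser.parse_args(["--optim", "signum", "--lr", "0.0001", "--wd", "0.00001"])
print(args.optim, args.lr, args.wd)  # → signum 0.0001 1e-05
```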

> this is the implementation you're referring to right? https://github.com/apache/incubator-mxnet/blob/f70c7b7b1e246e32e322ba059f8bf0e5d01a22be/src/operator/optimizer_op-inl.h#L2303
>
> seems to be using 2 bits: (-1, 0, 1)

Hi @amitport, you're right and thanks for pointing this out....

Hi @amitport, I tested the difference between the version that sends `sign(0) --> 0` and the version that sends `sign(0) --> ±1` at random. The tests and results are in...
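The two conventions being compared can be sketched in a few lines of NumPy; the function names here are illustrative, not the repo's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_zero(x):
    # Convention A: sign(0) -> 0 (NumPy's default for np.sign)
    return np.sign(x)

def sign_random(x):
    # Convention B: sign(0) -> ±1 uniformly at random;
    # nonzero entries behave as usual.
    s = np.sign(np.asarray(x, dtype=float))
    zeros = (s == 0)
    s[zeros] = rng.choice([-1.0, 1.0], size=int(zeros.sum()))
    return s

g = np.array([-2.0, 0.0, 3.0, 0.0])
print(sign_zero(g))    # → [-1.  0.  1.  0.]
print(sign_random(g))  # zeros replaced by ±1 at random
```

Convention A quantises to three values, while Convention B keeps the update strictly binary, which matters for one-bit compression schemes.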

Hi Jay, Sorry for the late reply. The idea in this paper was that phi measures the sparseness / denseness of a vector. When the vector is "dense" (meaning most...
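One standard density measure of this kind, which I believe matches the paper's phi but whose exact form should be treated as an assumption here, is the ratio of the squared 1-norm to n times the squared 2-norm:

```python
import numpy as np

def phi(v):
    # Density measure: phi(v) = ||v||_1^2 / (n * ||v||_2^2).
    # Ranges from 1/n for a one-hot (maximally sparse) vector
    # up to 1 for a constant-magnitude (maximally dense) vector.
    # The exact formula used in the paper is an assumption here.
    v = np.asarray(v, dtype=float)
    n = v.size
    return np.linalg.norm(v, 1) ** 2 / (n * np.linalg.norm(v, 2) ** 2)

print(phi([1, 0, 0, 0]))  # → 0.25 (sparse)
print(phi([1, 1, 1, 1]))  # → 1.0  (dense)
```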

Hi Jason, it may or may not work just yet as a simple drop-in. For instance, we still don't support all layer types. That said, `optimizer.zero_grad()` should work now since...
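The drop-in interface being discussed (a `zero_grad()`/`step()` pair) can be sketched with a toy Signum-style optimiser in plain NumPy; the momentum form and weight-decay placement here are assumptions, not the repo's exact code:

```python
import numpy as np

class Signum:
    """Toy sketch of a Signum-style optimiser exposing the usual
    zero_grad()/step() interface. Details are assumptions."""

    def __init__(self, params, lr=1e-4, momentum=0.9, wd=1e-5):
        self.params = params  # list of dicts: {"w": weights, "g": grads}
        self.lr, self.beta, self.wd = lr, momentum, wd
        self.m = [np.zeros_like(p["w"]) for p in params]

    def zero_grad(self):
        # Reset accumulated gradients in place.
        for p in self.params:
            p["g"][...] = 0.0

    def step(self):
        # Momentum on the raw gradient, then a sign update plus
        # decoupled weight decay (placement is an assumption).
        for p, m in zip(self.params, self.m):
            m[...] = self.beta * m + (1 - self.beta) * p["g"]
            p["w"] -= self.lr * (np.sign(m) + self.wd * p["w"])

param = {"w": np.array([1.0, -1.0]), "g": np.array([0.5, -0.25])}
opt = Signum([param])
opt.step()        # weights move against the sign of the momentum
opt.zero_grad()   # gradients cleared for the next iteration
print(param["w"], param["g"])
```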

Hey @zq-OwO-qz, thanks for your interest in this project. I've shifted focus and am now actively developing this project: https://github.com/jxbz/modula. There we support more layer types, including GPT. The...