Jeremy Bernstein

6 comments by Jeremy Bernstein

Hi @manishadubey91, sorry this is unclear. You have to pass in the optimiser as a command line argument. For example: `python train_resnet.py --optim signum --lr 0.0001 --wd 0.00001` This works...
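A minimal sketch of how a training script might accept such flags, assuming `argparse` with hypothetical argument names matching the command above (the real `train_resnet.py` may differ):

```python
import argparse

# Hypothetical flag parsing; choices and defaults are assumptions,
# not the actual train_resnet.py code.
parser = argparse.ArgumentParser()
parser.add_argument("--optim", choices=["sgd", "adam", "signum"], default="sgd")
parser.add_argument("--lr", type=float, default=0.0001)
parser.add_argument("--wd", type=float, default=0.00001)

# Simulate: python train_resnet.py --optim signum --lr 0.0001 --wd 0.00001
args = parser.parse_args(["--optim", "signum", "--lr", "0.0001", "--wd", "0.00001"])
print(args.optim, args.lr, args.wd)  # → signum 0.0001 1e-05
```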

> this is the implementation you're referring to right? https://github.com/apache/incubator-mxnet/blob/f70c7b7b1e246e32e322ba059f8bf0e5d01a22be/src/operator/optimizer_op-inl.h#L2303
>
> seems to be using 2 bits: (-1, 0, 1)

Hi @amitport, you're right and thanks for pointing this out....

Hi @amitport, I tested the difference between the version that sends `sign(0) --> 0` and the version that sends `sign(0) --> ±1` at random. The tests and results are in...
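The two conventions being compared can be sketched in a few lines of NumPy; the function names here are illustrative, not the repo's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_zero(x):
    # Convention A: sign(0) -> 0 (NumPy's default for np.sign)
    return np.sign(x)

def sign_random(x):
    # Convention B: sign(0) -> ±1 uniformly at random;
    # nonzero entries behave as usual.
    s = np.sign(np.asarray(x, dtype=float))
    zeros = (s == 0)
    s[zeros] = rng.choice([-1.0, 1.0], size=int(zeros.sum()))
    return s

g = np.array([-2.0, 0.0, 3.0, 0.0])
print(sign_zero(g))    # → [-1.  0.  1.  0.]
print(sign_random(g))  # zeros replaced by ±1 at random
```

Convention A quantises to three values, while Convention B keeps the update strictly binary, which matters for one-bit compression schemes.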

Hi Jay, Sorry for the late reply. The idea in this paper was that phi measures the sparseness / denseness of a vector. When the vector is "dense" (meaning most...
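One standard density measure of this kind, which I believe matches the paper's phi but whose exact form should be treated as an assumption here, is the ratio of the squared 1-norm to n times the squared 2-norm:

```python
import numpy as np

def phi(v):
    # Density measure: phi(v) = ||v||_1^2 / (n * ||v||_2^2).
    # Ranges from 1/n for a one-hot (maximally sparse) vector
    # up to 1 for a constant-magnitude (maximally dense) vector.
    # The exact formula used in the paper is an assumption here.
    v = np.asarray(v, dtype=float)
    n = v.size
    return np.linalg.norm(v, 1) ** 2 / (n * np.linalg.norm(v, 2) ** 2)

print(phi([1, 0, 0, 0]))  # → 0.25 (sparse)
print(phi([1, 1, 1, 1]))  # → 1.0  (dense)
```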

Hi Jason, it may or may not work just yet as a simple drop-in. For instance, we still don't support all layer types. That said, `optimizer.zero_grad()` should work now since...
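The drop-in interface being discussed (a `zero_grad()`/`step()` pair) can be sketched with a toy Signum-style optimiser in plain NumPy; the momentum form and weight-decay placement here are assumptions, not the repo's exact code:

```python
import numpy as np

class Signum:
    """Toy sketch of a Signum-style optimiser exposing the usual
    zero_grad()/step() interface. Details are assumptions."""

    def __init__(self, params, lr=1e-4, momentum=0.9, wd=1e-5):
        self.params = params  # list of dicts: {"w": weights, "g": grads}
        self.lr, self.beta, self.wd = lr, momentum, wd
        self.m = [np.zeros_like(p["w"]) for p in params]

    def zero_grad(self):
        # Reset accumulated gradients in place.
        for p in self.params:
            p["g"][...] = 0.0

    def step(self):
        # Momentum on the raw gradient, then a sign update plus
        # decoupled weight decay (placement is an assumption).
        for p, m in zip(self.params, self.m):
            m[...] = self.beta * m + (1 - self.beta) * p["g"]
            p["w"] -= self.lr * (np.sign(m) + self.wd * p["w"])

param = {"w": np.array([1.0, -1.0]), "g": np.array([0.5, -0.25])}
opt = Signum([param])
opt.step()        # weights move against the sign of the momentum
opt.zero_grad()   # gradients cleared for the next iteration
print(param["w"], param["g"])
```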

Hey @zq-OwO-qz, thanks for your interest in this project. I've shifted focus and am now actively developing this project: https://github.com/jxbz/modula. There we support more layer types, including GPT. The...