pytorch-optimizer
Implementation of Shampoo inconsistent with the paper
https://github.com/jettify/pytorch-optimizer/blob/910b414565427f0a66e20040475e7e4385e066a5/torch_optimizer/shampoo.py#L130
Shouldn't the second argument be `-0.5 / order`? For example, with order 2 (a matrix parameter), the authors raise the preconditioner matrices to the power -1/4.
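A quick sanity check of the exponent, as a minimal sketch rather than the library's code: treat a scalar parameter as a 1x1 matrix (order 2), so the left and right statistics are both the running sum of squared gradients. With exponent `-1 / (2 * order)` the update should reduce to AdaGrad's `g / sqrt(sum g^2)`. The function name `shampoo_scalar_step` and the sample gradients are hypothetical, epsilon is omitted, and `-1 / order` is only my assumption about what the linked line currently passes.

```python
def shampoo_scalar_step(grads, exponent):
    """Preconditioned update for the last gradient in `grads`.

    Hypothetical helper: a scalar parameter viewed as a 1x1 matrix, so the
    left and right statistics are both sum(g^2). Epsilon is omitted.
    """
    order = 2
    stat = sum(g * g for g in grads)       # L_t = R_t = sum of squared gradients
    precond = stat ** exponent             # one preconditioner per dimension
    return (precond ** order) * grads[-1]  # L^e * g * R^e


grads = [0.5, -2.0, 4.0]  # made-up gradient history
order = 2

# Exponent proposed in this issue: -0.5 / order = -1/4 for order 2.
proposed = shampoo_scalar_step(grads, -1 / (2 * order))

# Exponent assumed to be in the linked line: -1 / order = -1/2 for order 2.
assumed_current = shampoo_scalar_step(grads, -1 / order)

# AdaGrad reference for the same gradient history.
adagrad = grads[-1] / sum(g * g for g in grads) ** 0.5

print(proposed, assumed_current, adagrad)
```

Under these assumptions, `proposed` matches the AdaGrad step, while `assumed_current` divides by the full sum of squares instead of its square root, so the gradient is preconditioned twice as strongly as intended.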