DP-FTRL
FTRL should be identical to SGD for the unconstrained problem when no noise is added
On line https://github.com/google-research/DP-FTRL/blob/main/optimizers.py#L59,
why is the update `ms + (-gs - nz) / alpha`? This makes FTRL not identical to SGD when the learning rate is not 1.0. Shouldn't it be
`ms + (-gs - nz) * alpha`?
Hi! Sorry for my really late reply. I'm not sure I fully understand your question, but I think it's likely due to a mismatch between the parameters: the `alpha` here is the *inverse* of the learning rate, so dividing by `alpha` is the same as multiplying by the learning rate. Does that make sense?
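To illustrate the point about `alpha` being the inverse of the learning rate, here is a small standalone sketch (not the repository's code; the variable names are mine). With no noise and an unconstrained problem, the FTRL iterate `w_t = -(g_1 + ... + g_t) / alpha` with `alpha = 1 / lr` matches the SGD iterates started from zero:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3
grads = [rng.standard_normal(dim) for _ in range(5)]  # stand-in gradient sequence

lr = 0.1
alpha = 1.0 / lr  # alpha is the inverse of the learning rate

# FTRL closed-form iterate: divide the negated gradient sum by alpha,
# mirroring the `/ alpha` in the line quoted above (noise set to zero).
grad_sum = np.zeros(dim)
ftrl_iterates = []
for g in grads:
    grad_sum += g
    ftrl_iterates.append(-grad_sum / alpha)

# Plain SGD from w_0 = 0 with step size lr.
w = np.zeros(dim)
sgd_iterates = []
for g in grads:
    w = w - lr * g
    sgd_iterates.append(w.copy())

# The two trajectories coincide at every step.
for wf, ws in zip(ftrl_iterates, sgd_iterates):
    assert np.allclose(wf, ws)
```

So dividing by `alpha` (rather than multiplying) is exactly what recovers the SGD trajectory for any learning rate.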