Results: 2 issues of Sangmin Lee
Is gradient clipping g_r = g_q · 1_{|r|≤1} still used in the code? The only clipping I see is `p.org.copy_(p.data.clamp_(-1, 1))` in `train()`:

```
optimizer.zero_grad()
loss.backward()
for p in list(model.parameters()):
    if ...
```
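For context, the rule g_r = g_q · 1_{|r|≤1} zeroes the gradient wherever the real-valued weight falls outside [-1, +1]. A minimal sketch of that mask in PyTorch (the helper name `ste_clip_gradients` is hypothetical, not from the repo in question):

```python
import torch

def ste_clip_gradients(params):
    # Straight-through estimator clipping: keep the gradient only where
    # the real-valued weight r satisfies |r| <= 1, i.e. g_r = g_q * 1_{|r|<=1}.
    for p in params:
        if p.grad is not None:
            p.grad.data.mul_((p.data.abs() <= 1).float())

# Toy usage: the middle and last weights lie outside [-1, 1],
# so their gradients are zeroed after the mask.
w = torch.tensor([0.5, 1.5, -2.0], requires_grad=True)
loss = w.sum()
loss.backward()
ste_clip_gradients([w])
print(w.grad)
```

Note this is distinct from `p.org.copy_(p.data.clamp_(-1, 1))`, which clamps the stored full-precision *weights* rather than masking the *gradient*; the two are often used together in BinaryNet-style training loops.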
Are full-precision weights normalized before quantization, as in the original paper? The original paper says: "First, we normalize the full-precision weights to the range [-1, +1] by dividing each...
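If the normalization described in the quote is by the maximum absolute weight (an assumption here, since the quote is truncated), the step before binarization would look roughly like this sketch; `normalize_then_binarize` is a hypothetical helper, not code from the repo:

```python
import torch

def normalize_then_binarize(w):
    # Assumed normalization: scale full-precision weights into [-1, +1]
    # by dividing by the maximum absolute value, then binarize with sign().
    w_norm = w / w.abs().max()
    return torch.sign(w_norm)

# Toy usage: weights [2, -4, 1] normalize to [0.5, -1.0, 0.25],
# then binarize to [+1, -1, +1].
out = normalize_then_binarize(torch.tensor([2.0, -4.0, 1.0]))
print(out)
```

Whether the repo applies such a normalization before `sign()` is exactly what the issue is asking; the sketch only illustrates the procedure the paper describes.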