Results: 2 issues of Sangmin Lee
Is gradient clipping g_r = g_q · 1_{|r|≤1} still used in the code? The only clipping I see is `p.org.copy_(p.data.clamp_(-1, 1))` in `train()`:

```
optimizer.zero_grad()
loss.backward()
for p in list(model.parameters()):
    if ...
```
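For context, the rule g_r = g_q · 1_{|r|≤1} zeroes the gradient wherever the real-valued weight falls outside [-1, +1]. A minimal sketch of that mask in PyTorch (the helper name `ste_clip_gradients` is hypothetical, not from the repo in question):

```python
import torch

def ste_clip_gradients(params):
    # Straight-through estimator clipping: keep the gradient only where
    # the real-valued weight r satisfies |r| <= 1, i.e. g_r = g_q * 1_{|r|<=1}.
    for p in params:
        if p.grad is not None:
            p.grad.data.mul_((p.data.abs() <= 1).float())

# Toy usage: the middle and last weights lie outside [-1, 1],
# so their gradients are zeroed after the mask.
w = torch.tensor([0.5, 1.5, -2.0], requires_grad=True)
loss = w.sum()
loss.backward()
ste_clip_gradients([w])
print(w.grad)
```

Note this is distinct from `p.org.copy_(p.data.clamp_(-1, 1))`, which clamps the stored full-precision *weights* rather than masking the *gradient*; the two are often used together in BinaryNet-style training loops.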
Are full-precision weights normalized before quantization, as in the original paper? The original paper says: "First, we normalize the full-precision weights to the range [-1, +1] by dividing each...
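If the normalization described in the quote is by the maximum absolute weight (an assumption here, since the quote is truncated), the step before binarization would look roughly like this sketch; `normalize_then_binarize` is a hypothetical helper, not code from the repo:

```python
import torch

def normalize_then_binarize(w):
    # Assumed normalization: scale full-precision weights into [-1, +1]
    # by dividing by the maximum absolute value, then binarize with sign().
    w_norm = w / w.abs().max()
    return torch.sign(w_norm)

# Toy usage: weights [2, -4, 1] normalize to [0.5, -1.0, 0.25],
# then binarize to [+1, -1, +1].
out = normalize_then_binarize(torch.tensor([2.0, -4.0, 1.0]))
print(out)
```

Whether the repo applies such a normalization before `sign()` is exactly what the issue is asking; the sketch only illustrates the procedure the paper describes.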