trained-ternary-quantization

Gradient w.r.t. the negative scaling factor might be wrong?

Open · magicwyzh opened this issue on Mar 10 '18 · 1 comment

In trained-ternary-quantization/utils/quantization.py, line 42, the last value returned by the function "get_grads" is the gradient w.r.t. the negative scaling factor. I think this might be wrong (not sure).

Consider a simple case where the kernel is a 1x1 tensor. As I understand it, the forward pass computes t = ternarize(fp_kernel), where fp_kernel is the full-precision kernel and "ternarize" maps its entries to +1, -1, or 0: t = -1 on the negative part of fp_kernel and t = +1 on the positive part.

The negative part of the scaled ternary kernel is then: y = w_n * t.

In my opinion, the gradient w.r.t. w_n should equal "grad_y * t", where "grad_y" is the gradient w.r.t. the negative part of the scaled ternary kernel and corresponds to "b*kernel_grad" in your code (line 42).

Because of "t=-1" for the negative part of the kernel, i think the gradient w.r.t w_n should be grad_y * t = grad_y * (-1) = -b*kernel_grad

This suggests that the last return value of "get_grads" should be "(-b*kernel_grad).sum()".
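
Here is a minimal PyTorch sketch of this argument (this is not the repository's code; the names `fp_kernel`, `neg_mask`, `w_n`, and the toy loss are my own illustrative assumptions, and the ternarization threshold is omitted for simplicity):

```python
import torch

torch.manual_seed(0)

fp_kernel = torch.randn(4, 4)                # full-precision kernel
t = fp_kernel.sign()                         # ternarized values; threshold omitted, so no zeros here
neg_mask = (fp_kernel < 0).float()           # the "b" mask: 1 on the negative part

w_n = torch.tensor(0.5, requires_grad=True)  # negative scaling factor

# Forward pass: only the negative part is scaled by w_n, i.e. y = w_n * t there.
y = w_n * t * neg_mask
loss = (y ** 2).sum()                        # arbitrary toy loss
loss.backward()

# Chain rule by hand: dL/dw_n = sum(grad_y * t) over the negative part,
# and since t = -1 there, this equals -(neg_mask * grad_y).sum().
grad_y = 2 * y.detach()                      # dL/dy for this toy loss
manual = -(neg_mask * grad_y).sum()

print(torch.allclose(w_n.grad, manual))      # True: autograd agrees with the minus sign
```

Dropping the minus sign would make the returned value differ from the true gradient by a sign, which is the discrepancy described above.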

Am I right?

magicwyzh · Mar 10 '18

Yeah, that sounds right.
But in the original paper the gradient is calculated the way I do it in my implementation (see page 4, equation 7). Maybe there is an error in the paper. Try writing to the authors.

TropComplique · Mar 11 '18