deJQK
I encountered the same problem when using a custom module with a parameter, say `self.paramA`, whose forward function includes `input = torch.where(cond, self.paramA, input)`, and I definitely included `model.train()`...
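To make the setup concrete, here is a minimal sketch of what I mean; `MyModule`, `cond`, and the shapes are hypothetical placeholders, and only `self.paramA`, `torch.where`, and `model.train()` come from the description above:

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # learnable parameter used as the replacement value inside torch.where
        self.paramA = nn.Parameter(torch.zeros(dim))

    def forward(self, input):
        cond = input < 0  # hypothetical condition
        # where cond is True, take values from self.paramA (broadcast);
        # elsewhere keep the original input values
        input = torch.where(cond, self.paramA, input)
        return input

model = MyModule(8)
model.train()  # train mode is enabled, as noted above
out = model(torch.randn(4, 8))
```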
Hi @youjinChung, thanks for your interest in our work. You could try checking [this function](https://github.com/snap-research/CAT/blob/1dbd048cc91e3cc2c59d4e4f0434e79ac260e7ed/distillers/base_inception_distiller.py#L342-L353). I am not sure how you specified the `restore_student_G_path`, which is the student...
Thanks a lot. Sorry, I am not very familiar with git or md, so I might not be able to help. Sorry for this.
Thanks for your interest in our work. Could you please try using [3, 4, 5] to see whether this issue still occurs? Also, what is the performance of...
Hi @Ahmad-Jarrar, sorry about this; the quantization scheme proposed in the paper does not converge for low bit-widths, and some modification is necessary. I remember I posted this... For...
Hi @Ahmad-Jarrar, I have updated the [readme](https://github.com/deJQK/AdaBits#centered-weight-quantization-for-low-precision). I hope it is clear. Thanks again for your interest in our work.
> If I'm not wrong, the code given does not apply the outermost 2x-1. https://github.com/deJQK/AdaBits/blob/master/models/quant_ops.py#L142-L143
Hi @haiduo, you could check these papers: https://arxiv.org/pdf/1502.01852.pdf, https://arxiv.org/pdf/1606.05340.pdf, https://arxiv.org/pdf/1611.01232.pdf, all of which analyze training dynamics for centered weights. I am not sure how to analyze weights with nonzero...
Hi @haiduo, thanks again for your interest. For b=4, it maps [-1, 1] to [0, 1], then to {0, 1, ..., 15}, then to {0.5, 1.5, ..., 15.5}, then to {1/32, 3/32, ...,...
@haiduo, yes for both.
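For reference, a minimal sketch of that mapping chain for b=4, written independently of the repo's `quant_ops.py` (function and variable names here are my own, and how the final `2x - 1` and the straight-through estimator are handled in the released code should still be checked against the linked lines):

```python
import torch

def centered_quantize(w, bits=4):
    """Sketch of the centered quantization chain described above:
    [-1, 1] -> [0, 1] -> {0, ..., 2^b - 1} -> {0.5, ..., 2^b - 0.5}
    -> {1/2^(b+1), 3/2^(b+1), ...} -> back to [-1, 1] via 2x - 1."""
    n = 2 ** bits
    x = (w.clamp(-1, 1) + 1) / 2               # [-1, 1] -> [0, 1]
    x = torch.floor(x * n).clamp(max=n - 1)    # -> {0, 1, ..., n - 1}
    x = x + 0.5                                # -> {0.5, 1.5, ..., n - 0.5}
    x = x / n                                  # -> {1/(2n), 3/(2n), ..., (2n-1)/(2n)}
    return 2 * x - 1                           # outermost 2x - 1, back to [-1, 1]

w = torch.linspace(-1, 1, steps=9)
print(centered_quantize(w, bits=4))  # levels are odd multiples of 1/16, i.e. -15/16, ..., 15/16
```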