Yuhang Li
Hi, when we first tested our algorithm without weight normalization, we also encountered that problem. It seems that the gradients of the clipping parameter in several layers can suddenly explode....
I think weight normalization cannot be applied to the last layer, because the output of the last layer is the output of the network and there is no BN afterward to standardize its distribution....
Hi, thanks for your question. We define `alpha` as the clipping threshold of the quantizer, not the step size between two adjacent quantization levels. So we expect the tensor divide...
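To illustrate the distinction, here is a minimal sketch of threshold-based uniform quantization, assuming the common convention (as in PACT-style methods) that the tensor is divided by `alpha`, clipped to a unit range, quantized to uniform levels, and rescaled. The function name and the choice of `bits` are hypothetical, not this repo's actual API:

```python
def quantize(values, alpha, bits=4):
    """Sketch: `alpha` is the clipping threshold, NOT the step size.

    The step size is derived from alpha: step = alpha / (2**bits - 1).
    (Hypothetical helper for illustration; not the repo's real code.)
    """
    levels = 2 ** bits - 1
    out = []
    for v in values:
        # Divide by alpha, then clip to [-1, 1]: values beyond the
        # threshold saturate instead of overflowing.
        c = max(-1.0, min(1.0, v / alpha))
        # Quantize to uniform levels, then rescale back by alpha.
        out.append(round(c * levels) / levels * alpha)
    return out
```

Note that changing `alpha` moves the saturation point of the quantizer; the step size then follows as `alpha / (2**bits - 1)`.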
Do you use max pooling or average pooling in VGG?
Got it, I will double-check the code and update it this weekend.
Just uploaded a new framework and new checkpoints; feel free to report any issues.
Hi, the accuracy mismatch is probably due to a difference in the data-loader implementation between my training environment and the official PyTorch environment. Did you verify it through direct training?
Hi, I found a typo in the dataloader, can you test it now?
Hi, our detection experiments are conducted in our internal framework. So we might not be able to provide you with the ckpt file directly. However, you can use our code...
Sounds like an error from AutoAugment. A quick workaround: to avoid using AutoAugment in your CIFAR data loader, you can set it to False. But it may not...
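As a sketch of the workaround, here is a hypothetical helper that builds the CIFAR augmentation pipeline with AutoAugment behind a flag; the function name, the flag name, and the exact transform list are assumptions for illustration, not the repo's actual loader:

```python
def build_cifar_augmentations(use_autoaugment=False):
    """Hypothetical sketch: compose a CIFAR train-time augmentation list,
    skipping AutoAugment when the flag is False (the quick workaround)."""
    pipeline = [
        "RandomCrop(32, padding=4)",
        "RandomHorizontalFlip()",
    ]
    if use_autoaugment:
        # Only included when explicitly requested; setting the flag to
        # False sidesteps the AutoAugment error entirely.
        pipeline.append("AutoAugment(CIFAR10Policy)")
    pipeline.append("ToTensor()")
    return pipeline
```

In a real PyTorch loader the strings would be `torchvision.transforms` objects composed with `transforms.Compose`, but the flag-guarded structure is the same.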