Sam comments

Results: 7 comments by Sam

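> Hello, I just read through your code and there is one part I don't understand. XNOR-Net has a K that is computed from the activations. Where is that part implemented? I only see a sign operation applied to the activations, followed directly by the convolution.

Hello, thank you very much for your interest in this project. Regarding your question: the XNOR-Net paper does compute a scaling factor K for the activations. In the actual implementation, however, I did not compute K for the activations, only the scaling factor for the weights. I mainly followed the two codebases below, which likewise do not compute K for the activations: https://github.com/jiecaoyu/XNOR-Net-PyTorch https://github.com/liuzechun/Bi-Real-net Experiments show that skipping K for the activations has little effect; you can verify this experimentally yourself.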
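To make the reply above concrete, here is a minimal pure-Python sketch of the scheme it describes: the weights get a scaling factor (the mean absolute value of a filter, as in the XNOR-Net paper), while the activations are only passed through sign, with no K. The function names and the convention that sign(0) is +1 are assumptions made for illustration; this is not code from either linked repository.

```python
def sign(x):
    # Assumed convention: zero maps to +1 so every value becomes +1 or -1.
    return 1.0 if x >= 0 else -1.0


def binarize_weights(filt):
    """Return (alpha, signs), where alpha is the mean absolute weight."""
    alpha = sum(abs(w) for w in filt) / len(filt)
    return alpha, [sign(w) for w in filt]


def xnor_dot(activations, filt):
    """Binary dot product: scaled binary weights, sign-only activations."""
    alpha, binary_weights = binarize_weights(filt)
    binary_acts = [sign(a) for a in activations]  # no activation scaling K
    return alpha * sum(a * w for a, w in zip(binary_acts, binary_weights))


print(xnor_dot([2.0, 3.0, -0.1], [0.5, -1.5, 1.0]))  # -1.0
```

A full XNOR-Net layer would additionally multiply the output by K, computed from the mean absolute activation over each receptive field. The sketch omits that step on purpose, to match what the reply says the implementation does.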

Where is K implemented in xnor-net?

> Thanks for the reply! In the xnor-net++ paper ( https://bmvc2019.org/wp-content/uploads/papers/0121-paper.pdf ) I found this statement: “Note, that as the calculation of K is relatively expensive due to the fact that it is recomputed at each forward pass, it is common...

Why not using calibrated_grads directly?

Hi, @csyhhu, Thank you for your enthusiastic and quick response. I get the motivation of using the two grads: 1. calibrated_grads for meta-network updating. 2. Refined gradients for incorporating refinement....

Why not using calibrated_grads directly?

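@csyhhu Hello, thank you for your kind reply. I understand what you mean; my earlier wording was probably the problem. Sorry, let me explain in Chinese: 1. The first issue is that at the end of iteration t, Adam or SGD produces the refined gradient, which is then used to update every parameter. Iteration t+1 **still uses** that same refined gradient to compute `self.meta_weight` and then runs the convolution. This is not a big problem; the key point is what follows. 2. In `self.meta_weight = self.weight - \ lr * (self.calibrated_grads \ + (self.weight.grad.data - self.calibrated_grads.data).detach())`, I understand that the purpose is to pass the gradient of the base model's loss to the meta-net so that the meta-net gets updated. If `self.calibrated_grads` were used directly, the gradient of the loss would of course flow back to the meta-net naturally (although, as you said, the gradient would then not be refined (by Adam)). But with what your code shows: `self.meta_weight = self.weight - \ lr * (self.calibrated_grads \ +...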

Why not using calibrated_grads directly?

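@csyhhu Thank you for your kind and quick answer. As you said: `self.calibrated_grads` is synthesized by the meta-net in the current iteration (**t**) and corresponds to the `pre_quantized_weight` of the previous iteration (**t-1**); likewise, `self.weight.grad` is the gradient produced by the meta-net and refined in the previous iteration (**t-1**), and it corresponds to the `pre_quantized_weight` from two iterations back (**t-2**). So the two gradients are not the same thing. (I'm not sure whether my understanding is correct.) What I meant by “Can the gradient with respect to the loss of `self.weight.grad.data` be assigned to `self.calibrated_grads`?” is this: the purpose of the code `self.meta_weight = self.weight - \ lr * (self.calibrated_grads \ + (self.weight.grad.data - self.calibrated_grads.data).detach())` is to pass the base model's gradient to the meta-net. In the forward pass, `self.weight.grad.data` takes part in the computation, but in the backward pass the gradient with respect to the loss (one can think of it this way, to some extent), `(self.weight.grad.data).grad` (which, because of detach, does not actually have a grad), is passed (assigned) to `(self.calibrated_grads).grad` and then on to the meta-net. In essence, my confusion is that the two gradients are not the same thing, or rather that they belong to two different iteration steps. As in the code: `self.meta_weight = self.weight - \ lr * (self.calibrated_grads \ + (self.weight.grad.data...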

Why not using calibrated_grads directly?

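@csyhhu Thank you very much for your answer. With your explanation, the earlier question is resolved. Thanks again for your quick and patient answers today! Best wishes! Shuai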
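The `self.meta_weight` expression discussed in this thread can be read as follows: its forward value uses the refined gradient (`self.weight.grad.data`), while its derivative with respect to the meta-net comes only from `self.calibrated_grads`, because the difference term is detached. The sketch below illustrates that reading with a tiny hand-written forward-mode stand-in for autograd, so it runs without PyTorch. The `Dual` class, the one-parameter "meta-net" (`calibrated = theta * raw_grad`), and all numbers are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value plus its derivative w.r.t. one meta-net parameter theta."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __sub__(self, other):
        return Dual(self.val - other.val, self.dot - other.dot)

    def scale(self, c):
        return Dual(c * self.val, c * self.dot)

    def detach(self):
        # Same value, treated as a constant: mirrors the role of .detach().
        return Dual(self.val, 0.0)


def meta_weight(weight, calibrated, refined, lr):
    # Same shape as the expression quoted in the thread.
    return weight - (calibrated + (refined - calibrated).detach()).scale(lr)


theta, raw_grad = 2.0, 0.5
calibrated = Dual(theta * raw_grad, raw_grad)  # depends on theta
refined = Dual(0.25)   # stands in for weight.grad.data, a constant
weight = Dual(1.0)

with_trick = meta_weight(weight, calibrated, refined, lr=0.5)
direct = weight - calibrated.scale(0.5)  # calibrated gradient used directly

print(with_trick)  # value from the refined gradient, derivative from calibrated
print(direct)      # same derivative, different forward value
```

In this toy setting both variants have the same derivative with respect to `theta`, but only the first one produces a forward value built from the refined gradient, which is the distinction the thread is about.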

Sam

reproduce the mAP in readme
