Chen Shangyu
Hi @curryJ , PyTorch >= 0.4 should work.
Hi @dsfour , Thanks for using our code. You are correct about the Hessian matrix calculation. The Hessian matrix should be divided by (dataset_size * num_batches). For the NaN problem,...
Hi @dsfour , The slow speed comes from the huge number of patches extracted from the input data (because of the sliding window in convolution, the number of extracted patches is always...
Hi @caiwenpu , Thanks for using my code! But that is not a big problem, right? It will not change the functionality of the method. Learning rate can do...
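To illustrate why the patch count grows so quickly, here is a minimal sketch of the standard sliding-window arithmetic. The function names and the example shapes are hypothetical and are not taken from the repository; the formula is the usual convolution output-size rule.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Number of sliding-window positions along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def num_patches(batch, height, width, kernel, stride=1, padding=0):
    """Total patches extracted from a batch: one per output position per image."""
    out_h = conv_output_size(height, kernel, stride, padding)
    out_w = conv_output_size(width, kernel, stride, padding)
    return batch * out_h * out_w


# Hypothetical example: 64 images of 32x32, 3x3 kernel, stride 1, padding 1.
patches = num_patches(batch=64, height=32, width=32, kernel=3, stride=1, padding=1)
print(patches)  # 65536
```

Each patch contributes one term to the layer-wise Hessian, so even a small batch yields tens of thousands of updates for an early convolutional layer.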
Hi @caiwenpu , Sorry for the late reply. neg_indices selects the gradients that correspond to weights set as negative. Thus the gradient of negative weights is not merely the opposite...
Hi @adpatil234 , Thanks for using our code and for your reports. We think it might be a class imbalance problem in Hessian generation. Since it is a reproduction of...
Hi @adpatil234 , We compute the Hessian only **once**, in both the paper and the code. You are right that the activation distribution changes after pruning. In our paper, we assume that (Eq.4) pruned...
Hi @adpatil234 , The NaN is caused by the Hessian. For some layers, the Woodbury method should be used to generate the inverse Hessian; otherwise it will produce...
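For readers unfamiliar with the Woodbury approach, the sketch below shows one common way to build an inverse Hessian recursively with rank-one (Sherman-Morrison) updates instead of inverting a possibly singular matrix directly. It assumes a layer-wise Hessian of the form H = damping * I + (1/n) * sum_i x_i x_i^T; the function name, the `damping` parameter, and this exact form are assumptions for illustration and may differ from what the repository does.

```python
import numpy as np


def inverse_hessian_woodbury(inputs, damping=1e-4):
    """Recursively build the inverse of H = damping*I + (1/n) * sum_i x_i x_i^T.

    inputs: float array of shape (n, d), one layer input per row.
    damping: small positive constant (assumed here) that makes H invertible.
    """
    n, d = inputs.shape
    h_inv = np.eye(d) / damping  # inverse of the damping term alone
    for x in inputs:
        hx = h_inv @ x  # H^{-1} x; h_inv stays symmetric, so x^T H^{-1} = hx^T
        # Sherman-Morrison update for adding (1/n) * x x^T to H.
        h_inv -= np.outer(hx, hx) / (n + x @ hx)
    return h_inv


# Usage on random data: the result should invert the explicitly built Hessian.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
H = 1e-4 * np.eye(8) + X.T @ X / 50
H_inv = inverse_hessian_woodbury(X, damping=1e-4)
```

Because every step only divides by `n + x @ hx`, which is strictly positive, the recursion avoids the explicit inversion of an ill-conditioned matrix.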
Hi @dalistarh , Can you tell me more about your issues? For example, the performance after pruning and after fine-tuning. Besides, you can refer to the dev branch, where I implement...
Hi, thanks for using our code. This function adjusts the running mean and variance of the BN layers in the pruned model. This calibration of BN parameters is...
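As a rough illustration of what such a calibration does, the sketch below re-estimates per-channel running statistics from activations collected after pruning. It is a standalone NumPy approximation, not the repository's function: the name `recalibrate_bn_stats`, the `momentum` value, and the choice to initialize from the first batch are all assumptions.

```python
import numpy as np


def recalibrate_bn_stats(batches, momentum=0.1):
    """Re-estimate per-channel running mean/variance.

    batches: iterable of arrays of shape (N, C), the inputs a BN layer
             would see when data is passed through the pruned model.
    """
    running_mean = None
    running_var = None
    for acts in batches:
        mean = acts.mean(axis=0)
        var = acts.var(axis=0, ddof=1)  # unbiased per-channel variance
        if running_mean is None:
            # Assumption: start from the first batch instead of stale values.
            running_mean, running_var = mean, var
        else:
            # Exponential moving average over subsequent batches.
            running_mean = (1 - momentum) * running_mean + momentum * mean
            running_var = (1 - momentum) * running_var + momentum * var
    return running_mean, running_var


# Usage with hypothetical activations: 5 batches, 16 samples, 4 channels.
rng = np.random.default_rng(0)
fake_batches = [rng.standard_normal((16, 4)) * 2.0 + 1.0 for _ in range(5)]
new_mean, new_var = recalibrate_bn_stats(fake_batches)
```

In a real model the activations would come from forward passes over training data, and the resulting statistics would replace the stale ones stored in each BN layer.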