lookahead_pruning
Several questions on laprune.py
Hi, thanks for your great work and for sharing the code! I have a few questions about it:

(1) In this line, why do we add the batch-norm score to the total score instead of multiplying it in? From Equation (8) in the original paper, the batch-norm scaling factor is multiplied with the distortion score. The sketch right below shows what I mean.
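Just to make the question concrete, here is a minimal hypothetical sketch of the two combinations I am comparing; the names `weight_score` and `bn_scale` are mine, not variables from laprune.py:

```python
import torch

# Hypothetical per-channel scores, only to illustrate the question.
weight_score = torch.rand(16)  # stand-in for the lookahead distortion term
bn_scale = torch.rand(16)      # stand-in for the batch-norm scaling factor

# What I read from Equation (8): the scaling factor enters multiplicatively.
score_mul = weight_score * bn_scale

# What the linked line appears to do: the two terms are summed.
score_add = weight_score + bn_scale

# The two rules can rank channels differently, which is why I am asking.
print(torch.argsort(score_mul))
print(torch.argsort(score_add))
```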
(2) What does this scenario mean? (I understand the previous two cases are conv-->FC and conv-->conv.)
(3) What does the parameter "split" control?
(4) Why is the function get_bn_weights computed this way? (I notice the activation re-scaling uses a square-root version of the original activation.) See the sketch below for what I mean.
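Again, a purely hypothetical illustration with my own names, not the repo's code:

```python
import torch

# Per-channel activation statistic, made up for illustration.
activation = torch.rand(16)

plain_scale = activation             # the scaling I expected from the paper
sqrt_scale = torch.sqrt(activation)  # the sqrt-style re-scaling I see in get_bn_weights

# The ratio shows how much the two choices differ per channel.
print(plain_scale / sqrt_scale)
```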
Any help would be appreciated; thanks for your time.