I have a question about the parameter count.
After pruning, I counted the parameters of the ResNet-34 network and got about 4.2e7, but I know a ResNet-34 has roughly 2e7 parameters. I think the difference (roughly double) comes from the mask layers, right? If so, how do I remove the mask layers after pruning? Thank you.
Sorry for the slow response.
The easiest way to do this would be to create a normal resnet34 (without masks), and then copy across the weights.
You'll want to do something like:
```python
net = NormalResNet34()
masked_net = MaskedResNet34()

for normal_layer, masked_layer in zip(
    [net.layer1, net.layer2, net.layer3, net.layer4],
    [masked_net.layer1, masked_net.layer2, masked_net.layer3, masked_net.layer4],
):
    # each layerN is a Sequential of residual blocks, so walk the blocks in step
    for normal, masked in zip(normal_layer, masked_layer):
        normal.conv1.weight = masked.conv1.weight
        normal.bn1.weight = masked.bn1.weight
        normal.conv2.weight = masked.conv2.weight
        # etc.
```
Note that you'll probably have to copy some biases as well (e.g. `normal.bn1.bias = masked.bn1.bias`), along with the batch-norm running statistics (`running_mean` / `running_var`).
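A more general alternative is to copy matching `state_dict` entries instead of listing every layer by hand. This is just a sketch, assuming the two models share parameter names apart from the mask tensors, and assuming the mask tensors have `"mask"` in their names (adjust the filter to whatever naming your masked model actually uses):

```python
import torch
import torch.nn as nn

def copy_unmasked_weights(masked_net: nn.Module, net: nn.Module) -> None:
    """Copy every parameter/buffer present in both models, skipping masks."""
    target_state = net.state_dict()
    with torch.no_grad():  # state_dict entries may be leaf params that require grad
        for name, tensor in masked_net.state_dict().items():
            if "mask" in name:  # assumed naming convention for mask tensors
                continue
            if name in target_state and target_state[name].shape == tensor.shape:
                target_state[name].copy_(tensor)
    net.load_state_dict(target_state)
```

Since `state_dict` covers biases and batch-norm running statistics too, this avoids having to enumerate them manually.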
It would be cool to have this as a feature in the code. If you figure it out, please feel free to open a pull request 😊
If not I'll get to implementing it soon.
Good idea, thank you. @jack-willturner
Using masks during pruning training is a very common approach in pruning algorithms. I hadn't previously tried to extract the effective parameters after mask training. Your description gives me a concrete implementation path (although this scheme requires reimplementing the model definition so that the number of parameters in each layer can be customized).
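For what it's worth, the doubling in the count above is easy to check directly. A quick sketch (assuming the mask tensors are registered as parameters whose names contain `"mask"`) that counts parameters with and without masks:

```python
import torch.nn as nn

def count_params(model: nn.Module, skip_masks: bool = True) -> int:
    # Sum element counts over named parameters, optionally skipping mask tensors.
    return sum(
        p.numel()
        for name, p in model.named_parameters()
        if not (skip_masks and "mask" in name)
    )
```

On a masked model, comparing `count_params(model, skip_masks=False)` against `count_params(model)` confirms whether the extra ~2e7 parameters really are the masks.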