Why is the pb model size of the pruned model larger than the original pb model size?
I use DCP to prune MobileNet-v2 with a prune ratio of 0.5. After pruning the model, I use export_chn_pruned_tflite_model to convert the model from ckpt to pb. But I find that the pruned pb model is larger than the original one. Could you explain why this happens?
""" -rw-rw-r-- 1 ubuntu ubuntu 77 Mar 18 08:29 checkpoint -rw-rw-r-- 1 ubuntu ubuntu 10036240 Mar 18 08:29 model.ckpt.data-00000-of-00001 -rw-rw-r-- 1 ubuntu ubuntu 11226 Mar 18 08:29 model.ckpt.index -rw-rw-r-- 1 ubuntu ubuntu 1142696 Mar 18 08:29 model.ckpt.meta -rw-rw-r-- 1 ubuntu ubuntu 10210408 Mar 18 14:33 model_original.pb -rw-rw-r-- 1 ubuntu ubuntu 9858484 Mar 18 14:34 model_original.tflite -rw-rw-r-- 1 ubuntu ubuntu 18211151 Mar 18 14:34 model_transformed.pb -rw-rw-r-- 1 ubuntu ubuntu 17892188 Mar 18 14:34 model_transformed.tflite """
For MobileNet-v1/v2, the current model conversion script does not work well due to its graph transformation method. We are working on a new graph transformation method to solve this issue and will release it as soon as it is ready.
What is the difference in graph transformation between ResNet and MobileNet? Why can the graph transformation reduce the model size for ResNet but not for MobileNet? They go through the same processing:

x = tf.nn.conv2d(op.inputs[0], kernel_gthr, [1, 1, 1, 1], 'SAME',
                 data_format=data_format)
x = tf.nn.conv2d(x, kernel_shrk, strides, padding,
                 data_format=data_format, dilations=dilations)
Why is the inference time of the pruned model (model_transformed.pb) larger than that of the original model (model_original.pb)? In theory, regardless of the data format (NCHW or NHWC), the inference time of the pruned model should be less than that of the original one.
Almost all the Conv2D operations in MobileNet are 1x1 convolutions (the k x k convolutions are depthwise and cannot be pruned). The current graph transformation method (inserting an extra GatherV2 operation or an extra 1x1 Conv2D operation) cannot reduce FLOPs if the original Conv2D operation is a 1x1 convolution. The main reason is that the pruned channels are not actually removed in these two methods. We are working on a new approach that replaces the entire Conv-BN-ReLU routine, so that the pruned channels can actually be removed.
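To make this concrete, here is a back-of-the-envelope FLOPs comparison of the two-conv transform quoted above (a minimal sketch; the feature-map size, channel counts, and prune ratio are assumed values, not taken from any real model):

# FLOPs comparison for the "1x1 gather conv + shrunk conv" transform.
# H, W, C and the prune ratio are illustrative assumptions.
def conv_flops(h, w, k, c_in, c_out):
    return h * w * k * k * c_in * c_out

H, W, C = 56, 56, 64   # assumed feature-map size and channel count
C_kept = C // 2        # prune ratio 0.5

for k in (1, 3):
    original = conv_flops(H, W, k, C, C)
    # 1x1 conv that gathers the kept channels, then the shrunk k x k conv
    transformed = conv_flops(H, W, 1, C, C_kept) + conv_flops(H, W, k, C_kept, C)
    print("k=%d: original=%d, transformed=%d" % (k, original, transformed))

# k=1: original=12,845,056  transformed=12,845,056 -> no FLOPs saved at all
# k=3: original=115,605,504 transformed=64,225,280 -> ~44% fewer FLOPs

For a 1x1 convolution, the transformed graph does exactly the same amount of work as the original, plus the overhead of one extra operation, which matches the behavior reported above.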
Hi @jiaxiang-wu, I got the same inference issue on ResNet-20. The inference time of the pruned model is larger than that of the original model. I measured the execution time of the function test_pb_model in export_pb_tflite_models.py:
import time

t0 = time.time()
for idx_iter in range(100):
    net_output_data = sess.run(net_output, feed_dict={net_input: net_input_data})
t1 = time.time()
print(t1 - t0, "seconds wall clock")
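One caveat with this measurement (my own observation, not part of the script): the first sess.run also pays one-time graph initialization costs, so adding an untimed warm-up run before starting the timer usually gives more stable numbers:

# Warm-up sketch; sess, net_input, net_output and net_input_data
# are assumed to be set up as in test_pb_model.
sess.run(net_output, feed_dict={net_input: net_input_data})  # untimed warm-up
t0 = time.time()
for idx_iter in range(100):
    net_output_data = sess.run(net_output, feed_dict={net_input: net_input_data})
t1 = time.time()
print((t1 - t0) / 100, "seconds per inference")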
The size of pruned_model.pb is 730K and orig_model.pb is 1.1M. However, the latency of the pruned model is about 1.335 s, while the original model takes about 1.124 s.
-rw-r--r-- 1 ktvexe ktvexe 1.1M Mar 29 12:38 best_model.ckpt.data-00000-of-00001
-rw-r--r-- 1 ktvexe ktvexe 3.6K Mar 29 12:38 best_model.ckpt.index
-rw-r--r-- 1 ktvexe ktvexe 3.0M Mar 29 12:38 best_model.ckpt.meta
-rw-rw-r-- 1 ktvexe ktvexe 730K Mar 29 13:02 pruned_model.pb
-rw-r--r-- 1 ktvexe ktvexe 2.1M Mar 29 14:21 model.ckpt.data-00000-of-00001
-rw-r--r-- 1 ktvexe ktvexe 5.9K Mar 29 14:21 model.ckpt.index
-rw-r--r-- 1 ktvexe ktvexe 719K Mar 29 14:21 model.ckpt.meta
-rw-rw-r-- 1 ktvexe ktvexe 1.1M Mar 29 14:35 orig_model.pb
INFO:tensorflow:data format: NHWC
INFO:tensorflow:input: net_input:0 / output: net_output:0
INFO:tensorflow:input's shape: (?, 32, 32, 3)
INFO:tensorflow:output's shape: (128, 10)
INFO:tensorflow:Restoring parameters from pruned_models/pruned_models/best_model.ckpt
INFO:tensorflow:transforming OP: model/resnet_model/conv2d/Conv2D
INFO:tensorflow:reducing 3 channels to 3
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_1/Conv2D
INFO:tensorflow:reducing 16 channels to 4
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_2/Conv2D
INFO:tensorflow:reducing 16 channels to 5
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_3/Conv2D
INFO:tensorflow:reducing 16 channels to 5
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_4/Conv2D
INFO:tensorflow:reducing 16 channels to 5
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_5/Conv2D
INFO:tensorflow:reducing 16 channels to 5
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_6/Conv2D
INFO:tensorflow:reducing 16 channels to 5
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_7/Conv2D
INFO:tensorflow:reducing 16 channels to 5
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_8/Conv2D
INFO:tensorflow:reducing 16 channels to 6
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_9/Conv2D
INFO:tensorflow:reducing 16 channels to 8
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_10/Conv2D
INFO:tensorflow:reducing 32 channels to 11
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_11/Conv2D
INFO:tensorflow:reducing 32 channels to 11
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_12/Conv2D
INFO:tensorflow:reducing 32 channels to 11
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_13/Conv2D
INFO:tensorflow:reducing 32 channels to 11
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_14/Conv2D
INFO:tensorflow:reducing 32 channels to 11
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_15/Conv2D
INFO:tensorflow:reducing 32 channels to 19
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_16/Conv2D
INFO:tensorflow:reducing 32 channels to 15
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_17/Conv2D
INFO:tensorflow:reducing 64 channels to 36
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_18/Conv2D
INFO:tensorflow:reducing 64 channels to 36
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_19/Conv2D
INFO:tensorflow:reducing 64 channels to 36
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_20/Conv2D
INFO:tensorflow:reducing 64 channels to 36
INFO:tensorflow:transforming OP: model/resnet_model/conv2d_21/Conv2D
INFO:tensorflow:reducing 64 channels to 64
INFO:tensorflow:Restoring parameters from pruned_models/pruned_models/best_model.ckpt
INFO:tensorflow:Froze 40 variables.
INFO:tensorflow:Converted 40 variables to const ops.
INFO:tensorflow:pruned_models/pruned_models/model.pb generated
INFO:tensorflow:input: import/net_input:0 / output: import/net_output:0
1.3358736038208008 seconds wall clock
INFO:tensorflow:data format: NHWC
INFO:tensorflow:input: net_input:0 / output: net_output:0
INFO:tensorflow:input's shape: (?, 32, 32, 3)
INFO:tensorflow:output's shape: (128, 10)
INFO:tensorflow:Restoring parameters from pruned_models/pruned_models/best_model.ckpt
INFO:tensorflow:Froze 62 variables.
INFO:tensorflow:Converted 62 variables to const ops.
INFO:tensorflow:pruned_models/pruned_models/model.pb generated
INFO:tensorflow:input: import/net_input:0 / output: import/net_output:0
1.1241075992584229 seconds wall clock
Is there any further progress? Does the new approach that replaces the entire Conv-BN-ReLU routine generalize to other models?
Hi @jiaxiang-wu, could you please give us an update on the new approach? Thanks.
Hello, I implemented MobileNetV2-SSD and pruned it with ratio 0.5. I use the tf.gather method to export the model (NHWC, so tf.gather is applied on axis 3). The original model size is 24M and the pruned model size is 13M, so the pruned channels should have been removed. But the pruned model's inference time is larger than the original's. You said that 'the pruned channels are not actually removed in these two methods'. Why are the pruned channels not removed in the gather method?
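For reference, the transform I apply looks roughly like this (a sketch; kept_channel_indices and kernel_pruned are illustrative names for my own variables):

kept_idxs = tf.constant(kept_channel_indices, dtype=tf.int32)  # surviving input channels
x = tf.gather(op.inputs[0], kept_idxs, axis=3)                 # NHWC: channels are axis 3
x = tf.nn.conv2d(x, kernel_pruned, strides, padding, data_format='NHWC')

The kernel does store fewer weights (hence the smaller pb file), but every forward pass still executes an extra GatherV2 that copies the feature map, so I wonder whether that overhead explains the slowdown.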
Hi @jiaxiang-wu, is there any further progress on the new approach?