Did not work well with knowledge_distillation
I deployed a config that uses the knowledge_distillation algorithm for compression, /examples/torch/classification/configs/resnet34_pruning_geometric_median_kd.json, but there is no obvious difference between training for 20 epochs and training for 100 epochs. How can I confirm that KD works after pruning the model? It keeps showing the same state as below:

Hello @moonlightian,
I think you can confirm that knowledge distillation is working by checking the compression loss, since it also includes the knowledge distillation loss.
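
For example, you can log the compression loss returned by the compression controller during training and check that it contains a non-zero term while the knowledge_distillation algorithm is configured. Below is a minimal sketch, not the exact sample script: the import paths may differ between NNCF versions, and `train_loader` and the hyperparameters are assumptions.

```python
import torch
from torchvision.models import resnet34

from nncf import NNCFConfig
from nncf.torch import create_compressed_model

# Load the pruning + knowledge-distillation config referenced above.
nncf_config = NNCFConfig.from_json(
    "examples/torch/classification/configs/resnet34_pruning_geometric_median_kd.json"
)

model = resnet34(num_classes=1000)
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

optimizer = torch.optim.SGD(compressed_model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()

for images, targets in train_loader:  # train_loader is assumed to exist
    optimizer.zero_grad()
    outputs = compressed_model(images)
    task_loss = criterion(outputs, targets)
    # compression_ctrl.loss() aggregates the losses of all configured algorithms,
    # so with knowledge_distillation enabled it contains the distillation term.
    compression_loss = compression_ctrl.loss()
    loss = task_loss + compression_loss
    loss.backward()
    optimizer.step()
    print(f"task_loss={task_loss.item():.4f}  "
          f"compression_loss={float(compression_loss):.4f}")
```

If the logged compression loss is non-zero and keeps changing over training, distillation is active even though the structural statistics of the model no longer change.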
By the way, resnet34_pruning_geometric_median_kd.json has the parameter "pruning_steps": 20, which means that the pruning algorithm reaches the target pruning rate at the 20th epoch and does not change it afterwards. That is why the GFLOPS, MParams and Filters statistics of the model stay the same after the 20th epoch.
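
For reference, the compression section of such a config typically looks like the sketch below. The field names and values here are illustrative assumptions based on typical NNCF filter-pruning configs, not copied from the file, so please check the actual JSON in the repository.

```python
from nncf import NNCFConfig

# Hypothetical excerpt showing where "pruning_steps" sits in the config.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": [
        {
            "algorithm": "filter_pruning",
            "pruning_init": 0.1,
            "params": {
                "schedule": "exponential",
                "pruning_target": 0.5,
                # The pruning rate ramps up over the first 20 epochs and then
                # stays at the target value, so GFLOPS, MParams and Filters
                # stop changing after epoch 20.
                "pruning_steps": 20,
                "filter_importance": "geometric_median"
            }
        },
        {
            "algorithm": "knowledge_distillation",
            "type": "softmax"
        }
    ]
})
```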