A question about knowledge distillation
I ran the code from the knowledge distillation section of this article and found that the accuracy before distillation was higher than the accuracy after distillation. The picture below is a screenshot from one arbitrary run, but in every round the accuracy after distillation was lower than the accuracy before distillation. Why is this? I hope to get your answer.