
KL Divergence Loss

Open syomantak opened this issue 5 years ago • 0 comments

The official implementation uses KL-divergence loss, while your implementation seems to use Keras's categorical cross-entropy loss. In my opinion, using the latter would completely invalidate the use of soft predictions. Let me know what you think, or whether I am mistaken.
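For reference, here is a minimal sketch of the temperature-scaled KL-divergence distillation loss from Hinton et al. (2015), written for a TensorFlow/Keras setup since this repo uses Keras; the function name, temperature value, and epsilon are illustrative, not taken from either implementation:

```python
import tensorflow as tf

def distillation_kl_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(p_teacher || p_student) on temperature-softened distributions.

    Both logit tensors have shape (batch, num_classes). The T**2 factor
    keeps gradient magnitudes comparable across temperatures, as in
    Hinton et al. (2015).
    """
    teacher_probs = tf.nn.softmax(teacher_logits / temperature, axis=-1)
    student_log_probs = tf.nn.log_softmax(student_logits / temperature, axis=-1)
    # KL(p || q) = sum_i p_i * (log p_i - log q_i); epsilon avoids log(0)
    kl = tf.reduce_sum(
        teacher_probs
        * (tf.math.log(teacher_probs + 1e-8) - student_log_probs),
        axis=-1,
    )
    return tf.reduce_mean(kl) * temperature**2
```

In practice this term is usually combined with a standard cross-entropy loss against the hard labels, weighted by a hyperparameter alpha.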

— syomantak, Jul 12 '20 07:07