Distilling-Object-Detectors
warm step
Can you please explain the intuition for using warm_step=200 for only 1 epoch? It doesn't seem like enough for meaningful training without distillation. What happens if I use the distillation loss from scratch?
Can you rephrase your question?
The warm step is not mentioned in the paper. Does it improve the result?
No, warm-up is not related to distillation. It is used to keep training stable.
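To illustrate the general idea, here is a minimal sketch of a learning-rate warm-up over the first 200 iterations. It assumes that warm-up here means ramping the learning rate linearly up to its base value. The function name `warmup_lr`, the linear schedule, and the `start_factor` value are assumptions for illustration and are not taken from this repository's code.

```python
def warmup_lr(base_lr: float, step: int, warm_step: int = 200,
              start_factor: float = 1.0 / 3) -> float:
    """Learning rate at a given iteration under a linear warm-up (hypothetical helper).

    The rate starts at base_lr * start_factor and rises linearly until
    it reaches base_lr at iteration warm_step; after that it stays at base_lr.
    """
    if step >= warm_step:
        return base_lr
    alpha = step / warm_step                      # progress through warm-up, 0 -> 1
    factor = start_factor * (1.0 - alpha) + alpha  # interpolate start_factor -> 1
    return base_lr * factor


if __name__ == "__main__":
    # Print the rate at a few iterations to see the ramp.
    for step in (0, 50, 100, 200, 1000):
        print(step, warmup_lr(0.01, step))
```

With a schedule like this, the first 200 iterations use a reduced learning rate, which is why the warm-up only touches the start of the first epoch and does not depend on whether the distillation loss is enabled.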