Knowledge distillation support for NeMo ASR models
Is your feature request related to a problem? Please describe. I am currently working with the NeMo FastConformer Large model, but I would like to deploy a smaller model on mobile devices. Instead of training a small model from scratch, I would like to perform knowledge distillation (KD).
Describe the solution you'd like I would greatly appreciate it if the NeMo ASR team could implement knowledge distillation (KD) support, or guide me on how to do it with a FastConformer model:
- An easier API for KD within the NeMo framework.
- Documentation and examples demonstrating how to use KD for model compression.
- Tools or guidelines for effectively transferring knowledge from large to small models (a sketch of the loss I have in mind is below).
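
For what it's worth, the loss I have in mind is the standard soft-target distillation loss: a temperature-scaled KL divergence between the teacher's and the student's output distributions. A minimal plain-PyTorch sketch (not NeMo API; `student_logits` and `teacher_logits` are placeholder names for same-shape frame-level logits):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-target KD loss: KL(teacher || student) on temperature-softened
    output distributions (Hinton et al., 2015)."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # "batchmean" matches the KL definition; scaling by T^2 keeps the
    # gradient magnitude comparable to the hard-label loss.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```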
Describe alternatives you've considered As an alternative, I have considered manually implementing KD using PyTorch Lightning, as described in #1996, roughly along the lines of the sketch below. However, I did not get the full picture.
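
To show what I mean, here is roughly how I imagined wiring it up with PyTorch Lightning. This is only a sketch under my own assumptions (generic `teacher` / `student` modules that map the same input to same-shape logits, a made-up batch layout), not NeMo's actual API:

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class DistillationWrapper(pl.LightningModule):
    """Trains a small student against a frozen teacher (hypothetical sketch)."""

    def __init__(self, teacher: torch.nn.Module, student: torch.nn.Module,
                 temperature: float = 2.0):
        super().__init__()
        self.teacher = teacher
        for p in self.teacher.parameters():  # teacher is frozen
            p.requires_grad = False
        self.student = student
        self.temperature = temperature

    def training_step(self, batch, batch_idx):
        audio = batch["audio"]               # assumed batch layout
        self.teacher.eval()                  # keep dropout/BN in eval mode
        with torch.no_grad():
            teacher_logits = self.teacher(audio)
        student_logits = self.student(audio)

        # Soft-target KD term (temperature-scaled KL divergence).
        kd_loss = F.kl_div(
            F.log_softmax(student_logits / self.temperature, dim=-1),
            F.softmax(teacher_logits / self.temperature, dim=-1),
            reduction="batchmean",
        ) * self.temperature ** 2

        # In practice this would be combined with the student's usual ASR
        # loss (e.g. CTC or RNN-T) on the ground-truth transcripts; that
        # term is omitted here for brevity.
        self.log("train_kd_loss", kd_loss)
        return kd_loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.student.parameters(), lr=1e-4)
```

My uncertainty is mainly about how to plug a setup like this into NeMo's existing FastConformer training recipes and dataloaders, which is why official guidance or an example would help a lot.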
Thank you for your consideration
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
I'm also interested in this topic. Any updates?
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.