Knowledge distillation support for NeMo ASR models
Is your feature request related to a problem? Please describe. I am currently working with the NeMo FastConformer Large model, but I would like to deploy a smaller model on mobile devices. Instead of training a small model from scratch, I would like to perform knowledge distillation (KD).
Describe the solution you'd like I would greatly appreciate it if the NeMo ASR team could implement knowledge distillation (KD) support, or guide me on how to do it with a FastConformer model:
- An easier API for KD within the NeMo framework.
- Documentation and examples demonstrating how to use KD for model compression.
- Tools or guidelines for effectively transferring knowledge from large to small models (a sketch of the loss I have in mind is below).
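
For what it's worth, the loss I have in mind is the standard soft-target distillation loss: a temperature-scaled KL divergence between the teacher's and the student's output distributions. A minimal plain-PyTorch sketch (not NeMo API; `student_logits` and `teacher_logits` are placeholder names for same-shape frame-level logits):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-target KD loss: KL(teacher || student) on temperature-softened
    output distributions (Hinton et al., 2015)."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # "batchmean" matches the KL definition; scaling by T^2 keeps the
    # gradient magnitude comparable to the hard-label loss.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```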
Describe alternatives you've considered As an alternative, I have considered manually implementing KD using PyTorch Lightning, as described in #1996, roughly along the lines of the sketch below. However, I did not get the full picture.
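
To show what I mean, here is roughly how I imagined wiring it up with PyTorch Lightning. This is only a sketch under my own assumptions (generic `teacher` / `student` modules that map the same input to same-shape logits, a made-up batch layout), not NeMo's actual API:

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class DistillationWrapper(pl.LightningModule):
    """Trains a small student against a frozen teacher (hypothetical sketch)."""

    def __init__(self, teacher: torch.nn.Module, student: torch.nn.Module,
                 temperature: float = 2.0):
        super().__init__()
        self.teacher = teacher
        for p in self.teacher.parameters():  # teacher is frozen
            p.requires_grad = False
        self.student = student
        self.temperature = temperature

    def training_step(self, batch, batch_idx):
        audio = batch["audio"]               # assumed batch layout
        self.teacher.eval()                  # keep dropout/BN in eval mode
        with torch.no_grad():
            teacher_logits = self.teacher(audio)
        student_logits = self.student(audio)

        # Soft-target KD term (temperature-scaled KL divergence).
        kd_loss = F.kl_div(
            F.log_softmax(student_logits / self.temperature, dim=-1),
            F.softmax(teacher_logits / self.temperature, dim=-1),
            reduction="batchmean",
        ) * self.temperature ** 2

        # In practice this would be combined with the student's usual ASR
        # loss (e.g. CTC or RNN-T) on the ground-truth transcripts; that
        # term is omitted here for brevity.
        self.log("train_kd_loss", kd_loss)
        return kd_loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.student.parameters(), lr=1e-4)
```

My uncertainty is mainly about how to plug a setup like this into NeMo's existing FastConformer training recipes and dataloaders, which is why official guidance or an example would help a lot.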
Thank you for your consideration
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
I'm also interested in this topic. Any updates?
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.