Can we run NeMo MSDD Neural Diarizer model in realtime for realtime diarization?

Open ThiruRJST opened this issue 1 year ago • 3 comments

Disclaimer: I just want some input on developing the pipeline described below, if it's feasible.

Requirements: I'm currently working on a project where I need diarization to work in real time. I tried the PyAnnote library, but none of what I tried there matched the accuracy of MSDD. However, we weren't able to run MSDD in real time. Is there an example of how to do this? We need to run MSDD alongside a Whisper model for transcription.

Jun 11 '24 17:06 ThiruRJST

Check these two PRs: https://github.com/NVIDIA/NeMo/pull/5609, https://github.com/NVIDIA/NeMo/pull/7896, although I couldn't get them to work.

Jun 13 '24 15:06 MahmoudAshraf97

Thanks for your interest. No, we currently don't support online speaker diarization using MSDD.

Jun 14 '24 22:06 nithinraok

@nithinraok Thank you, Nithin, for the confirmation.

Jun 21 '24 06:06 ThiruRJST