VAD finetune on my dataset
Hi, I am trying to finetune the VAD and speaker diarization models on a Korean dataset. However, I cannot find a tutorial related to VAD. I found a File not found (github.com) tutorial linked from previous issues, but it seems that the tutorial has been deleted. How can I finetune the VAD model with my dataset? Also, is there any way to train the VAD and speaker diarization models together?
Here is the tutorial for VAD: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Voice_Activity_Detection.ipynb
Currently we don't support training the VAD and SD models together.
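For reference, a minimal sketch of preparing training data for the tutorial above. This assumes the NeMo speech-classification manifest convention of one JSON object per line with `audio_filepath`, `offset`, `duration`, and `label` fields (e.g. `speech` / `background` for VAD); the file paths and durations below are placeholders, not from your dataset:

```python
import json

# Hypothetical labeled segments for one audio file; replace with your
# own Korean-dataset paths, offsets, durations, and labels.
entries = [
    {"audio_filepath": "audio/utt1.wav", "offset": 0.0, "duration": 0.63, "label": "speech"},
    {"audio_filepath": "audio/utt1.wav", "offset": 0.63, "duration": 0.63, "label": "background"},
]

# NeMo manifests are JSON Lines: one serialized entry per line.
with open("vad_train_manifest.json", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```

You would then point the training config's `train_ds.manifest_filepath` (name as used in the tutorial's example configs) at this file.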
Thank you for your answer! Additionally, I have a question about the training dataset. Can one example in the dataset contain more than two speech chunks?
I didn't understand your question well: can one utterance contain more than two speech chunks?
Do you mean, if VAD is frame or segment based?