VAD finetune on my dataset
Hi, I am trying to finetune the VAD and speaker diarization models on a Korean dataset. However, I cannot find a tutorial related to VAD. I found a File not found (github.com) tutorial linked from previous issues, but it seems that the tutorial has been deleted. How can I finetune the VAD model with my dataset? Also, is there any way to train the VAD and speaker diarization models together?
Here is the tutorial for VAD: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Voice_Activity_Detection.ipynb
Currently we don't support training the VAD and SD models together.
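For reference, a minimal sketch of preparing training data for the tutorial above. This assumes the NeMo speech-classification manifest convention of one JSON object per line with `audio_filepath`, `offset`, `duration`, and `label` fields (e.g. `speech` / `background` for VAD); the file paths and durations below are placeholders, not from your dataset:

```python
import json

# Hypothetical labeled segments for one audio file; replace with your
# own Korean-dataset paths, offsets, durations, and labels.
entries = [
    {"audio_filepath": "audio/utt1.wav", "offset": 0.0, "duration": 0.63, "label": "speech"},
    {"audio_filepath": "audio/utt1.wav", "offset": 0.63, "duration": 0.63, "label": "background"},
]

# NeMo manifests are JSON Lines: one serialized entry per line.
with open("vad_train_manifest.json", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```

You would then point the training config's `train_ds.manifest_filepath` (name as used in the tutorial's example configs) at this file.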
Thank you for your answer! Additionally, I have a question about the training dataset. Can one example in the dataset contain more than two speech chunks?
I didn't understand your question well: can one utterance contain more than two speech chunks?
Do you mean, if VAD is frame or segment based?