MOSS-TTSD
Is there any plan to open source the diarization model?
As the blog says:
> For the original audio, we first use an internal speaker diarization model to segment and annotate speech and speakers. Based on the pre-trained base model, the performance of our speaker diarization model is better than the open source speaker diarization model pyannote-speaker-diarization-3.1 and its commercial version pyannoteAI.
- Is the diarization model mentioned here the same as whisper-d, or is it actually a different model?
- Why was the diarization model trained on whisper-v2 (whisper-d) rather than whisper-v3?
Thanks for your reply!