MOSS-TTSD

Is there any plan to open source the diarization model?

Open · Remember2015 opened this issue 7 months ago • 0 comments

As the blog says:

For the original audio, we first use an internal speaker diarization model to segment and annotate speech and speakers. Based on the pre-trained base model, the performance of our speaker diarization model is better than the open source speaker diarization model pyannote-speaker-diarization-3.1 and its commercial version pyannoteAI.

  1. Is the diarization model mentioned here the same as whisper-d, or is it actually a different model?
  2. Why did you choose a model trained on top of whisper-v2 (whisper-d) rather than whisper-v3?

Thanks for your reply.

Remember2015, Jul 31 '25 02:07