
Training data of the segmenter

Open unilight opened this issue 1 year ago • 2 comments

Hi @bshall, thanks for the great work!

I am wondering what the training data of the pre-trained segmenter is, since it isn't described in the paper or in this repo.

Thanks!

unilight avatar Mar 12 '24 06:03 unilight

Hi @unilight, thanks for the feedback! Sorry about the delay in getting back to you. The pre-trained segmenter was trained on speaker p225. But I've tested it out on the other speakers, and the learned mapping from clusters to sonorants, obstruents, and silences seems to be consistent. Let me know if you've found something different.

bshall avatar Mar 18 '24 12:03 bshall

@bshall Thank you for the reply! I do find that the segmenter works well on other speakers. I am just wondering how you found the mapping from cluster index to sonorants, obstruents, and silences. Do I need to find the correspondence manually by comparing the phoneme labels with the cluster indices? I'm asking because I want to apply the same method to some atypical speech (say, accented speech). I found that the pre-trained segmenter works well on typical speech but has some problems with such atypical speech, so I wonder whether it is possible to train my own segmenter.
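For what it's worth, here is a minimal sketch of the manual approach I had in mind: derive the cluster-to-class mapping by majority vote over frame-aligned cluster indices and phone-class labels (e.g. from a forced alignment). This is just an illustration, not urhythmic's actual API; the function and data names are made up.

```python
# Hypothetical sketch: map each cluster index to the sound class
# (sonorant / obstruent / silence) it co-occurs with most often.
# Assumes frame-aligned cluster ids and phone-class labels.
from collections import Counter, defaultdict

def map_clusters_to_classes(cluster_ids, phone_classes):
    """For each cluster index, pick the phone class it co-occurs with most."""
    counts = defaultdict(Counter)
    for cluster, phone_class in zip(cluster_ids, phone_classes):
        counts[cluster][phone_class] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in counts.items()}

# Toy frame-level data (illustrative only).
clusters = [0, 0, 1, 1, 1, 2, 2, 0]
classes = ["sonorant", "sonorant", "obstruent", "obstruent",
           "sonorant", "silence", "silence", "sonorant"]
mapping = map_clusters_to_classes(clusters, classes)
print(mapping)  # {0: 'sonorant', 1: 'obstruent', 2: 'silence'}
```

Is this roughly what you did, or did the mapping fall out of the segmenter training itself?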

unilight avatar Mar 19 '24 10:03 unilight