tango
Sound cloning
I'm looking to build on your research. I understand this is outside the scope of your project; I'm just curious and wanted the creators' thoughts. I want to retrain, repurpose, and experiment with this for expressive TTS instead of generic text-to-audio. I'm somewhat new to working with these models.
OBJECTIVES ->
- retrain on a more dynamic dataset
- synthetic dataset -> speech/text [with special utterances] {real speech / lo-fi speech from 'BARK'}, speech with synthetic audio environments generated by 'tango'/text [I have a rather large dataset]
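For the second objective, mixing BARK speech with Tango-generated environments presumably needs the background scaled to a controlled level so the speech stays intelligible. A minimal sketch of such a mixer, assuming both clips are mono NumPy arrays at the same sample rate (the function name and SNR-based approach are my own choice, not anything from the Tango codebase):

```python
import numpy as np

def mix_at_snr(speech, background, snr_db):
    """Mix clean speech with a background track at a target SNR (in dB).

    Scales `background` so that speech power / background power
    matches the requested signal-to-noise ratio, then sums the two.
    Assumes both arrays are the same length and sample rate.
    """
    speech_power = np.mean(speech ** 2)
    bg_power = np.mean(background ** 2) + 1e-12  # avoid divide-by-zero
    # Power the background must have for the target SNR to hold.
    target_bg_power = speech_power / (10 ** (snr_db / 10))
    background = background * np.sqrt(target_bg_power / bg_power)
    return speech + background
```

Sweeping `snr_db` per example (e.g. 5–20 dB) would give the model varied speech-to-environment balances rather than one fixed mix level.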
EXPECTATIONS ->
- the most expressive hybrid TTS [TTS with semantically conditioned background environments]
QUESTIONS ->
- What are your thoughts on approaching voice cloning with this style of architecture? I figure I should approach it like inpainting?
- If so, wouldn't it also clone any artifacts contained in the speech audio?
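To make the inpainting question concrete: a common recipe for inpainting with a pretrained diffusion model (the RePaint-style approach, not anything Tango-specific) is to denoise freely in the unknown region while, at every step, overwriting the known region with a freshly noised copy of the reference signal so both regions sit at the same noise level. A toy sketch of one such step, where `denoise_fn` is a stand-in for the trained model:

```python
import numpy as np

def inpaint_step(x_t, known, mask, alpha_bar_t, denoise_fn, rng):
    """One RePaint-style inpainting step (illustrative sketch).

    x_t        : current noisy latent
    known      : clean reference signal for the region to keep
    mask       : 1 where the reference is known, 0 where the model fills in
    alpha_bar_t: cumulative noise schedule value at this timestep
    denoise_fn : placeholder for the trained diffusion model's step
    """
    # Model-predicted less-noisy latent for the unknown region.
    x_prev = denoise_fn(x_t)
    # Re-noise the known reference to the current timestep's noise level.
    noised_known = (np.sqrt(alpha_bar_t) * known
                    + np.sqrt(1 - alpha_bar_t) * rng.standard_normal(known.shape))
    # Keep the (noised) reference where known, the model output elsewhere.
    return mask * noised_known + (1 - mask) * x_prev
```

On the artifact question: since the known region is copied back verbatim at each step, any artifacts in the reference audio would indeed survive in that region; only the masked region is regenerated.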
CLOSING THOUGHTS -> I'm open to sharing my results with you privately. I appreciate your contribution to the community.