fairseq [MMS TTS] - Can we change the speaker's voice (not language), without fine-tuning? Any controllable parameters, or seed?

❓ Questions and Help

Before asking:

search the issues.
search the docs.

What is your question?

I am using MMS TTS and it's amazing. So far, for one language (eng), there is only one speaker's voice. Are there any parameters or random seeds that can be changed to get an entirely different person's voice, without fine-tuning? Even if we can't control emotion or, say, voice pitch, can we at least get a random new, natural-sounding speaker?

Code

What have you tried?

MMS TTS and Hugging Face mms-tts

What's your environment?

fairseq Version (e.g., 1.0 or main): main
PyTorch Version (e.g., 1.0) - 1.13
OS (e.g., Linux): Linux
How you installed fairseq (pip, source): pip
Build command you used (if compiling from source):
Python version: 3.10

Jun 11 '23 16:06 QaisarRajput

JFYI, for now the sampling rate is the only thing that can tune this a little: a higher number gives you a deeper (slower) voice, while a lower number gives a thinner (faster) voice.
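To make the sampling-rate trick concrete, here is a minimal sketch using only the Python standard library. It assumes the model's native rate is 16 kHz and uses a synthetic sine tone as a stand-in for the TTS output; `make_tone`, `wav_bytes`, and `playback_seconds` are hypothetical helper names, not part of fairseq. It shows only the simplest variant, where the same samples are written with a different rate in the WAV header. In that variant a higher declared rate plays back faster and higher-pitched, and a lower one slower and deeper. Resampling the audio and then playing it at the original rate has the opposite effect, so which direction you hear depends on how the rate is changed in your pipeline.

```python
import io
import math
import struct
import wave

NATIVE_RATE = 16000  # assumption: the TTS model emits 16 kHz mono audio


def make_tone(freq_hz=220.0, seconds=1.0, rate=NATIVE_RATE):
    """Stand-in for a TTS waveform: a sine tone as floats in [-1, 1]."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def wav_bytes(samples, declared_rate):
    """Encode samples as 16-bit mono WAV, declaring `declared_rate` in the header.

    The samples themselves are not modified; only the rate a player will
    assume changes, which shifts speed and pitch together.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit PCM
        wf.setframerate(declared_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)
    return buf.getvalue()


def playback_seconds(wav_data):
    """Duration a player would report: number of frames / declared rate."""
    with wave.open(io.BytesIO(wav_data), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


samples = make_tone()
for rate in (12000, 16000, 20000):
    print(rate, round(playback_seconds(wav_bytes(samples, rate)), 3))
```

Note that this changes speed and pitch together, so it only nudges the existing voice rather than producing a different speaker.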

Jun 11 '23 17:06 QaisarRajput

@QaisarRajput For now, controllable generation (e.g., change gender, emotion, etc) is not supported yet. You could consider cascading the MMS TTS model with an off-the-shelf voice cloning model to achieve this.

Jun 12 '23 03:06 chevalierNoir

> @QaisarRajput For now, controllable generation (e.g., change gender, emotion, etc) is not supported yet. You could consider cascading the MMS TTS model with an off-the-shelf voice cloning model to achieve this.

Could you please name a VITS-based voice cloning repo that could achieve this? I found that directly fine-tuning the Korean model gives very bad results.

Jun 12 '23 12:06 CopyNinja1999

Not sure how this would work, but here is one example for voice conversion.

Jun 12 '23 14:06 chevalierNoir

I suggest looking into Coqui, which has recipes for using MMS-TTS (FairSeq) alongside voice cloning; I've used it successfully to change the voice's gender.

Regarding emotion, etc., Bark looks promising, but I haven't tested it yet.

Mar 22 '24 19:03 khof312