[Fast Pitch 1.1] Road Map and Augmentations?
@alancucki I am a huge fan of the work here; it's honestly the most stable and fastest TTS system I have used.
Is your feature request related to a problem? Please describe.
Is there a way to provide emotive synthesis, where we can capture sadness, happiness, and other emotions? I believe this could be done with a style token (a GST), similar to Mellotron and GST-Tacotron2; a rough sketch of what I mean is below. If so, what would be the challenges of doing so?
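To make the idea concrete, here is a minimal PyTorch sketch of a GST-style token layer: a reference embedding attends over a small bank of learned style tokens and returns a global style vector that could condition the encoder. Names and dimensions (StyleTokenLayer, ref_dim, n_tokens, ...) are placeholders, not anything from this repo.

```python
import torch
import torch.nn as nn

class StyleTokenLayer(nn.Module):
    """Minimal GST-style sketch: a reference embedding attends over a bank of
    learned style tokens; the weighted sum is a global conditioning vector."""

    def __init__(self, ref_dim=128, token_dim=256, n_tokens=10, n_heads=4):
        super().__init__()
        # Learned bank of style tokens (the "emotion palette").
        self.tokens = nn.Parameter(torch.randn(n_tokens, token_dim) * 0.3)
        self.query_proj = nn.Linear(ref_dim, token_dim)
        self.attn = nn.MultiheadAttention(embed_dim=token_dim,
                                          num_heads=n_heads,
                                          batch_first=True)

    def forward(self, ref_embedding):
        # ref_embedding: (B, ref_dim), e.g. from a reference encoder run over
        # a mel-spectrogram carrying the target emotion.
        query = self.query_proj(ref_embedding).unsqueeze(1)            # (B, 1, D)
        keys = self.tokens.unsqueeze(0).expand(ref_embedding.size(0), -1, -1)
        style, weights = self.attn(query, keys, keys)                  # (B, 1, D)
        return style.squeeze(1), weights.squeeze(1)                    # (B, D), (B, n_tokens)


if __name__ == "__main__":
    gst = StyleTokenLayer()
    ref = torch.randn(2, 128)          # stand-in for a reference encoder output
    style_vec, token_weights = gst(ref)
    print(style_vec.shape, token_weights.shape)
```

At inference time the token weights could be set by hand (or from a reference clip) to dial in an emotion, which is roughly how GST-Tacotron2 is used.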
Thanks!
IMHO, to do proper emotive synthesis there has to be an emotive dataset. In FastPitch, the speaker conditioning that is already implemented for multi-speaker models could be overloaded to handle different variants (like sub-speakers). The problem then shifts to data labeling, or to unsupervised training.
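For example, each (speaker, emotion) pair could simply be relabeled as its own sub-speaker ID before training, so the existing speaker embedding carries the emotion variant. A rough pre-processing sketch, assuming an emotive dataset with per-utterance emotion labels (the metadata fields here are placeholders, not the repo's filelist format):

```python
# Hypothetical relabeling: fold emotion labels into speaker IDs so the
# existing multi-speaker embedding acts as a "sub-speaker" (emotion) embedding.
from itertools import product

speakers = ["spk0", "spk1"]
emotions = ["neutral", "happy", "sad"]

# Each (speaker, emotion) pair becomes a distinct sub-speaker ID.
sub_speaker_id = {pair: i for i, pair in enumerate(product(speakers, emotions))}

def relabel(entry):
    """Map a metadata entry to the overloaded speaker index. Train with the
    model's speaker count set to len(sub_speaker_id)."""
    return {
        "mels": entry["mels"],
        "text": entry["text"],
        "speaker": sub_speaker_id[(entry["speaker"], entry["emotion"])],
    }

example = {"mels": "mels/0001.pt", "text": "Hello there.",
           "speaker": "spk0", "emotion": "sad"}
print(relabel(example))   # {'mels': 'mels/0001.pt', 'text': 'Hello there.', 'speaker': 2}
```

The hard part, as noted above, is getting reliable emotion labels in the first place; without them you are back to unsupervised discovery of the variants.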
I just realized this the other day, thanks!
@alancucki One more question: is there a way to blend the two speakers in the multi-speaker setup with FP 1.1? Thanks!