DaegyeomKim
- unseen-speaker prompt inference mel-spectrogram (attached image)
- seen-speaker prompt inference mel-spectrogram (attached image)
Thank you for your response. I will try modifying it to extract speaker characteristics, comparing against what the paper describes. If I achieve good results, I will...
Hi yiwei0730, thank you for your advice. I'll do some testing and share the results with you. Thank you.
Hello p0p4k, yiwei0730. I have incorporated the prompt encoder from the https://github.com/adelacvg/NS2VC repository to extract prompt features for the text encoder. The reason I chose this model is that...
Hello, I have run an experiment adding the NS2 prompt encoder to the P-Flow text encoder. This was applied both to the structure described in the paper and to the one provided by p0p4k, as well as the one...
- NS2 prompt encoder added to the paper's structure (59 epochs, batch size 64) (attached image)
- NS2 prompt encoder added to p0p4k's structure (59 epochs, batch size 64) (attached image)
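For context on what "adding a prompt encoder to the text encoder" can look like, here is a minimal NumPy sketch of one common conditioning pattern: the text-encoder hidden states cross-attend over the speaker-prompt mel frames and the attended speaker features are added back residually. This is an illustrative assumption, not the actual NS2VC or P-Flow code; the function name `prompt_condition` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prompt_condition(text_hidden, prompt_mel, w_q, w_k, w_v):
    """Cross-attend text states over prompt mel frames (hypothetical sketch).

    text_hidden: (T_text, d)   -- text-encoder hidden states
    prompt_mel:  (T_prompt, n_mels) -- speaker-prompt mel-spectrogram
    w_q: (d, d), w_k/w_v: (n_mels, d) -- learned projections (random here)
    """
    q = text_hidden @ w_q                                  # queries from text
    k = prompt_mel @ w_k                                   # keys from prompt
    v = prompt_mel @ w_v                                   # values from prompt
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # residual add keeps the text content while injecting speaker style
    return text_hidden + attn @ v
```

In a real model these projections are trained jointly with the text encoder, and the same idea is usually implemented with multi-head attention; the sketch only shows the data flow.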
Is zero-shot TTS possible with this model?
The Korean dataset I used for training totals 1,186 hours.
The authors even wrote that zero-shot TTS of quality comparable to VALL-E is possible with less data.
I can't play the demo audio; p0p4k, can you?