[Tacotron2/TRTIS] Is it possible to support non-English languages such as Chinese in trtis_cpp?

Open RaymondTsao opened this issue 5 years ago • 5 comments

Describe the bug

I have already trained an English & Chinese bilingual Tacotron model on my own data, using the following source: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2

Inference output from that model is fine. My inference phrase looks like the following: ji2-jiang1 wei4-nin2 bo1-fang4 pau yi4-shu4 wu3-dao4 pau gu3-ba1 dang1-dai4 wu3-dao4-tuan2 pau DH-AX0 S-AE1-K-R-AO0-L D-AE1-N-S and I can understand what the output audio says.
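To make the phrase format concrete, here is a minimal sketch of how such a phrase could be split into words and symbols. It assumes spaces separate words, hyphens separate the symbols inside a word, and pau is a standalone pause token; that reading is inferred from the sample phrase, and the function names are hypothetical.

```python
def parse_phrase(phrase):
    """Split a phrase into words, each word being a list of symbols.

    Assumption: words are space-separated and symbols are hyphen-separated.
    """
    return [token.split("-") for token in phrase.split()]


def unique_symbols(words):
    """Collect the distinct symbols used by a parsed phrase, sorted."""
    return sorted({symbol for word in words for symbol in word})


phrase = "yi4-shu4 wu3-dao4 pau DH-AX0 D-AE1-N-S"
words = parse_phrase(phrase)
print(words)
print(unique_symbols(words))
```

The list of unique symbols is what any symbol table used at inference time would need to cover.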

Next, I needed to speed up inference, so I followed the steps to export my own Tacotron model to ONNX and TensorRT in the trtis_cpp folder: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp and the model converted successfully.

But I can't understand what the inference output is saying; it sounds like an alien language. :( I found that I can add possible syllables in model-config/tacotron2waveglow/mapping.txt, but after I added all the syllables and rebuilt, the output audio still sounds like nonsense.
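One check that might help narrow this down, offered as a sketch rather than a confirmed diagnosis: compare the symbol-to-ID table the model was trained with against the table used at deployment. The thread does not describe the layout of mapping.txt or how trtis_cpp assigns IDs, so both tables below are plain dictionaries with hypothetical values, and loading them from real files is left out.

```python
def compare_symbol_tables(training, deployed):
    """Return symbols missing from `deployed` and symbols whose IDs differ."""
    missing = sorted(s for s in training if s not in deployed)
    mismatched = {
        s: (training[s], deployed[s])
        for s in training
        if s in deployed and training[s] != deployed[s]
    }
    return missing, mismatched


# Hypothetical tables, for illustration only.
training_ids = {"pau": 0, "wu3": 1, "dao4": 2, "DH": 3}
deployed_ids = {"pau": 0, "dao4": 1, "wu3": 2}

missing, mismatched = compare_symbol_tables(training_ids, deployed_ids)
print("missing:", missing)
print("mismatched:", mismatched)
```

If either result is non-empty, the deployed model would receive different IDs than it saw during training for those symbols, which is one possible way to get unintelligible audio.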

So, is it possible to support non-English languages such as Chinese in trtis_cpp? Which files could I modify to do this?

RaymondTsao • Nov 25 '20 08:11

Hi,

We are facing the same issue with Arabic. May we have an update on this?

Thanks, Muhammad Ajmal Siddiqui

ma-siddiqui • Dec 04 '20 14:12

Hi,

We are facing the same issue with Vietnamese.

Thanks, Thuy Tran

guyqaz • May 12 '21 04:05

@RaymondTsao I would like to train an English & Chinese bilingual TTS model. What is the format of your datasets? Part of your inference phrase is wu3-dao4; does it mean 舞蹈 in Chinese? If so, how do I transform all the text into this form?

And is DH-AX0 S-AE1-K-R-AO0-L D-AE1-N-S the English phoneme sequence?

R7788380 • Jan 22 '22 06:01

@R7788380

Hi, I have my own Chinese & English parser and dictionary to do it. You may be able to convert Mandarin text to symbols like these with a Python module named pinyin, but the output will not include parser information.
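As a rough illustration of the character-to-symbol step, here is a dictionary-based sketch. It is not the author's parser and it does not call the pinyin module; the lookup table is a tiny hypothetical stand-in whose readings follow the transcription used in this thread, and word boundaries and pauses are not handled.

```python
# Hypothetical lookup table; a real system would use a full dictionary.
TONE_PINYIN = {"舞": "wu3", "蹈": "dao4"}


def to_symbols(word, table=TONE_PINYIN):
    """Join the tone-numbered syllables of one word with hyphens."""
    unknown = [ch for ch in word if ch not in table]
    if unknown:
        raise KeyError("no reading for: " + "".join(unknown))
    return "-".join(table[ch] for ch in word)


print(to_symbols("舞蹈"))  # wu3-dao4
```

Characters with more than one reading would need context to resolve, which a flat table like this cannot provide.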

Yes, DH-AX0 S-AE1-K-R-AO0-L D-AE1-N-S is the English phoneme sequence; you can generate it with CMUdict.
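A minimal sketch of the CMUdict route, assuming the classic text layout (one entry per line, the word followed by its phonemes, variants marked like `THE(1)`, comment lines starting with `;;;`). The excerpt is hand-typed for illustration and the function names are hypothetical; a real run would read the full dictionary file instead.

```python
# Hand-typed excerpt for illustration; not the full dictionary.
CMUDICT_EXCERPT = """\
;;; comment lines start with three semicolons
DANCE  D AE1 N S
THE  DH AH0
THE(1)  DH AH1
"""


def load_cmudict(text):
    """Map each word to its first listed pronunciation."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        word = word.split("(")[0]  # drop variant markers such as "(1)"
        entries.setdefault(word, phones)  # keep only the first variant
    return entries


def english_to_symbols(sentence, entries):
    """Hyphen-join phonemes within a word, space-join the words."""
    return " ".join("-".join(entries[w.upper()]) for w in sentence.split())


entries = load_cmudict(CMUDICT_EXCERPT)
print(english_to_symbols("the dance", entries))  # DH-AH0 D-AE1-N-S
```

The sample phrase in this thread uses AX0 where this excerpt has AH0, so the author's own dictionary presumably uses a somewhat different phoneme set.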

RaymondTsao • Jan 22 '22 06:01

Thank you very much for your reply! I will try it.

R7788380 • Jan 22 '22 06:01