huangx06

Results 4 comments of huangx06

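I also think character-level modeling can't reach high accuracy. There are too many homophones; to pick the right one, wouldn't the model need very strong context-modeling ability?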

I think you can use ASR to transcribe your audio into text.

I don't know exactly what your problem is. The training data for a Tacotron model consists of symbol-audio pairs. You said you have audio without labeled text, so I suggest that...

Yes, ASR stands for Automatic Speech Recognition, but I don't think gluing an ASR model and a TTS model together would be convenient.