Seeking Help with Tacotron 2 Training for Telugu Language

gujjulassr opened this issue 2 years ago

[Image: alignment_0]

Hello everyone,

I hope this message finds you well. I'm currently working on training a Tacotron 2 model for the Telugu language, and I've encountered some challenges with alignment and output. I would greatly appreciate your expertise and guidance to help me address these issues.

Problem Description:

Data: I've collected high-quality Telugu speech data.
Training Setup:
- Learning rate: 0.00038095238
- Epochs: 1501
- Batch size: 8
- Weight decay: 1e-6
- Gradient clipping threshold: 1.0
- cuDNN enabled
- Log file: nvlog.json
- Annealing steps: 500, 1000, 1500
- Annealing factor: 0.1
- Load mel spectrograms from disk: True
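The annealing settings above describe a step-decay learning-rate schedule. The following is a minimal sketch, assuming the trainer treats each annealing step as an epoch threshold and multiplies the base learning rate by the annealing factor once for every threshold the current epoch has reached. The function name `annealed_lr` is hypothetical and not taken from any particular Tacotron 2 codebase.

```python
from typing import Sequence


def annealed_lr(epoch: int, base_lr: float, anneal_steps: Sequence[int],
                anneal_factor: float) -> float:
    """Return the learning rate in effect at the given epoch.

    Assumption: every annealing step is an epoch threshold, and the base
    learning rate is multiplied by anneal_factor once per threshold reached.
    """
    # Count how many annealing thresholds this epoch has already reached.
    passed = sum(1 for step in anneal_steps if epoch >= step)
    return base_lr * (anneal_factor ** passed)


if __name__ == "__main__":
    base_lr = 0.00038095238
    steps = [500, 1000, 1500]
    for epoch in (0, 499, 500, 1000, 1500):
        lr = annealed_lr(epoch, base_lr, steps, 0.1)
        print(f"epoch {epoch:4d}: lr = {lr:.3e}")
```

Under this assumption, the learning rate drops by 10x at epoch 500, is 100x smaller than the starting value from epoch 1000, and 1000x smaller at epoch 1500.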

Issues:

- Alignment: I'm struggling to obtain accurate phoneme alignment during training, which is crucial for generating clear, coherent speech.
- Output: The generated speech is unintelligible and seems to be random sounds, even though my training data is in Telugu.
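One rough way to check whether the attention alignment is forming is to look, decoder step by decoder step, at which input symbol receives the most attention: in a healthy alignment that position moves forward through the text and eventually covers all of it. The sketch below assumes the attention weights are available as a list of rows, one row per decoder step and one weight per input symbol. The function `alignment_stats` and both metrics are illustrative only, not part of any Tacotron 2 implementation.

```python
from typing import Dict, List, Sequence


def alignment_stats(attention: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Summarise an attention matrix shaped [decoder_steps][input_symbols].

    monotonic_ratio: fraction of consecutive decoder steps where the
        most-attended input position does not move backwards.
    coverage: fraction of input positions that are the most-attended
        position at least once.
    """
    # Index of the most-attended input symbol at each decoder step.
    peaks: List[int] = [max(range(len(row)), key=row.__getitem__) for row in attention]
    n_inputs = len(attention[0])

    # Count forward (or stationary) moves between consecutive decoder steps.
    forward = sum(1 for a, b in zip(peaks, peaks[1:]) if b >= a)
    monotonic_ratio = forward / max(len(peaks) - 1, 1)

    coverage = len(set(peaks)) / n_inputs
    return {"monotonic_ratio": monotonic_ratio, "coverage": coverage}


if __name__ == "__main__":
    diagonal = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    stuck = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
    print(alignment_stats(diagonal))
    print(alignment_stats(stuck))
```

By construction, a clean diagonal alignment scores 1.0 on both metrics; attention that stays on one symbol keeps a high monotonic ratio but has low coverage, and attention that jumps around lowers the monotonic ratio.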

Request for Help: I would appreciate any assistance or advice regarding the following:

- Fine-tuning hyperparameters for Telugu TTS.
- Addressing alignment issues.
- Suggestions for improving the output quality.
- Any language-specific considerations for Telugu TTS.
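On the language-specific side, one thing worth ruling out is whether the Telugu transcripts actually reach the model. If the text front end maps characters to IDs through a fixed symbol table and silently skips unknown characters (an assumption about the setup; check the symbol list your training code actually uses), an English-only table would reduce each Telugu sentence to little more than spaces and punctuation. The sketch below is hypothetical: `encode`, `build_telugu_symbols`, `BASIC` and `ENGLISH_ONLY` are illustrative names, and the Telugu symbol set is simply every code point in the Telugu Unicode block (U+0C00 to U+0C7F, including unassigned ones) plus a few basic characters.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical symbol sets, for illustration only.
PAD = "_"
BASIC = [PAD, " ", ".", ",", "?", "!", "-"]
ENGLISH_ONLY = BASIC + list("abcdefghijklmnopqrstuvwxyz")


def build_telugu_symbols() -> List[str]:
    """Basic symbols plus every code point in the Telugu block (U+0C00-U+0C7F)."""
    return BASIC + [chr(cp) for cp in range(0x0C00, 0x0C80)]


def encode(text: str, symbols: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Map each character to its symbol ID and collect characters with no ID."""
    symbol_to_id: Dict[str, int] = {s: i for i, s in enumerate(symbols)}
    ids: List[int] = []
    unknown: List[str] = []
    for ch in text:
        if ch in symbol_to_id:
            ids.append(symbol_to_id[ch])
        else:
            unknown.append(ch)  # a silently skipping front end would drop these
    return ids, unknown


if __name__ == "__main__":
    sentence = "నమస్కారం, తెలుగు."  # sample Telugu text
    for name, table in (("english-only", ENGLISH_ONLY), ("telugu", build_telugu_symbols())):
        ids, unknown = encode(sentence, table)
        print(f"{name}: kept {len(ids)} of {len(sentence)} characters, {len(unknown)} unknown")
```

Running a check like this over a few training transcripts shows quickly whether any characters are being lost before they reach the encoder.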

I'm open to any insights, recommendations, or best practices that can help me improve the quality of my model's output. If you have experience with TTS in non-English languages, particularly Telugu, your expertise would be invaluable.

Thank you in advance for your time and support. I'm eager to learn and make progress on this project, and your guidance will be instrumental.

Feel free to ask for additional information or logs if needed. Your help is greatly appreciated.

Best regards, G samaram


Oct 19 '23 10:10 gujjulassr