zhouyong64 issues

Results: 8 issues of zhouyong64

CER calculation obviously wrong

The CER calculation seems wrong. For example, with target string 'a' and prediction string 'b', I think the CER should be 100%. Instead, the current code outputs 0. func = SequenceError() local...
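For reference, the expected value in the example above can be checked with a minimal sketch of the usual CER definition: Levenshtein edit distance divided by the target length. This is not the repository's `SequenceError` implementation; the function names `edit_distance` and `cer` are hypothetical and used only for illustration.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(target: str, prediction: str) -> float:
    """Character error rate, assuming the common definition:
    edit distance normalized by the number of target characters."""
    if not target:
        raise ValueError("target must be non-empty")
    return edit_distance(target, prediction) / len(target)


print(cer("a", "b"))  # 1.0, i.e. 100%
```

Under this definition, a single substituted character in a one-character target gives a CER of 1.0, which matches the 100% the issue expects.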

pretrained Mandarin acoustic model was trained by which dataset

The pretrained Mandarin acoustic model produces pretty good alignments. I'm wondering which dataset was used to train it. I tried training a new acoustic model with a Mandarin dataset of...

How can NSVB generalize to unseen singers?

NSVB is trained on PopBuTFy with 34 speakers. Even with the 30 hours of internal singing data used to train Stage 1, as described in the paper, I doubt that this...

provide API for returning output from intermediate layers

It would be very helpful to have an API for returning output from intermediate layers, for example, the one before the final layers. This output can be used in other...

Where is the pretrained model for vocoder?

Seems like the vocoder pretrained model is missing.

What's the smallest dataset size that can train a working model with audible speech

I trained with LibriTTS 100 and LibriTTS 360 for 1.5 days and the model still can't output audible speech. I'm not sure if it's because of the small dataset size...

sampling rate of audio when using Encodec

When using Encodec, the current code only supports 24 kHz audio. So when training CoarseTransformer and FineTransformer, does the input waveform data also need to be 24 kHz?

Which is better, training all modules in one stage or training stage 1 and stage 2 separately?

There is a 'train-stage' option in trainer.py. In egs/libritts, there are two training procedures with different 'train-stage' options. Which is better in terms of synthesis results?