VISinger

Samples?

Open francqz31 opened this issue 2 years ago • 5 comments

@jerryuhoo Hello, how are you?

I'm wondering how the samples sound. Are they as good as the original Visinger 1? I would love to take a listen.

francqz31, Jun 20 '23 19:06

Hi, the audio sample in this repository is not as good as the original Visinger 1, but I strongly recommend that you try https://github.com/espnet/espnet/tree/master/egs2/opencpop/svs1. I implemented that one as well, and it gives better results than this repository.

jerryuhoo, Jun 20 '23 19:06

So far, the model has been trained for less time than the original paper reports, so the audio quality may be slightly lower than in the paper. However, the audio quality of Visinger2 on espnet is nearly on par with the original paper.

jerryuhoo, Jun 20 '23 19:06

@jerryuhoo Thanks so much for your answers, I really appreciate it.

One more question, if you don't mind.

I have developed my own SVS algorithm, and it works great, even a bit better than Visinger2.

SVS needs a music score (note duration sequence and note pitch sequence) plus lyrics. I already have a SOTA singing-to-text transcription model, so the lyrics are not a concern. However, I am having a hard time building a dataset, especially for English, so I have to use m4singer, opencpop, and Opensinger, which are all in Chinese, and that limits the evaluation for listeners.

Having said that, do you have any idea how I can easily get the music score for English vocals, or automate it without the manual annotation that would take months? I have about 6 hours of English vocals, almost as long as opencpop. Is there any tool or a new, easier method for this?

Thanks in advance.

francqz31, Jun 20 '23 19:06

This is also a concern of mine at the moment. The durations could possibly be obtained with MFA (Montreal Forced Aligner). MIDI could be used for pitch extraction and then aligned based on tempo. However, I currently don't know of any other tools that can do this quickly.
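
A rough sketch of how the two pieces might be combined, not a tested pipeline. It assumes the aligner output has already been loaded as `(start_s, end_s, label)` tuples and that a frame-level F0 contour in Hz (0 for unvoiced frames) comes from some separate pitch tracker. The function names, the `hop_s` parameter, and the median-per-interval rule are my own assumptions for illustration.

```python
import math
from statistics import median


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def notes_from_alignment(intervals, f0, hop_s):
    """Assign one note per aligned interval (hypothetical helper).

    intervals: list of (start_s, end_s, label), e.g. loaded from a forced-aligner result
    f0:        per-frame F0 values in Hz, with 0 marking unvoiced frames
    hop_s:     time between consecutive F0 frames, in seconds
    Returns a list of (label, midi_note_or_None, duration_s).
    """
    notes = []
    for start, end, label in intervals:
        first = int(round(start / hop_s))
        last = int(round(end / hop_s))
        # Keep only voiced frames inside this interval.
        voiced = [f for f in f0[first:last] if f > 0]
        if voiced:
            # The median is less sensitive to octave errors and vibrato than the mean.
            pitch = round(median([hz_to_midi(f) for f in voiced]))
        else:
            pitch = None  # silence or fully unvoiced segment
        notes.append((label, pitch, round(end - start, 3)))
    return notes
```

One note per aligned interval is a simplification: a syllable sung over several notes (a slur) would need to be split further, which this sketch does not attempt.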

jerryuhoo, Jun 20 '23 19:06

@jerryuhoo Hey, how is it going?

I wonder if you have seen these two projects: https://github.com/openvpi/SOME for MIDI and https://github.com/openvpi/SOFA. I wonder whether they can help us with labeling. I haven't really read up on them yet.

I think SOME converts the singing voice to MIDI fairly accurately, but I wonder whether, and how, it can be used to get the MIDI note sequence and MIDI duration sequence. If you find this interesting, I suggest looking into them to see what they offer. I would like to hear your thoughts.
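
To make the question concrete, here is a minimal sketch of the conversion I have in mind. It assumes the transcription step yields note events as `(onset_s, offset_s, midi_pitch)` tuples. I have not checked what either project actually outputs, so that input format, the function name, and the use of pitch 0 for rests are all my own assumptions.

```python
def to_score_sequences(notes, min_rest_s=0.05):
    """Turn note events into a pitch sequence and a duration sequence (hypothetical helper).

    notes:      list of (onset_s, offset_s, midi_pitch), assumed non-overlapping
    min_rest_s: gaps at least this long become explicit rests; shorter gaps are ignored
    Returns (pitch_seq, dur_seq), where a rest is encoded as pitch 0 (an assumption).
    """
    pitch_seq, dur_seq = [], []
    cursor = 0.0  # end time of the previous note
    for onset, offset, pitch in sorted(notes):
        gap = onset - cursor
        if gap >= min_rest_s:
            # Silence between notes is kept as a rest so timing is preserved.
            pitch_seq.append(0)
            dur_seq.append(round(gap, 3))
        pitch_seq.append(pitch)
        dur_seq.append(round(offset - onset, 3))
        cursor = offset
    return pitch_seq, dur_seq
```

The lyrics would still have to be aligned to these notes separately, for example with a forced aligner.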

Thanks in advance.

francqz31, Nov 02 '23 01:11