Mylo

Results: 125 comments by Mylo

look at it like this. if you have 3000 files, and train for 24 epochs for example, it will still be worse than 6000 files for 12 epochs. an epoch...
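To put numbers on that comparison, here's a rough sketch. The helper name `training_exposure` is made up for illustration and isn't from any repo. It only counts file passes: both runs go through the same total, but the second one has twice as many unique files.

```python
def training_exposure(num_files, epochs):
    """Return how many unique files a run sees and how many file passes it makes in total."""
    return {
        "unique_files": num_files,
        "total_passes": num_files * epochs,  # one epoch = one pass over every file
    }

small = training_exposure(3000, 24)  # fewer files, more epochs
large = training_exposure(6000, 12)  # more files, fewer epochs

# Both runs process the same number of file passes...
print(small["total_passes"], large["total_passes"])
# ...but the second run draws them from twice as many unique files.
print(small["unique_files"], large["unique_files"])
```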

do the wavs used in training sound normal though?

maybe there's a Hindi HuBERT model somewhere, you could try loading it

great, as long as they're at the same rate with the same number of features, it should work

I haven't used that quantizer because it is not compatible with Bark. It uses completely different values to represent the semantic features. I trained on English because English is the...

HuBERT wav2vec outputs have 768 features, that's why I picked that number. If you want to use a different number, pass `input_size=1024` in the constructor. The default input shape is...

about 50x768 features per second, or 50x1024 in your case. if it's slightly different, that's fine.
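A quick way to sanity-check your features against that, as a sketch only. The function names and the `tolerance` of 2 frames are my own assumptions for illustration; the only numbers taken from the comment above are roughly 50 frames per second and 768 (or 1024) features per frame.

```python
FRAMES_PER_SECOND = 50  # approximate rate from the comment above

def expected_feature_shape(duration_seconds, feature_size=768):
    """Approximate (frames, features) shape for a clip of the given length."""
    frames = round(duration_seconds * FRAMES_PER_SECOND)
    return (frames, feature_size)

def roughly_matches(actual_shape, duration_seconds, feature_size=768, tolerance=2):
    """True if the feature size matches exactly and the frame count is close.

    `tolerance` (in frames) is an arbitrary choice, since a slightly
    different frame count is fine.
    """
    frames, features = expected_feature_shape(duration_seconds, feature_size)
    return actual_shape[1] == features and abs(actual_shape[0] - frames) <= tolerance

print(expected_feature_shape(10))        # 10 second clip, 768 features
print(expected_feature_shape(10, 1024))  # same clip, 1024 features
print(roughly_matches((499, 768), 10))   # one frame short still counts as a match
```

If the feature size (second dimension) is off, that's a real mismatch; if the frame count is off by one or two, that's the "slightly different" case.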

> does hubert_base_ls960.pt pretrained only with English?

it seems to work with more than just english, not every single language though.

> @gitmylo, on the HuBERT training specs it seems it's trained on the `librispeech_asr` dataset, which is a monolingual [English only] dataset.
>
> Additionally it's labelled only `english`.
>...

The dataset creation code is up at https://github.com/gitmylo/bark-data-gen. To get the semantics from a voice, you have to use a trained HuBERT quantizer model. See a problem? It cannot be...