Long Mai

Results 20 comments of Long Mai

**optimization.update_freq='[x]', where x = 64/k,** belongs to the pre-training step
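As a quick sanity check of the formula above (assuming, per the wav2vec 2.0 README convention, that k is the number of physical GPUs and the goal is to simulate the 64-GPU setup via gradient accumulation):

```python
# optimization.update_freq = 64 / k, where k = number of GPUs available.
# With fewer GPUs, gradients are accumulated more times per update so the
# effective batch matches the 64-GPU recipe. Example value of k below.
k = 8
update_freq = 64 // k
print(update_freq)  # accumulate gradients 8 times per optimizer step
```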

Yup! The number should follow the wav2vec repo instructions.

If you can, try to download the model and run inference on your own local machine. If you can't, then in your Colab try: `import sys; sys.argv.append('/test/data'); print(sys.argv)` It...
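The suggestion above, written out as a runnable sketch (`/test/data` is the placeholder path from the comment; substitute your own data directory):

```python
import sys

# In Colab you can't pass real command-line arguments to a script,
# so append the data path to sys.argv before the script parses it.
sys.argv.append('/test/data')
print(sys.argv)
```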

The "examples.speech_recognition.w2l_decoder" module is located inside the installed fairseq directory. Could you please change the [import](https://github.com/mailong25/self-supervised-speech-recognition/blob/87af2ae8d449ec965bd4081232e09fb6036c4670/stt.py#L334) `from examples.speech_recognition.w2l_decoder` to the actual path of the installed fairseq?

You should call the import from inside the "self-supervised-speech-recognition" directory.

Yes, you can do it, but you won't be able to leverage the pretrained model (training from scratch is computationally expensive). If you want a larger model, my recommendation is...

Try downloading the Vietnamese pretrained model and running inference. If the response returned is [''], the model may not have learned anything (the weights are still random).

You can prepare a test set and perform a simple grid search for hyperparameter tuning.
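A minimal grid-search sketch: `evaluate_wer` below is a hypothetical stand-in for decoding your test set and computing word error rate, and the hyperparameter names and ranges are illustrative, not the repo's actual flags.

```python
import itertools

def evaluate_wer(lm_weight, word_score):
    # Placeholder scoring function: replace with a real call that decodes
    # the test set with these hyperparameters and returns its WER.
    return abs(lm_weight - 1.5) + abs(word_score + 0.5)

# Candidate values for each hyperparameter (illustrative ranges).
grid = {
    'lm_weight': [0.5, 1.0, 1.5, 2.0],
    'word_score': [-1.0, -0.5, 0.0],
}

# Try every combination and keep the one with the lowest WER.
best = min(
    itertools.product(grid['lm_weight'], grid['word_score']),
    key=lambda params: evaluate_wer(*params),
)
print(best)  # best (lm_weight, word_score) pair on the test set
```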

batch_duration (s) = max_tokens / 16000. For example, if max_tokens is set to 160000, the total audio duration of a batch is limited to 10 seconds.
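The arithmetic above as a one-liner (max_tokens counts raw waveform samples, and the audio is sampled at 16 kHz):

```python
# Convert fairseq's max_tokens budget (waveform samples) into seconds of audio.
max_tokens = 160000
sample_rate = 16000  # Hz, wav2vec's expected sampling rate
batch_duration = max_tokens / sample_rate
print(batch_duration)  # 10.0 seconds of audio per batch
```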

`how many seconds can I have inside a batch?` --> I can't give you an exact number, but it should be as high as possible depending on your GPU...