Long Mai

Results 20 comments of Long Mai

**optimization.update_freq='[x]', where x = 64/k,** belongs to the pre-training step
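As a quick sanity check of the formula above (assuming, per the wav2vec 2.0 README convention, that k is the number of physical GPUs and the goal is to simulate the 64-GPU setup via gradient accumulation):

```python
# optimization.update_freq = 64 / k, where k = number of GPUs available.
# With fewer GPUs, gradients are accumulated more times per update so the
# effective batch matches the 64-GPU recipe. Example value of k below.
k = 8
update_freq = 64 // k
print(update_freq)  # accumulate gradients 8 times per optimizer step
```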

Yup! The number should follow the wav2vec repo instructions.

If you can, try to download the model and run inference on your own local machine. If you can't, then in your Colab try: `import sys; sys.argv.append('/test/data'); print(sys.argv)` It...
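The suggestion above, written out as a runnable sketch (`/test/data` is the placeholder path from the comment; substitute your own data directory):

```python
import sys

# In Colab you can't pass real command-line arguments to a script,
# so append the data path to sys.argv before the script parses it.
sys.argv.append('/test/data')
print(sys.argv)
```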

The "examples.speech_recognition.w2l_decoder" module is located inside the installed fairseq directory. Could you please change the [import](https://github.com/mailong25/self-supervised-speech-recognition/blob/87af2ae8d449ec965bd4081232e09fb6036c4670/stt.py#L334) `from examples.speech_recognition.w2l_decoder` to the actual path of the installed fairseq?

You should call the import from inside the "self-supervised-speech-recognition" directory.

Yes, you can do it, but you won't be able to leverage the pretrained model (training from scratch is computationally expensive). If you want a larger model, my recommendation is...

Try downloading the Vietnamese pretrained model and running inference. If the response returned is [''], the model may not have learned anything (the weights are still random).

You can prepare a test set and perform a simple grid search for hyperparameter tuning.
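A minimal grid-search sketch: `evaluate_wer` below is a hypothetical stand-in for decoding your test set and computing word error rate, and the hyperparameter names and ranges are illustrative, not the repo's actual flags.

```python
import itertools

def evaluate_wer(lm_weight, word_score):
    # Placeholder scoring function: replace with a real call that decodes
    # the test set with these hyperparameters and returns its WER.
    return abs(lm_weight - 1.5) + abs(word_score + 0.5)

# Candidate values for each hyperparameter (illustrative ranges).
grid = {
    'lm_weight': [0.5, 1.0, 1.5, 2.0],
    'word_score': [-1.0, -0.5, 0.0],
}

# Try every combination and keep the one with the lowest WER.
best = min(
    itertools.product(grid['lm_weight'], grid['word_score']),
    key=lambda params: evaluate_wer(*params),
)
print(best)  # best (lm_weight, word_score) pair on the test set
```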

batch_duration (s) = max_tokens / 16000. For example, if max_tokens is set to 160000, the total audio duration of a batch is limited to 10 seconds.
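The arithmetic above as a one-liner (max_tokens counts raw waveform samples, and the audio is sampled at 16 kHz):

```python
# Convert fairseq's max_tokens budget (waveform samples) into seconds of audio.
max_tokens = 160000
sample_rate = 16000  # Hz, wav2vec's expected sampling rate
batch_duration = max_tokens / sample_rate
print(batch_duration)  # 10.0 seconds of audio per batch
```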

`how many seconds can I have inside a batch?` --> I can't give you an exact number, but it should be as high as possible depending on your GPU...