Identifying Spoken Language
Hello, developers. Is there a model or some other tool for identifying the spoken language? For example, how can I tell whether a speaker is speaking English or Russian? I looked through the tutorials and found nothing. I would appreciate any help.
@fayejf is the model published? Please point to the docs.
It looks like there is a labeller, see https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_classification/speech_to_label.py#L81
@jnnnnn @Sasha-Bachynskyi The model is published. Thanks for your patience. https://github.com/NVIDIA/NeMo/pull/5080
Hi, @fayejf!
I can't figure out how to use this model. There is only an instance of how to initialize a model. Could you give an example of what method I should call and how to pass the audio file in?
Thank you in advance for helping!
Hi @Sasha-Bachynskyi, the PR that adds this information to the docs should be merged soon: https://github.com/NVIDIA/NeMo/pull/5366
You can infer the label using the EncDecSpeakerLabelModel class: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/api.html#nemo.collections.asr.models.EncDecSpeakerLabelModel
For inference on a single audio file, use the get_label method. For inference on multiple files, use batch_inference instead.
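For the multi-file case, here is a rough sketch of preparing an input list. It assumes that batch_inference reads a NeMo-style JSON-lines manifest with audio_filepath, offset, duration and label fields; the helper names (wav_duration_seconds, write_manifest), the placeholder label and the manifest_filepath argument name are assumptions for illustration, so please check them against the API docs linked above.

```python
import json
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (stdlib only)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())


def write_manifest(audio_paths, manifest_path, label="unknown"):
    """Write one JSON object per line, one line per audio file.

    The field names follow the manifest layout commonly used by NeMo
    speech classification examples (an assumption here, not confirmed
    in this thread). `label` is a placeholder for unlabeled audio.
    """
    with open(manifest_path, "w", encoding="utf-8") as f:
        for path in audio_paths:
            entry = {
                "audio_filepath": path,
                "offset": 0,
                "duration": wav_duration_seconds(path),
                "label": label,
            }
            f.write(json.dumps(entry) + "\n")
    return manifest_path


# Hypothetical usage (requires NeMo and real audio files, so it is left
# commented out; the argument name is an assumption):
# manifest = write_manifest(["a.wav", "b.wav"], "files.json")
# outputs = langid_model.batch_inference(manifest_filepath=manifest)
```

The manifest is built with the standard library only, so it can be generated and inspected before NeMo is involved at all.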
Hi @nithinraok, I'm sorry for bothering you. I want to identify the spoken language in a single file.
I followed the instructions above. Below is my code:
import nemo.collections.asr as nemo_asr
langid_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(model_name="langid_ambernet")
lang = langid_model.get_label('audio.wav')
But I get an error:
Traceback (most recent call last):
File "/home/denis/test_lang/test-lang.py", line 5, in <module>
lang = vad_model.get_label('audio.wav')
File "/home/denis/anaconda3/envs/nemo2/lib/python3.9/site-packages/nemo/collections/asr/models/label_models.py", line 455, in get_label
_, logits = self.infer_file(path2audio_file=path2audio_file)
File "/home/denis/anaconda3/envs/nemo2/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/denis/anaconda3/envs/nemo2/lib/python3.9/site-packages/nemo/collections/asr/models/label_models.py", line 427, in infer_file
audio = librosa.core.resample(audio, sr, target_sr)
TypeError: resample() takes 1 positional argument but 3 were given
It seems that something is wrong with librosa.
System info:
GPU: NVIDIA A40
NeMo: main branch, installed on 22 February 2023
librosa: 0.10.0
What could be causing this? Thanks in advance for any help.
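It looks like the newest librosa version (0.10.0) requires the sample rates to be passed to resample() as keyword arguments (orig_sr, target_sr), while infer_file in NeMo passes them positionally. Either downgrade librosa or apply the fix provided at https://github.com/NVIDIA/NeMo/pull/6086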
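To show why the positional call fails, here is a minimal sketch that does not need librosa installed. The resample function below is a hypothetical stand-in that only mimics a keyword-only signature; it is not librosa's implementation and it does no real signal processing.

```python
# Hypothetical stand-in: the bare `*` makes orig_sr and target_sr
# keyword-only, which mirrors the kind of signature described above.
def resample(y, *, orig_sr, target_sr):
    ratio = target_sr / orig_sr
    n_out = int(round(len(y) * ratio))
    # Nearest-neighbour pick, for illustration only (not a real resampler).
    return [y[min(int(i / ratio), len(y) - 1)] for i in range(n_out)]


audio = [0.0, 0.1, 0.2, 0.3]
sr, target_sr = 8000, 16000

# Positional call, in the same style as the line shown in the traceback.
try:
    resample(audio, sr, target_sr)
    positional_error = None
except TypeError as err:
    positional_error = str(err)

# Keyword call: accepted by the keyword-only signature.
resampled = resample(audio, orig_sr=sr, target_sr=target_sr)

print(positional_error)
print(len(resampled))
```

With the real library, the equivalent keyword form would be librosa.resample(audio, orig_sr=sr, target_sr=target_sr).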