Nithin Rao comments

Results: 117 comments of Nithin Rao

Speaker Diarization with Marblenet and ClusterDiarizer issue

The VAD and speaker embedding extractor models you used are outdated. Which NeMo version are you using? Please use [vad_telephony_marblenet](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/vad_telephony_marblenet) for VAD and [titanet_large](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/titanet_large) as the speaker embedding extractor.

Getting Key Error while finetuning Speaker Recognition model

Hi @alamnasim, we train speaker embedding extractors as speaker identification tasks. In these tasks, in order to evaluate your model on the dev set, the dev set needs to...

Loss is getting zero when training Titanet with Adam

Did you change the learning rate? Try a learning rate of 0.001 with wd=0.0002.

Loss is getting zero when training Titanet with Adam

Please share your config file if possible.

Loss is getting zero when training Titanet with Adam

I would start by increasing batch_size to 128 and lowering lr to 0.0001, since you have only one GPU. Though adamw doesn't show changes with weight_decay, I would still...

Loss is getting zero when training Titanet with Adam

I was training with the Adam optimizer and the loss seems fine to me. How many samples are in your dataset?

Loss is getting zero when training Titanet with Adam

Thanks. Can you send a PR? I will review the changes.

Speaker Verification

You can use this function to get batch embeddings: https://github.com/NVIDIA/NeMo/blob/4cd9b3449cbfedc671348fbabbe8e3a55fbd659d/nemo/collections/asr/models/label_models.py#L420 Once you have the embeddings, you can compare them using a cosine similarity score. For example, you can view this script...
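A minimal sketch of the comparison step, assuming the embeddings have already been extracted and converted to plain Python lists of floats. The function name `cosine_similarity` and the toy vectors are illustrative and not part of NeMo.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy stand-ins for real embeddings (hypothetical values).
emb_a = [0.1, 0.3, -0.2, 0.7]
emb_b = [0.2, 0.6, -0.4, 1.4]     # same direction as emb_a
emb_c = [-0.1, -0.3, 0.2, -0.7]   # opposite direction to emb_a

score_same = cosine_similarity(emb_a, emb_b)
score_opposite = cosine_similarity(emb_a, emb_c)
```

A score closer to 1 means the two embeddings point in a more similar direction; the accept/reject threshold is something to tune on your own data.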

Speaker Verification

It depends on your use case. You could try averaging the cosine scores, or averaging the embeddings of each utterance per speaker (if you have many samples per speaker). There is no...
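The two options can be sketched as follows, assuming each speaker has a list of enrollment embeddings stored as plain Python lists. All function names and the toy 2-dimensional vectors here are hypothetical and chosen only to show that the two strategies can give different scores.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_embedding(embeddings):
    """Element-wise mean of a list of equal-length embeddings."""
    n = len(embeddings)
    return [sum(values) / n for values in zip(*embeddings)]

def score_by_averaging_scores(enrolled, test):
    """Option 1: score the test embedding against each utterance, then average."""
    return sum(cosine(e, test) for e in enrolled) / len(enrolled)

def score_by_averaging_embeddings(enrolled, test):
    """Option 2: average the enrolled embeddings first, then score once."""
    return cosine(mean_embedding(enrolled), test)

# Toy data (hypothetical): two enrollment utterances and one test utterance.
enrolled = [[1.0, 0.0], [0.0, 1.0]]
test = [1.0, 1.0]

avg_of_scores = score_by_averaging_scores(enrolled, test)
score_of_avg = score_by_averaging_embeddings(enrolled, test)
```

In this toy case the averaged embedding lines up exactly with the test vector, so the second option scores higher than the first. Which behaves better on real data is an empirical question.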

Speaker Verification

For the above example, both are 192-dimensional vectors, so you can average them element-wise and you would get a single 192-dimensional embedding. There is no constraint on the duration of the file,...
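As a small illustration of the averaging step, assuming the embeddings are available as plain Python lists: the helper `average_embeddings` and the synthetic values are hypothetical, and only the 192-dimensional size comes from the discussion above.

```python
def average_embeddings(embeddings):
    """Element-wise mean of embeddings that all share one dimension."""
    dims = {len(e) for e in embeddings}
    if len(dims) != 1:
        raise ValueError("need at least one embedding, all of the same dimension")
    n = len(embeddings)
    return [sum(values) / n for values in zip(*embeddings)]

# Two synthetic 192-dimensional vectors standing in for real embeddings.
emb_1 = [0.01 * i for i in range(192)]
emb_2 = [0.03 * i for i in range(192)]

speaker_embedding = average_embeddings([emb_1, emb_2])
```

The result has the same length as each input, so it can be compared against other 192-dimensional embeddings in the usual way.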