Coconut059 comments

Results: 2 comments of Coconut059

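Speaker diarization has a very high MISS error rate. Is the VAD module performing poorly? Also, the DER with video added is even worse than with audio alone. Can this be fine-tuned?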

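On the MISP2022 dataset, speaker diarization with audio only gives a MISS rate of about 23% and a DER of 34%; with audio plus video, the DER is about 43%. Can the VAD module be fine-tuned? And can the video-assisted clustering be fine-tuned as well?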
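To make the numbers above easier to read, here is a minimal sketch of how they relate, assuming the standard DER definition (missed speech + false alarm + speaker confusion, divided by total reference speech time). The function names are hypothetical, and the scoring setup actually used (collar, overlap handling) is not stated in the post, so treat this as an illustration rather than the toolkit's own scoring.

```python
def der(miss, false_alarm, confusion, total_speech):
    """Standard DER: all error durations divided by total reference speech."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (miss + false_alarm + confusion) / total_speech


def non_miss_error(der_pct, miss_pct):
    """Part of the DER (in percentage points) left for false alarm + confusion."""
    return der_pct - miss_pct


def miss_share(der_pct, miss_pct):
    """Fraction of the total DER that is explained by missed speech alone."""
    return miss_pct / der_pct


# Approximate audio-only figures quoted in the post (assumed to be percentages
# of total reference speech time).
audio_der, audio_miss = 34.0, 23.0

rest = non_miss_error(audio_der, audio_miss)   # false alarm + confusion
share = miss_share(audio_der, audio_miss)      # how much of the DER is MISS

print(f"false alarm + confusion: {rest:.1f} points")
print(f"share of DER due to MISS: {share:.0%}")
```

Under that assumption, missed speech accounts for roughly two thirds of the audio-only DER, which is why the question focuses on the VAD stage rather than on clustering.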

This VAD algorithm does not work well on Chinese datasets

Hi! Can you tell me why the voice activity detection module performs so poorly? Does its performance depend heavily on the dataset?