Coconut059

Results 2 comments of Coconut059

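On the MISP2022 dataset, speaker diarization with audio only gives a miss rate of about 23% and a DER of 34%; with audio plus video, the DER is about 43%. Can the VAD module be fine-tuned? And can the video-assisted clustering be fine-tuned as well?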
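
For reference, here is a minimal sketch of how these numbers relate, assuming the usual definition of DER as (missed speech + false alarm + speaker confusion) divided by total reference speech time. Under that assumption, a 23% miss rate inside a 34% DER leaves about 11% for false alarms and confusion combined. The function name `der_components` and the durations are hypothetical; the split between false alarm and confusion is made up purely for illustration and is not a measured result.

```python
def der_components(miss, false_alarm, confusion, total):
    """Return each error type and the DER as fractions of total reference speech time.

    All arguments are durations in the same unit (e.g. seconds).
    Assumes the standard definition: DER = (miss + false alarm + confusion) / total.
    """
    if total <= 0:
        raise ValueError("total reference speech time must be positive")
    return {
        "miss": miss / total,
        "false_alarm": false_alarm / total,
        "confusion": confusion / total,
        "der": (miss + false_alarm + confusion) / total,
    }


# Hypothetical durations chosen so that miss = 23% and DER = 34%;
# the 4 / 7 split between false alarm and confusion is an assumption.
rates = der_components(miss=23.0, false_alarm=4.0, confusion=7.0, total=100.0)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

If the miss rate dominates the DER like this, the VAD stage is the first place to look, since speech it drops can never be recovered by the clustering stage.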

Hi! Can you tell me why the voice activity detection module performs so poorly? Does its performance depend heavily on the dataset?