Hui Wang
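Thanks for the reply, and excellent work! Active speaker detection is currently one of the important channels for connecting speaker-related audio cues with visual cues. May I merge the training code into [3D-Speaker](https://github.com/alibaba-damo-academy/3D-Speaker)?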
There are several limitations to speaker recognition currently in the pipeline. It may not perform well when the audio duration is too short (less than 60 seconds) or when the...
Before https://github.com/modelscope/3D-Speaker/blob/ab22112d280fed17839094da3874813c9eb63460/speakerlab/models/campplus/layers.py#L109, seg.shape[-1] is greater than or equal to x.shape[-1]. That line therefore makes seg and x the same shape.
@Juelianqvq based on the code at https://github.com/modelscope/3D-Speaker/blob/ab22112d280fed17839094da3874813c9eb63460/speakerlab/models/campplus/layers.py#L108, seg.shape[-1] < x.shape[-1] cannot occur.
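The shape argument in this thread can be checked with a small pure-Python sketch. It is not the repository's code: `seg_pool_avg` is a hypothetical 1-D stand-in, and it assumes that segment pooling keeps the last partial segment (ceil mode), repeats each segment statistic `seg_len` times, and then crops to the input length. Under those assumptions the expanded length is `ceil(T / seg_len) * seg_len`, which can never be smaller than `T`.

```python
import math


def seg_pool_avg(x, seg_len=100):
    """Hypothetical 1-D stand-in for segment-level average pooling.

    Assumption: the last partial segment is kept (ceil mode) and is
    averaged over the elements it actually contains.
    """
    n_segs = math.ceil(len(x) / seg_len)
    means = []
    for i in range(n_segs):
        chunk = x[i * seg_len:(i + 1) * seg_len]
        means.append(sum(chunk) / len(chunk))

    # Repeat every segment mean seg_len times, so the expanded length is
    # n_segs * seg_len, which is always >= len(x).
    expanded = [m for m in means for _ in range(seg_len)]
    assert len(expanded) >= len(x)

    # Crop the tail so the output has the same length as the input.
    return expanded[:len(x)]


x = [float(i) for i in range(250)]
seg = seg_pool_avg(x, seg_len=100)
print(len(seg) == len(x))
```

With 250 frames and `seg_len=100`, three segments are produced, the expansion has 300 entries, and the crop brings it back to 250.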
Thanks for spotting this bottleneck and proposing an optimization! We’ll verify the speedup and ensure numerical consistency with the original implementation.
It refers to large margin fine-tuning, a commonly used technique for improving performance in speaker verification training.
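When I train the CAM++ model on the VoxCeleb2 training set, both the convergence speed and the model's EER differ greatly from the results reported in the repository, and also from typical speaker recognition results obtained by training on VoxCeleb2. Is this caused by library versions, the training machine, or some other error?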
Have you tried the training script at https://github.com/modelscope/3D-Speaker/blob/main/egs/voxceleb/sv-cam%2B%2B/run.sh? Are the results normal with it?
Training speed depends on your GPUs. If you only have one card, 80 min is normal. You can keep an eye on GPU utilization.