Training CAM++ on the VoxCeleb2 training set: convergence speed and model EER fall far short of the results reported in this repository, and also far short of typical speaker-verification results on VoxCeleb2. Could this be caused by library versions, the training machine, or something else?
Hello, I trained with the original code (`python egs/voxceleb/sv-cam++/speakerlab/bin/train.py`) and the original config file, with no code modifications. The convergence speed and training performance are far below both typical VoxCeleb2-dev results and the numbers reported in this repository. Has the author seen this situation before, or resolved a similar issue?
Addendum, partial training log:

epoch: 1 - train Avg_loss: 7.08, train Avg_acc: 6.52, train Lr_value: 2.01e-02, train Sum_time: 83.86, train Eval_EER: 16.55, train Best_EER: 16.55
epoch: 2 - train Avg_loss: 3.14, train Avg_acc: 37.30, train Lr_value: 4.01e-02, train Sum_time: 83.75, train Eval_EER: 11.90, train Best_EER: 11.90
epoch: 3 - train Avg_loss: 1.89, train Avg_acc: 60.18, train Lr_value: 6.00e-02, train Sum_time: 84.50, train Eval_EER: 9.43, train Best_EER: 9.43
epoch: 4 - train Avg_loss: 1.84, train Avg_acc: 61.72, train Lr_value: 8.00e-02, train Sum_time: 82.43, train Eval_EER: 11.51, train Best_EER: 9.43
epoch: 5 - train Avg_loss: 2.07, train Avg_acc: 57.90, train Lr_value: 1.00e-01, train Sum_time: 81.52, train Eval_EER: 12.24, train Best_EER: 9.43
epoch: 6 - train Avg_loss: 2.43, train Avg_acc: 52.27, train Lr_value: 9.99e-02, train Sum_time: 83.01, train Eval_EER: 9.78, train Best_EER: 9.43
epoch: 7 - train Avg_loss: 1.88, train Avg_acc: 62.43, train Lr_value: 9.97e-02, train Sum_time: 83.29, train Eval_EER: 9.88, train Best_EER: 9.43
epoch: 8 - train Avg_loss: 2.07, train Avg_acc: 59.36, train Lr_value: 9.93e-02, train Sum_time: 85.80, train Eval_EER: 10.07, train Best_EER: 9.43
epoch: 9 - train Avg_loss: 2.09, train Avg_acc: 59.12, train Lr_value: 9.87e-02, train Sum_time: 86.20, train Eval_EER: 10.20, train Best_EER: 9.43
epoch: 10 - train Avg_loss: 1.93, train Avg_acc: 62.26, train Lr_value: 9.80e-02, train Sum_time: 85.16, train Eval_EER: 10.76, train Best_EER: 9.43
epoch: 11 - train Avg_loss: 1.85, train Avg_acc: 63.75, train Lr_value: 9.71e-02, train Sum_time: 85.01, train Eval_EER: 10.06, train Best_EER: 9.43
epoch: 12 - train Avg_loss: 1.93, train Avg_acc: 62.43, train Lr_value: 9.61e-02, train Sum_time: 83.00, train Eval_EER: 10.08, train Best_EER: 9.43
epoch: 13 - train Avg_loss: 2.01, train Avg_acc: 61.07, train Lr_value: 9.49e-02, train Sum_time: 83.20, train Eval_EER: 9.22, train Best_EER: 9.22
epoch: 14 - train Avg_loss: 1.73, train Avg_acc: 66.32, train Lr_value: 9.35e-02, train Sum_time: 81.92, train Eval_EER: 10.07, train Best_EER: 9.22
epoch: 15 - train Avg_loss: 1.77, train Avg_acc: 65.58, train Lr_value: 9.21e-02, train Sum_time: 81.81, train Eval_EER: 10.79, train Best_EER: 9.22
epoch: 16 - train Avg_loss: 3.17, train Avg_acc: 59.58, train Lr_value: 9.05e-02, train Sum_time: 82.07, train Eval_EER: 10.39, train Best_EER: 9.22
epoch: 17 - train Avg_loss: 4.72, train Avg_acc: 60.07, train Lr_value: 8.87e-02, train Sum_time: 81.83, train Eval_EER: 13.71, train Best_EER: 9.22
epoch: 18 - train Avg_loss: 5.58, train Avg_acc: 60.55, train Lr_value: 8.69e-02, train Sum_time: 81.15, train Eval_EER: 9.51, train Best_EER: 9.22
epoch: 19 - train Avg_loss: 5.92, train Avg_acc: 61.70, train Lr_value: 8.49e-02, train Sum_time: 80.28, train Eval_EER: 9.89, train Best_EER: 9.22
epoch: 20 - train Avg_loss: 6.22, train Avg_acc: 61.06, train Lr_value: 8.28e-02, train Sum_time: 81.37, train Eval_EER: 10.64, train Best_EER: 9.22
epoch: 21 - train Avg_loss: 6.27, train Avg_acc: 61.73, train Lr_value: 8.06e-02, train Sum_time: 80.46, train Eval_EER: 11.76, train Best_EER: 9.22
epoch: 22 - train Avg_loss: 6.22, train Avg_acc: 62.74, train Lr_value: 7.82e-02, train Sum_time: 79.60, train Eval_EER: 11.02, train Best_EER: 9.22
epoch: 23 - train Avg_loss: 6.33, train Avg_acc: 62.08, train Lr_value: 7.58e-02, train Sum_time: 79.78, train Eval_EER: 11.26, train Best_EER: 9.22
epoch: 24 - train Avg_loss: 6.27, train Avg_acc: 62.75, train Lr_value: 7.34e-02, train Sum_time: 79.61, train Eval_EER: 9.17, train Best_EER: 9.17
epoch: 25 - train Avg_loss: 6.36, train Avg_acc: 62.00, train Lr_value: 7.08e-02, train Sum_time: 79.83, train Eval_EER: 10.79, train Best_EER: 9.17
epoch: 26 - train Avg_loss: 6.31, train Avg_acc: 62.57, train Lr_value: 6.82e-02, train Sum_time: 80.81, train Eval_EER: 11.27, train Best_EER: 9.17
epoch: 27 - train Avg_loss: 6.08, train Avg_acc: 64.64, train Lr_value: 6.55e-02, train Sum_time: 80.27, train Eval_EER: 9.37, train Best_EER: 9.17
epoch: 28 - train Avg_loss: 6.13, train Avg_acc: 64.52, train Lr_value: 6.27e-02, train Sum_time: 81.15, train Eval_EER: 8.44, train Best_EER: 8.44
epoch: 29 - train Avg_loss: 5.78, train Avg_acc: 67.64, train Lr_value: 6.00e-02, train Sum_time: 79.92, train Eval_EER: 8.56, train Best_EER: 8.44
epoch: 30 - train Avg_loss: 5.96, train Avg_acc: 66.03, train Lr_value: 5.72e-02, train Sum_time: 82.05, train Eval_EER: 10.18, train Best_EER: 8.44
epoch: 31 - train Avg_loss: 5.90, train Avg_acc: 66.72, train Lr_value: 5.43e-02, train Sum_time: 84.47, train Eval_EER: 10.17, train Best_EER: 8.44
epoch: 32 - train Avg_loss: 5.91, train Avg_acc: 66.65, train Lr_value: 5.15e-02, train Sum_time: 82.84, train Eval_EER: 8.49, train Best_EER: 8.44
epoch: 33 - train Avg_loss: 5.60, train Avg_acc: 69.38, train Lr_value: 4.86e-02, train Sum_time: 82.85, train Eval_EER: 8.09, train Best_EER: 8.09
epoch: 34 - train Avg_loss: 5.70, train Avg_acc: 68.65, train Lr_value: 4.58e-02, train Sum_time: 83.87, train Eval_EER: 9.04, train Best_EER: 8.09
Also, many thanks to you and your team for the overall training framework of the 3D-Speaker project; the code is very elegant and concise, and I have adopted it in my own speaker project. However, experiments run with the 3D-Speaker training framework differ greatly from those run with my own original training code: 3D-Speaker's performance and convergence speed are abnormal compared with the usual case. Both setups use the VoxCeleb2 training set, an ECAPA-TDNN model, and augmentation with the MUSAN and RIRS noise datasets.
3D-Speaker training framework log:

epoch: 1 - train Avg_loss: 13.29, train Avg_acc: 1.14, train Lr_value: 1.00e-03, train Sum_time: 22.44, train Eval_EER: 21.50, train Best_EER: 21.50
epoch: 2 - train Avg_loss: 10.14, train Avg_acc: 19.66, train Lr_value: 9.70e-04, train Sum_time: 21.98, train Eval_EER: 14.28, train Best_EER: 14.28
epoch: 3 - train Avg_loss: 8.34, train Avg_acc: 39.41, train Lr_value: 9.41e-04, train Sum_time: 22.10, train Eval_EER: 11.91, train Best_EER: 11.91
epoch: 4 - train Avg_loss: 7.39, train Avg_acc: 50.41, train Lr_value: 9.13e-04, train Sum_time: 21.93, train Eval_EER: 10.52, train Best_EER: 10.52
epoch: 5 - train Avg_loss: 6.33, train Avg_acc: 62.21, train Lr_value: 8.85e-04, train Sum_time: 22.04, train Eval_EER: 10.07, train Best_EER: 10.07
epoch: 6 - train Avg_loss: 5.84, train Avg_acc: 67.35, train Lr_value: 8.59e-04, train Sum_time: 22.12, train Eval_EER: 9.61, train Best_EER: 9.61
epoch: 7 - train Avg_loss: 5.52, train Avg_acc: 70.78, train Lr_value: 8.33e-04, train Sum_time: 22.15, train Eval_EER: 7.96, train Best_EER: 7.96
epoch: 8 - train Avg_loss: 5.13, train Avg_acc: 74.41, train Lr_value: 8.08e-04, train Sum_time: 22.15, train Eval_EER: 9.29, train Best_EER: 7.96
epoch: 9 - train Avg_loss: 4.82, train Avg_acc: 77.07, train Lr_value: 7.84e-04, train Sum_time: 22.21, train Eval_EER: 6.69, train Best_EER: 6.69
epoch: 10 - train Avg_loss: 4.64, train Avg_acc: 78.65, train Lr_value: 7.60e-04, train Sum_time: 22.26, train Eval_EER: 5.82, train Best_EER: 5.82
epoch: 11 - train Avg_loss: 4.62, train Avg_acc: 78.89, train Lr_value: 7.37e-04, train Sum_time: 22.10, train Eval_EER: 5.58, train Best_EER: 5.58
epoch: 12 - train Avg_loss: 4.52, train Avg_acc: 79.80, train Lr_value: 7.15e-04, train Sum_time: 23.98, train Eval_EER: 6.11, train Best_EER: 5.58
My own training framework's log:

2 epoch, LR 0.000970, LOSS 5.910094, ACC 14.81%, EER 3.66%, bestEER 3.66%
3 epoch, LR 0.000941, LOSS 4.883111, ACC 22.80%, EER 2.98%, bestEER 2.98%
4 epoch, LR 0.000913, LOSS 4.337683, ACC 28.45%, EER 2.78%, bestEER 2.78%
5 epoch, LR 0.000885, LOSS 3.978573, ACC 32.72%, EER 2.40%, bestEER 2.40%
6 epoch, LR 0.000859, LOSS 3.722487, ACC 35.97%, EER 2.46%, bestEER 2.40%
7 epoch, LR 0.000833, LOSS 3.525087, ACC 38.60%, EER 2.09%, bestEER 2.09%
8 epoch, LR 0.000808, LOSS 3.363293, ACC 40.90%, EER 2.32%, bestEER 2.09%
9 epoch, LR 0.000784, LOSS 3.233289, ACC 42.84%, EER 1.86%, bestEER 1.86%
10 epoch, LR 0.000760, LOSS 3.116947, ACC 44.53%, EER 2.01%, bestEER 1.86%
11 epoch, LR 0.000737, LOSS 3.020313, ACC 45.94%, EER 1.86%, bestEER 1.86%
12 epoch, LR 0.000715, LOSS 2.936982, ACC 47.32%, EER 1.77%, bestEER 1.77%
13 epoch, LR 0.000694, LOSS 2.859588, ACC 48.49%, EER 1.79%, bestEER 1.77%
14 epoch, LR 0.000673, LOSS 2.785062, ACC 49.68%, EER 1.71%, bestEER 1.71%
15 epoch, LR 0.000653, LOSS 2.728004, ACC 50.58%, EER 1.66%, bestEER 1.66%
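When comparing EER numbers across two frameworks like this, it helps to confirm both compute the metric the same way. The EER is the operating point where the false-rejection rate on same-speaker trials equals the false-acceptance rate on different-speaker trials; a minimal, framework-agnostic sketch (the trial scores below are made up purely for illustration, not taken from either run):

```python
def compute_eer(target_scores, nontarget_scores):
    """Equal Error Rate: sweep every score as a candidate threshold and
    return the point where the false-rejection rate (targets scored below
    threshold) is closest to the false-acceptance rate (non-targets
    scored at or above threshold)."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_frr, best_far = float("inf"), 1.0, 1.0
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(frr - far) < best_gap:
            best_gap, best_frr, best_far = abs(frr - far), frr, far
    return (best_frr + best_far) / 2

# Made-up trial scores, purely for illustration.
targets = [0.9, 0.8, 0.7, 0.4]      # same-speaker trial scores
nontargets = [0.5, 0.3, 0.2, 0.1]   # different-speaker trial scores
print(compute_eer(targets, nontargets))
```

Production toolkits usually interpolate between thresholds rather than picking the closest crossing, but the numbers only differ in the decimals.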
Hi, I have also been training CAM++ on Vox recently. With the original config file, I have first-epoch results for your reference, shown below.
Also, when training CAM++ on the VoxCeleb2 dataset, how long does one epoch take for you? One epoch takes me 2 hours.
epoch 1 - avg_acc: 33.15, EER_O: 6.0366
Have you tried the training script https://github.com/modelscope/3D-Speaker/blob/main/egs/voxceleb/sv-cam%2B%2B/run.sh ? Does it produce normal results?
After training, I found that for the same speaker, the best verification similarity is only 0.5 to 0.7, and a similarity of 0.7 is reached only when the speech is very long.
The inference script, however, is adapted from the one you provide.
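For context, the similarity in question is the cosine score between two speaker embeddings. A minimal sketch of that scoring step, using synthetic vectors standing in for real CAM++ embeddings (the 192-dim size and the small perturbation are assumptions for illustration only):

```python
import math
import random

def cosine_similarity(emb1, emb2):
    """Cosine score between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(emb1, emb2))
    norm1 = math.sqrt(sum(x * x for x in emb1))
    norm2 = math.sqrt(sum(x * x for x in emb2))
    return dot / (norm1 * norm2)

# Synthetic 192-dim vectors standing in for real embeddings;
# b is a small perturbation of a, mimicking a same-speaker pair.
random.seed(0)
a = [random.gauss(0.0, 1.0) for _ in range(192)]
b = [x + 0.1 * random.gauss(0.0, 1.0) for x in a]
print(round(cosine_similarity(a, b), 3))
```

Note that the useful decision threshold is model- and domain-dependent, so a fixed cutoff like 0.7 is not directly comparable across models; it is usually calibrated on held-out trials.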
Thank you very much for the prompt reply. I found the problem. Although I have not run the training script https://github.com/modelscope/3D-Speaker/blob/main/egs/voxceleb/sv-cam%2B%2B/run.sh itself, my whole training setup is written based on that script and the existing code in this repository. The main cause of my issue (possibly related to a different PyTorch version) is that in the dataloader section of the model YAML config files provided by this repository, shuffle=True is not set explicitly, so it defaults to False. Since the training-data .csv file is ordered, the model was trained on ordered data, which led to extremely poor generalization.
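To illustrate the root cause: PyTorch's DataLoader defaults to shuffle=False, so an ordered dataset is served in order unless shuffling is enabled explicitly. A minimal sketch with toy tensors standing in for the actual speaker dataset (the seeded generator is only there to make the demo reproducible):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the ordered .csv-backed speaker dataset.
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# DataLoader defaults to shuffle=False: an ordered dataset is served
# in order, so every batch contains consecutive (same-speaker) rows.
ordered_loader = DataLoader(dataset, batch_size=10)
# Setting shuffle=True explicitly fixes this; the seeded generator
# only makes the shuffled order reproducible for the demo.
shuffled_loader = DataLoader(dataset, batch_size=10, shuffle=True,
                             generator=torch.Generator().manual_seed(0))

ordered_labels = next(iter(ordered_loader))[1].tolist()
shuffled_labels = next(iter(shuffled_loader))[1].tolist()
print(ordered_labels)
print(shuffled_labels)
```

In a YAML-driven setup like this repository's, the equivalent fix is adding shuffle: True under the dataloader section of the config so the keyword reaches the DataLoader constructor.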
I found the cause of my bug. Also, on a 3090 GPU, training one epoch takes 80 min:
epoch: 1 - train Avg_loss: 7.08, train Avg_acc: 6.52, train Lr_value: 2.01e-02, train Sum_time: 83.86, train Eval_EER: 16.55, train Best_EER: 16.55
Training speed depends on your GPU. With a single card, 80 min is normal; you can keep an eye on GPU utilization.