[wenet] nn context biasing
The deep biasing method comes from: https://arxiv.org/abs/2305.12493
The pre-trained ASR model is fine-tuned to add biasing. During fine-tuning, the original parameters of the ASR model are frozen and only the parameters of the deep biasing module are trained. use_dynamic_chunk cannot be enabled during fine-tuning (the biasing effect degrades otherwise), but the biasing performance of streaming and non-streaming inference is essentially the same.
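A minimal sketch of that freezing step, assuming the biasing-module parameters can be identified by a name substring (the "context" filter below is an assumption; inspect model.named_parameters() for the real prefix in your checkout):

```python
import torch

def freeze_for_bias_training(model: torch.nn.Module, bias_key: str = 'context'):
    """Freeze every parameter whose name does not contain bias_key.

    bias_key is an assumption about how the deep-biasing module is named;
    only the matching parameters stay trainable during fine-tuning.
    """
    for name, param in model.named_parameters():
        param.requires_grad = bias_key in name
    return [p for p in model.parameters() if p.requires_grad]

# Usage: pass only the returned parameters to the optimizer.
# trainable = freeze_for_bias_training(model)
# optimizer = torch.optim.Adam(trainable, lr=1e-4)  # illustrative settings
```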
RESULT: Model link: https://huggingface.co/kxhuang/Wenet_Librispeech_deep_biasing/tree/main (I used the BLSTM forward state incorrectly when training this model, so to test this model you need to change the -2 to 0 in the forward function of the BLSTM class in wenet/transformer/context_module.py)
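For reference, this is roughly what that indexing refers to (a hypothetical sketch, not the actual code in wenet/transformer/context_module.py): for a bidirectional LSTM in PyTorch, the returned hidden state is stacked as (num_layers * 2, batch, hidden), so index -2 is the last layer's forward direction while index 0 is the first layer's forward direction.

```python
import torch

blstm = torch.nn.LSTM(input_size=8, hidden_size=16, num_layers=2,
                      bidirectional=True, batch_first=True)
x = torch.randn(4, 10, 8)        # (batch, time, feature)
_, (h, _) = blstm(x)             # h: (num_layers * 2, batch, hidden)
last_fwd = h[-2]                 # last layer, forward direction
first_fwd = h[0]                 # first layer, forward direction
# Per the note above, the released checkpoint was trained with the index-0
# state, so testing it requires changing the -2 index to 0.
```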
Using the WeNet LibriSpeech pre-trained AED model, fine-tuning ran for 30 epochs and the final model was obtained by averaging 3 epoch checkpoints. The following are test results on LibriSpeech test-other. The context list for the test set is sourced from: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias
Non-streaming inference:
| Method | List size | Graph score | Biasing score | WER | U-WER | B-WER |
|---|---|---|---|---|---|---|
| baseline | / | / | / | 8.77 | 5.58 | 36.84 |
| context graph | 3838 | 3.0 | / | 7.75 | 5.83 | 24.62 |
| deep biasing | 3838 | / | 1.5 | 7.93 | 5.92 | 25.64 |
| context graph + deep biasing | 3838 | 2.0 | 1.0 | 7.66 | 6.08 | 21.48 |
| context graph | 100 | 3.0 | / | 7.32 | 5.45 | 23.70 |
| deep biasing | 100 | / | 2.0 | 7.08 | 5.33 | 22.41 |
| context graph + deep biasing | 100 | 2.5 | 1.5 | 6.55 | 5.33 | 17.27 |
Streaming inference (chunk 16):
| Method | List size | Graph score | Biasing score | WER | U-WER | B-WER |
|---|---|---|---|---|---|---|
| baseline | / | / | / | 10.47 | 7.07 | 40.30 |
| context graph | 100 | 3.0 | / | 9.06 | 6.99 | 27.21 |
| deep biasing | 100 | / | 2.0 | 8.86 | 6.87 | 26.28 |
| context graph + deep biasing | 100 | 2.5 | 1.5 | 8.17 | 6.85 | 19.72 |
Could you share some of the conf.yaml settings used when training the model? Thanks!
The model link above contains the yaml file I used; you can download it directly.
May I ask: I trained a deep biasing model on the 170-hour aishell data, but when deep biasing is enabled during decoding, a lot of characters get dropped from the output. What could be the cause?
Is the character dropping severe, and how large is the hotword list you are using? I have also run experiments on aishell-1; the results were normal and I did not observe any dropped characters.
It is quite severe: the output drops whole stretches at a time. The original hotword list size is 187 (the open-source hotword test set on modelscope), and I enabled the context_filtering option. If filtering leaves the hotword list with only [0], basically the whole utterance gets dropped; even when hotwords remain, entire chunks can still go missing. I set deep_score=1 and filter_threshold=-4. Training has run for 17 epochs so far, with loss_bias around 10.
That is indeed strange. Does the overall loss look normal? Normally, once training has more or less converged, the bias loss should be about the same as the ctc loss, and the overall loss should be somewhat lower than before the hotword module was trained; on aishell it is around 3.4. Are all your hotword-related yaml settings identical to the ones I provided above?
Also, when running the aishell-1 experiments I found that for a dataset like aishell-1, where most sentences are very short, the hotword sampling code needs to drop the check that sampled hotwords must not overlap; otherwise a sentence can often yield only one sampled hotword, which weakens the biasing effect of the trained model. That issue would not cause dropped characters, though; see the sampling sketch below.
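A minimal sketch of what label-based hotword sampling with and without that overlap check could look like (hypothetical code, not the actual wenet sampling implementation; phrase counts and span lengths are illustrative):

```python
import random

def sample_hotword_spans(tokens, num_phrases=3, min_len=2, max_len=4,
                         allow_overlap=False):
    """Sample random token spans from a transcript as training-time hotwords.

    With allow_overlap=False, spans that cross an already-sampled span are
    rejected, so a short utterance often yields only a single phrase.
    Dropping the check (allow_overlap=True) recovers more phrases.
    """
    spans, used = [], set()
    for _ in range(num_phrases * 10):  # bounded number of attempts
        if len(spans) >= num_phrases or len(tokens) < min_len:
            break
        length = random.randint(min_len, min(max_len, len(tokens)))
        start = random.randint(0, len(tokens) - length)
        span = range(start, start + length)
        if not allow_overlap and used.intersection(span):
            continue  # reject spans crossing earlier ones
        used.update(span)
        spans.append(tokens[start:start + length])
    return spans

# A short utterance usually yields one phrase with the check, more without it.
print(sample_hotword_spans(list('今天天气不错'), allow_overlap=False))
print(sample_hotword_spans(list('今天天气不错'), allow_overlap=True))
```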
So far the overall training loss looks fairly normal: it dropped from 3.1 to 2.5, with the bias loss a bit higher than the ctc loss. And my hotword configuration is exactly the one you provided.
Could there be a problem in the hotword sampling code you modified? I really have not encountered the situation you describe and cannot think of a cause, especially since the dropping is related to the number of hotwords passed in; in theory, a hotword list containing only 0 should have the smallest possible impact on normal decoding.
Hello, I am trying to reproduce your librispeech results, but while training the biasing model the cv loss does not decrease (it stays around 160), and the train loss also stops improving once it drops to the forties or fifties. I also noticed that every few batches, training stalls for five or six minutes before the next batch, whereas a batch normally takes about 30 s on my GPU; a short excerpt of the training log is below.
I did not modify any code, and the training conf file is the train_bias one you provided. Could you give a rough analysis of what might be going wrong? Thanks!
```
2023-10-17 18:48:14,596 DEBUG TRAIN Batch 0/300 loss 77.121582 loss_att 68.588936 loss_ctc 90.912209 loss_bias 61.188702 lr 0.00001204 rank 3
2023-10-17 18:48:14,596 DEBUG TRAIN Batch 0/300 loss 60.563221 loss_att 53.186646 loss_ctc 73.329880 loss_bias 44.453613 lr 0.00001204 rank 7
2023-10-17 18:48:14,596 DEBUG TRAIN Batch 0/300 loss 66.905380 loss_att 60.219139 loss_ctc 76.915077 loss_bias 55.915321 lr 0.00001204 rank 1
2023-10-17 18:48:14,599 DEBUG TRAIN Batch 0/300 loss 58.367058 loss_att 54.565548 loss_ctc 63.268948 loss_bias 39.683086 lr 0.00001204 rank 0
2023-10-17 18:48:54,507 DEBUG TRAIN Batch 0/400 loss 69.295921 loss_att 62.990799 loss_ctc 78.056396 loss_bias 59.514668 lr 0.00001604 rank 7
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 60.892227 loss_att 55.707409 loss_ctc 68.627617 loss_bias 43.625130 lr 0.00001604 rank 6
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 70.570961 loss_att 63.955940 loss_ctc 81.632156 loss_bias 43.738525 lr 0.00001604 rank 2
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 56.387531 loss_att 51.965221 loss_ctc 61.897652 loss_bias 48.085854 lr 0.00001604 rank 5
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 57.394482 loss_att 53.534023 loss_ctc 62.557728 loss_bias 38.444881 lr 0.00001604 rank 1
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 61.427593 loss_att 57.190033 loss_ctc 66.434952 loss_bias 48.802876 lr 0.00001604 rank 4
2023-10-17 18:48:54,513 DEBUG TRAIN Batch 0/400 loss 66.382660 loss_att 61.784157 loss_ctc 71.916908 loss_bias 51.955982 lr 0.00001604 rank 3
2023-10-17 18:48:54,517 DEBUG TRAIN Batch 0/400 loss 69.309433 loss_att 61.884018 loss_ctc 81.042137 loss_bias 55.932556 lr 0.00001604 rank 0
2023-10-17 18:55:16,906 DEBUG TRAIN Batch 0/500 loss 60.114948 loss_att 58.303940 loss_ctc 60.731007 loss_bias 36.096294 lr 0.00002004 rank 7
2023-10-17 18:55:16,906 DEBUG TRAIN Batch 0/500 loss 56.977654 loss_att 53.650196 loss_ctc 61.347378 loss_bias 33.943447 lr 0.00002004 rank 1
2023-10-17 18:55:16,907 DEBUG TRAIN Batch 0/500 loss 56.869381 loss_att 54.544899 loss_ctc 58.243603 loss_bias 40.495705 lr 0.00002004 rank 2
2023-10-17 18:55:16,906 DEBUG TRAIN Batch 0/500 loss 58.940693 loss_att 57.577057 loss_ctc 57.989662 loss_bias 41.328430 lr 0.00002004 rank 4
2023-10-17 18:55:16,907 DEBUG TRAIN Batch 0/500 loss 63.078079 loss_att 60.879333 loss_ctc 64.494652 loss_bias 37.138424 lr 0.00002004 rank 3
2023-10-17 18:55:16,908 DEBUG TRAIN Batch 0/500 loss 62.410076 loss_att 58.739368 loss_ctc 67.138695 loss_bias 38.363663 lr 0.00002004 rank 6
2023-10-17 18:55:16,908 DEBUG TRAIN Batch 0/500 loss 61.162239 loss_att 57.996552 loss_ctc 63.624905 loss_bias 49.239365 lr 0.00002004 rank 5
2023-10-17 18:55:16,909 DEBUG TRAIN Batch 0/500 loss 62.478779 loss_att 60.307823 loss_ctc 63.295692 loss_bias 42.486469 lr 0.00002004 rank 0
2023-10-17 18:55:57,183 DEBUG TRAIN Batch 0/600 loss 62.084000 loss_att 62.485199 loss_ctc 56.836884 loss_bias 43.109840 lr 0.00002404 rank 7
2023-10-17 18:55:57,186 DEBUG TRAIN Batch 0/600 loss 63.226624 loss_att 62.583645 loss_ctc 60.321804 loss_bias 44.050949 lr 0.00002404 rank 3
```
Did you start training from scratch? To reduce the impact on the original asr performance, the code is written to start from a pretrained asr model, with everything except the hotword module frozen. Training from scratch should also be able to converge, but you would at least need to unfreeze the frozen parameters first.
No, I also started from the asr model pretrained on librispeech, with the parameters frozen.
OK, I will run training again tonight and see whether I can reproduce the problem. On the training speed issue, I have also observed that every few hundred batches there is a short pause before the next batch, but the overall slowdown was nowhere near as large as what you are seeing, so I did not pay much attention to it; I will look into it later as well.
I gave it a try: with freshly cloned code, the pretrained librispeech model from github, and the yaml I provided, training converges normally, with the loss dropping to about 10 around 1000 batches. Could it be that your pretrained asr model does not match some of the parameters in my yaml, so that part of the model was randomly initialized and then frozen? A quick way to check for that is sketched below.
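One hedged way to detect such a mismatch (a generic PyTorch sketch; the checkpoint path is a placeholder and `model` is assumed to be the model built from the training yaml):

```python
import torch

# Load the pretrained checkpoint non-strictly and report key mismatches.
# Parameters listed in missing_keys stay randomly initialized; if they are
# also frozen, they can never be trained, which would explain a stuck loss.
ckpt = torch.load('exp/librispeech/final.pt', map_location='cpu')  # placeholder path
result = model.load_state_dict(ckpt, strict=False)
print('randomly initialized (missing from checkpoint):', result.missing_keys)
print('ignored (present only in checkpoint):', result.unexpected_keys)
```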
I have tried training from scratch and it does converge.... As for the pretrained model, I compared the training parameters and they are exactly the same as yours, which is strange.... Which pretrained librispeech model did you use? I will keep checking for the cause.
I used the librispeech model that wenet provides for download on github: https://github.com/wenet-e2e/wenet/blob/main/docs/pretrained_models.en.md
Hello, I see you describe that the "context list for the test set is sourced from: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias", but I am still unclear on how you obtained the 3838 or 100 hotwords. Were they randomly selected from words/all_rare_words.txt, or obtained some other way?
The ref directory there contains a fixed-size hotword list built for each utterance, consisting of the true hotwords plus distractors. I used the 100.tsv for the test_other set (to test this you have to modify how the hotword list is read, loading a separate list per utterance). The 3838-word list was obtained by merging the true hotwords of all test_other utterances; a sketch of that merging step follows.
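A hedged sketch of the merge (the tsv layout here is an assumption; check the actual files under the is21_deep_bias ref directory and adjust the path and column index):

```python
import csv

# Assumption: each row of the per-utterance tsv carries an utterance id and
# that utterance's true biasing words in some column; merging and
# deduplicating those words across all test_other rows yields one global
# context list (3838 entries in the experiments above).
global_list = set()
with open('ref/test_other/100.tsv') as f:  # hypothetical path
    for row in csv.reader(f, delimiter='\t'):
        global_list.update(row[1].split())  # assumed column of true hotwords
with open('test_other_context_list', 'w') as f:
    f.write('\n'.join(sorted(global_list)))
```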
I understand now. Then may I ask whether there is an open-source hotword list for aishell?
There is one for aishell-1 on modelscope: https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_hotwords_testsets/summary
Thanks a lot!
@kaixunhuang0 Hi Kaixun, here are my test results on test_other.
I have a few questions:
- The baseline of 8.77 matches yours, but I get the same number with your trained biasing model. Why is there no difference between the results of Wenet_Librispeech_deep_biasing/nn_bias/bias_model and librispeech_u2pp_conformer_exp?
- Could the lack of difference mean that nn bias is not taking effect? How should I run the test so that it does take effect? I tried both the runtime and the python decoding, and neither showed any effect.
- context_2.0_test_other gives 8.13, measured with the 3838-word list and context score 2.0; this should line up with the new runtime context graph result you posted earlier.
- Do you have a suitable tool for computing U-WER and B-WER? I am using is21_deep_bias/score.py, but my numbers differ slightly from yours and I would like to align with your scoring.
- If the issue you mean is that you get 8.77 with biasing disabled, that is expected: when I trained the nn biasing model the original asr model was frozen, so with biasing turned off my trained model gives exactly the same result as the baseline.
- If you still get 8.77 when testing nn biasing, perhaps one of your decoding parameters is not being passed correctly. Check whether context_bias_mode is set to deep_biasing; to reproduce the numbers in the table above you also need to enable the hotword filtering option context_filtering and set context_filtering_threshold, which as I recall was -4.
- My code for computing B-WER is something I wrote myself; it is certainly not as good as that script and is rather messy, but if you need it I can tidy it up tonight or tomorrow night and upload it to huggingface.
- OK, I indeed did not change those parameters, so the asr part must have been frozen as you said; I will try adjusting context_bias_mode and context_filtering.
- And yes, please do upload the B-WER code; I mainly want to align my scoring with yours.
@kaixunhuang0 Got it. So when calling wenet/bin/recognize.py for decoding, I pass the context_bias_mode related arguments to choose context_graph or deep_biasing, and passing both means both biasing methods are used at the same time, right? One more question: the runtime decoder only integrates context_graph so far, and deep_biasing has not been integrated yet, correct?
Yes, passing both means both are used together, and the runtime has not integrated deep_biasing yet.
@kaixunhuang0 I noticed you also integrated the two-stage filtering algorithm for nn bias, which is great.
So I have another question: of the improvement brought by deep biasing (8.77 -> 7.93), how much is contributed by two_stage_filtering? Of course I can also measure it myself; I have a run going. Judging from the paper, the vocabulary reduction is very aggressive, which makes it especially well suited to biasing with large hotword lists.
For nn biasing, a hotword list of a few thousand entries already counts as very large. Without hotword filtering, nn biasing at that scale may bring no improvement in overall WER, although B-WER should still improve. Basically, once the hotword list reaches the order of a few hundred entries, it is best to enable filtering.
@kaixunhuang0 I decoded with the Wenet_Librispeech_deep_biasing/nn_bias/bias_model model and the Wenet_Librispeech_deep_biasing/nn_bias/bias_model/test_other_context_list word list. The decoding configuration only set --context_bias_mode "deep_biasing,context_graph" --context_list_path $context_path --context_filtering, with --context_graph_score, --deep_biasing_score, and --context_filtering_threshold left at their defaults. My trend matches yours (enabling both works best), but my results are slightly worse. What do you think could be the reason, and is there anything else I should change?
Try changing them to the graph score and biasing score from my table. I tested with this same code, so with all of those inputs aligned the results should match; an example invocation is sketched below.
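For reference, an invocation along these lines should correspond to the combined 3838-list row of the non-streaming table (a hypothetical command assembled from the flags quoted in this thread; the config/checkpoint paths are placeholders and the usual data and dictionary arguments are omitted):

```bash
python wenet/bin/recognize.py \
  --config exp/bias_model/train.yaml \
  --checkpoint exp/bias_model/final.pt \
  --context_bias_mode "deep_biasing,context_graph" \
  --context_list_path exp/bias_model/test_other_context_list \
  --context_graph_score 2.0 \
  --deep_biasing_score 1.0 \
  --context_filtering \
  --context_filtering_threshold -4 \
  ...  # remaining data/dict/output arguments as usual
```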