
[wenet] nn context biasing

Open kaixunhuang0 opened this issue 2 years ago • 104 comments

The Deep biasing method comes from: https://arxiv.org/abs/2305.12493

The pre-trained ASR model is fine-tuned to add biasing. During fine-tuning, the original ASR parameters are frozen and only the parameters related to deep biasing are trained. use_dynamic_chunk must not be enabled during fine-tuning (the biasing effect degrades), but the biasing effect of streaming and non-streaming inference is basically the same.

RESULT: Model link: https://huggingface.co/kxhuang/Wenet_Librispeech_deep_biasing/tree/main (I used the BLSTM forward state incorrectly when training this model, so to test it you need to change the -2 to 0 in the forward function of the BLSTM class in wenet/transformer/context_module.py)
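For readers wondering what the "-2 to 0" change refers to: PyTorch packs the final hidden states of a bidirectional LSTM as `h_n = [layer0_fwd, layer0_bwd, layer1_fwd, layer1_bwd, ...]`. A minimal sketch of the indexing, assuming a 2-layer BLSTM purely for illustration (`forward_state_index` is a hypothetical helper, not wenet code):

```python
def forward_state_index(num_layers: int, last_layer: bool = True) -> int:
    """Index into h_n (length num_layers * 2) of a forward-direction state.

    PyTorch lays h_n out as [layer0_fwd, layer0_bwd, layer1_fwd, layer1_bwd, ...],
    so the last layer's forward state sits at 2*(num_layers-1) (what h_n[-2]
    addresses) and the first layer's forward state sits at 0.
    """
    return 2 * (num_layers - 1) if last_layer else 0

# For a 2-layer BLSTM, h_n has 4 entries: h_n[-2] is entry 2 (last layer,
# forward), while the released model was trained reading entry 0 instead.
assert forward_state_index(2) == 2
assert forward_state_index(2, last_layer=False) == 0
```

So switching -2 to 0 makes inference read the same (first-layer forward) state the released checkpoint was trained with.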

Using the Wenet Librispeech pre-trained AED model, fine-tuned for 30 epochs, with the final model obtained by averaging the last 3 epochs. The following are results on Librispeech test-other. The context list for the test set is sourced from: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias

Non-streaming inference:

| Method | List size | Graph score | Biasing score | WER | U-WER | B-WER |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | / | / | / | 8.77 | 5.58 | 36.84 |
| context graph | 3838 | 3.0 | / | 7.75 | 5.83 | 24.62 |
| deep biasing | 3838 | / | 1.5 | 7.93 | 5.92 | 25.64 |
| context graph + deep biasing | 3838 | 2.0 | 1.0 | 7.66 | 6.08 | 21.48 |
| context graph | 100 | 3.0 | / | 7.32 | 5.45 | 23.70 |
| deep biasing | 100 | / | 2.0 | 7.08 | 5.33 | 22.41 |
| context graph + deep biasing | 100 | 2.5 | 1.5 | 6.55 | 5.33 | 17.27 |

Streaming inference (chunk 16):

| Method | List size | Graph score | Biasing score | WER | U-WER | B-WER |
| --- | --- | --- | --- | --- | --- | --- |
| baseline | / | / | / | 10.47 | 7.07 | 40.30 |
| context graph | 100 | 3.0 | / | 9.06 | 6.99 | 27.21 |
| deep biasing | 100 | / | 2.0 | 8.86 | 6.87 | 26.28 |
| context graph + deep biasing | 100 | 2.5 | 1.5 | 8.17 | 6.85 | 19.72 |

kaixunhuang0 avatar Aug 31 '23 07:08 kaixunhuang0

Could you share the conf.yaml settings used when training the model? Thanks.

zyjcsf avatar Sep 28 '23 07:09 zyjcsf

> Could you share the conf.yaml settings used when training the model? Thanks.

The model link above contains the yaml file I used; you can download it directly.

kaixunhuang0 avatar Sep 28 '23 07:09 kaixunhuang0

I trained a deep biasing model on the 170-hour AISHELL-1 data, but when decoding with deep biasing enabled, many characters are dropped. What could be the reason?

zyjcsf avatar Oct 09 '23 08:10 zyjcsf

> I trained a deep biasing model on the 170-hour AISHELL-1 data, but when decoding with deep biasing enabled, many characters are dropped. What could be the reason?

How severe is the character dropping, and how large is the hotword list you used? I have also run AISHELL-1 experiments; the results were normal and I did not observe any dropped characters.

kaixunhuang0 avatar Oct 09 '23 08:10 kaixunhuang0

> How severe is the character dropping, and how large is the hotword list you used? I have also run AISHELL-1 experiments; the results were normal and I did not observe any dropped characters.

It is quite severe: whole stretches get dropped. The original hotword list has 187 entries (the open-source hotword test set on ModelScope), and I enabled context_filtering to filter it. If the filtered list contains only [0], essentially the whole utterance is dropped; even when hotwords remain, contiguous chunks still go missing. I set deep_score=1 and filter_threshold=-4. Training has run for 17 epochs so far, with loss_bias around 10.

zyjcsf avatar Oct 09 '23 08:10 zyjcsf

> It is quite severe: whole stretches get dropped. […] I set deep_score=1 and filter_threshold=-4.

That is strange indeed. Does the overall loss look normal? Once training has roughly converged, the bias loss should be about the same as the CTC loss, and the total loss should be somewhat lower than before the hotword module was trained; on AISHELL-1 it is around 3.4. Are all your hotword-related yaml settings identical to the ones I provided above?

kaixunhuang0 avatar Oct 09 '23 09:10 kaixunhuang0

> I trained a deep biasing model on the 170-hour AISHELL-1 data, but when decoding with deep biasing enabled, many characters are dropped. What could be the reason?

One more thing: when I ran the AISHELL-1 experiments I found that for datasets like AISHELL-1, where most utterances are very short, the hotword-sampling code needs the check that sampled hotwords must not overlap removed; otherwise an utterance can easily yield only one sampled hotword, which weakens the trained biasing effect. This problem would not cause dropped characters, though.
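The sampling change described above can be sketched as follows; `sample_hotword_spans` and its parameters are hypothetical names for illustration, not the actual wenet training code:

```python
import random

def sample_hotword_spans(labels, num_phrases, max_len, allow_overlap=True):
    """Sketch of hotword sampling for training: draw up to num_phrases
    token spans of length <= max_len from the label sequence.

    With allow_overlap=False (the original constraint), a candidate span that
    crosses an already-sampled span is rejected, so short utterances often
    yield only a single hotword; allow_overlap=True drops that check.
    """
    spans, taken = [], []
    for _ in range(num_phrases):
        if not labels:
            break
        length = random.randint(1, min(max_len, len(labels)))
        start = random.randint(0, len(labels) - length)
        end = start + length
        if not allow_overlap and any(s < end and start < e for s, e in taken):
            continue  # rejected: crosses an existing sampled span
        taken.append((start, end))
        spans.append(labels[start:end])
    return spans
```

A usage sketch: `sample_hotword_spans(list("abcdefgh"), num_phrases=3, max_len=3)` returns up to three token sublists, possibly overlapping.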

kaixunhuang0 avatar Oct 09 '23 09:10 kaixunhuang0

> That is strange indeed. Does the overall loss look normal? […] Are all your hotword-related yaml settings identical to the ones I provided above?

So far the overall loss looks fairly normal: it dropped from 3.1 to 2.5, with the bias loss somewhat higher than the CTC loss. My hotword configuration is exactly the one you provided.

zyjcsf avatar Oct 09 '23 09:10 zyjcsf

> So far the overall loss looks fairly normal: it dropped from 3.1 to 2.5, with the bias loss somewhat higher than the CTC loss. My hotword configuration is exactly the one you provided.

Could there be a problem in your modified hotword-sampling code? I have never run into the behavior you describe and cannot think of a cause. The dropping even depends on the number of hotwords passed in, yet in theory a hotword list reduced to just [0] should have the smallest possible impact on normal decoding.

kaixunhuang0 avatar Oct 09 '23 09:10 kaixunhuang0

Hello, I am trying to reproduce your Librispeech results, but when training the hotword-biasing model the cv loss does not decrease (it stays around 160), and the train loss also stalls around 40–50. In addition, every few batches the trainer spends five or six minutes before the next batch, whereas a batch normally takes about 30 s on my GPU. A short excerpt of the training log follows.

I did not modify any code, and the training conf file is the train_bias you provided. Could you roughly analyze the cause? Thanks!

2023-10-17 18:48:14,596 DEBUG TRAIN Batch 0/300 loss 77.121582 loss_att 68.588936 loss_ctc 90.912209 loss_bias 61.188702 lr 0.00001204 rank 3
2023-10-17 18:48:14,596 DEBUG TRAIN Batch 0/300 loss 60.563221 loss_att 53.186646 loss_ctc 73.329880 loss_bias 44.453613 lr 0.00001204 rank 7
2023-10-17 18:48:14,596 DEBUG TRAIN Batch 0/300 loss 66.905380 loss_att 60.219139 loss_ctc 76.915077 loss_bias 55.915321 lr 0.00001204 rank 1
2023-10-17 18:48:14,599 DEBUG TRAIN Batch 0/300 loss 58.367058 loss_att 54.565548 loss_ctc 63.268948 loss_bias 39.683086 lr 0.00001204 rank 0
2023-10-17 18:48:54,507 DEBUG TRAIN Batch 0/400 loss 69.295921 loss_att 62.990799 loss_ctc 78.056396 loss_bias 59.514668 lr 0.00001604 rank 7
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 60.892227 loss_att 55.707409 loss_ctc 68.627617 loss_bias 43.625130 lr 0.00001604 rank 6
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 70.570961 loss_att 63.955940 loss_ctc 81.632156 loss_bias 43.738525 lr 0.00001604 rank 2
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 56.387531 loss_att 51.965221 loss_ctc 61.897652 loss_bias 48.085854 lr 0.00001604 rank 5
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 57.394482 loss_att 53.534023 loss_ctc 62.557728 loss_bias 38.444881 lr 0.00001604 rank 1
2023-10-17 18:48:54,512 DEBUG TRAIN Batch 0/400 loss 61.427593 loss_att 57.190033 loss_ctc 66.434952 loss_bias 48.802876 lr 0.00001604 rank 4
2023-10-17 18:48:54,513 DEBUG TRAIN Batch 0/400 loss 66.382660 loss_att 61.784157 loss_ctc 71.916908 loss_bias 51.955982 lr 0.00001604 rank 3
2023-10-17 18:48:54,517 DEBUG TRAIN Batch 0/400 loss 69.309433 loss_att 61.884018 loss_ctc 81.042137 loss_bias 55.932556 lr 0.00001604 rank 0
2023-10-17 18:55:16,906 DEBUG TRAIN Batch 0/500 loss 60.114948 loss_att 58.303940 loss_ctc 60.731007 loss_bias 36.096294 lr 0.00002004 rank 7
2023-10-17 18:55:16,906 DEBUG TRAIN Batch 0/500 loss 56.977654 loss_att 53.650196 loss_ctc 61.347378 loss_bias 33.943447 lr 0.00002004 rank 1
2023-10-17 18:55:16,907 DEBUG TRAIN Batch 0/500 loss 56.869381 loss_att 54.544899 loss_ctc 58.243603 loss_bias 40.495705 lr 0.00002004 rank 2
2023-10-17 18:55:16,906 DEBUG TRAIN Batch 0/500 loss 58.940693 loss_att 57.577057 loss_ctc 57.989662 loss_bias 41.328430 lr 0.00002004 rank 4
2023-10-17 18:55:16,907 DEBUG TRAIN Batch 0/500 loss 63.078079 loss_att 60.879333 loss_ctc 64.494652 loss_bias 37.138424 lr 0.00002004 rank 3
2023-10-17 18:55:16,908 DEBUG TRAIN Batch 0/500 loss 62.410076 loss_att 58.739368 loss_ctc 67.138695 loss_bias 38.363663 lr 0.00002004 rank 6
2023-10-17 18:55:16,908 DEBUG TRAIN Batch 0/500 loss 61.162239 loss_att 57.996552 loss_ctc 63.624905 loss_bias 49.239365 lr 0.00002004 rank 5
2023-10-17 18:55:16,909 DEBUG TRAIN Batch 0/500 loss 62.478779 loss_att 60.307823 loss_ctc 63.295692 loss_bias 42.486469 lr 0.00002004 rank 0
2023-10-17 18:55:57,183 DEBUG TRAIN Batch 0/600 loss 62.084000 loss_att 62.485199 loss_ctc 56.836884 loss_bias 43.109840 lr 0.00002404 rank 7
2023-10-17 18:55:57,186 DEBUG TRAIN Batch 0/600 loss 63.226624 loss_att 62.583645 loss_ctc 60.321804 loss_bias 44.050949 lr 0.00002404 rank 3

wpupup avatar Oct 18 '23 02:10 wpupup

> Hello, I am trying to reproduce your Librispeech results, but the cv loss does not decrease (around 160) and the train loss stalls around 40–50; every few batches there is a five-to-six-minute stall. I did not modify any code and used your train_bias conf. […]

Did you train from scratch? To reduce the impact on the original ASR performance, my code starts from a pre-trained ASR model and freezes all parameters except the hotword module. Training from scratch should also be able to converge, but at the very least you need to unfreeze the frozen parameters first.

kaixunhuang0 avatar Oct 18 '23 02:10 kaixunhuang0

> Did you train from scratch? […] at the very least you need to unfreeze the frozen parameters first.

No; I also started from the ASR model pre-trained on Librispeech, with the parameters frozen.

wpupup avatar Oct 18 '23 03:10 wpupup

> No; I also started from the ASR model pre-trained on Librispeech, with the parameters frozen.

OK, I will train again tonight and see whether I can reproduce the problem. As for training speed, I have also observed that every few hundred batches there is a short pause before the next batch, but the overall slowdown was nowhere near as large as what you are seeing, so I did not pay much attention; I will look into it as well.

kaixunhuang0 avatar Oct 18 '23 03:10 kaixunhuang0

> No; I also started from the ASR model pre-trained on Librispeech, with the parameters frozen.

I tried it: with a freshly cloned copy of the code, the pre-trained Librispeech model from GitHub, and the yaml I provided, training converges normally; the loss drops to about 10 within roughly 1000 batches. Could it be that your pre-trained ASR model does not match some parameters in my yaml, so that some weights were randomly initialized and then frozen?

kaixunhuang0 avatar Oct 19 '23 01:10 kaixunhuang0

> I tried it: with a freshly cloned copy of the code, the pre-trained Librispeech model from GitHub, and the yaml I provided, training converges normally […]

I have verified that training from scratch does converge. As for the pre-trained model, I compared the training parameters and they are identical to yours, which is strange. Which pre-trained Librispeech model did you use? I will keep looking for the cause.

wpupup avatar Oct 19 '23 02:10 wpupup

> Which pre-trained Librispeech model did you use? I will keep looking for the cause.

I used the Librispeech model that WeNet provides for download on GitHub: https://github.com/wenet-e2e/wenet/blob/main/docs/pretrained_models.en.md

kaixunhuang0 avatar Oct 19 '23 02:10 kaixunhuang0

Hello, you mention that the "context list for the test set is sourced from: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias", but I am still not clear on how you obtained the 3838 or 100 hotwords. Were they randomly picked from words/all_rare_words.txt, or obtained by some other method?

NiniAndy avatar Oct 31 '23 13:10 NiniAndy

> […] how you obtained the 3838 or 100 hotwords. Were they randomly picked from words/all_rare_words.txt, or obtained by some other method?

The ref directory contains a fixed-size hotword list built for each utterance, including both the true hotwords and distractors. I used the 100.tsv for the test_other set (to test with it you need to modify how the context list is read, loading a separate list for each utterance). The 3838-word list was obtained by merging the true hotwords of all test_other utterances.
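Loading a per-utterance context list might look like the sketch below. The column layout is an assumption (utterance id followed by tab-separated biasing words); check the actual 100.tsv before relying on it, and `load_per_utt_context` is a hypothetical helper, not wenet code:

```python
import csv

def load_per_utt_context(tsv_path):
    """Read a per-utterance context list from a TSV file, assuming each row
    is: utterance_id<TAB>word1<TAB>word2... Returns {utt_id: [words]}."""
    contexts = {}
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            utt_id, words = row[0], row[1:]
            contexts[utt_id] = [w for w in words if w]
    return contexts
```

The decoder would then fetch `contexts[utt_id]` for each utterance instead of using one global list.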

kaixunhuang0 avatar Oct 31 '23 13:10 kaixunhuang0

> The ref directory contains a fixed-size hotword list built for each utterance, including both the true hotwords and distractors. […]

I see. Then may I ask whether there is an open-source hotword list for AISHELL?

NiniAndy avatar Oct 31 '23 13:10 NiniAndy

> I see. Then may I ask whether there is an open-source hotword list for AISHELL?

There is one for AISHELL-1 on ModelScope: https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_hotwords_testsets/summary

kaixunhuang0 avatar Nov 01 '23 02:11 kaixunhuang0

There is one for AISHELL-1 on ModelScope: https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_hotwords_testsets/summary

Thanks a lot!

NiniAndy avatar Nov 01 '23 06:11 NiniAndy

@kaixunhuang0 Hi Kaixun, here are my test results on test_other.

image

I have a few questions:

  1. The baseline 8.77 matches yours, but using your trained hotword model also gives this result. That is, why is there no difference between the results of Wenet_Librispeech_deep_biasing/nn_bias/bias_model and librispeech_u2pp_conformer_exp?
  2. Could the lack of difference mean the nn bias isn't taking effect? How should I test so that it does? I tried both the runtime and the Python decoding, and neither took effect.
  3. context_2.0_test_other gives 8.13, using the 3838-word list with context score = 2.0, which should line up with the "new runtime context graph" result you posted earlier.
  4. Do you have a suitable tool for computing U-WER and B-WER? I used is21_deep_bias/score.py, but my results differ slightly from yours and I'd like to be consistent with you.

dahu1 avatar Nov 28 '23 06:11 dahu1

@kaixunhuang0 Hi Kaixun, here are my test results on test_other.

image

I have a few questions:

  1. The baseline 8.77 matches yours, but using your trained hotword model also gives this result. That is, why is there no difference between the results of Wenet_Librispeech_deep_biasing/nn_bias/bias_model and librispeech_u2pp_conformer_exp?
  2. Could the lack of difference mean the nn bias isn't taking effect? How should I test so that it does? I tried both the runtime and the Python decoding, and neither took effect.
  3. context_2.0_test_other gives 8.13, using the 3838-word list with context score = 2.0, which should line up with the "new runtime context graph" result you posted earlier.
  4. Do you have a suitable tool for computing U-WER and B-WER? I used is21_deep_bias/score.py, but my results differ slightly from yours and I'd like to be consistent with you.
  1. If you mean that both give 8.77 with hotwords disabled, that's because the original ASR model was frozen while I trained the nn hotword model, so with hotwords disabled my trained model behaves exactly like the baseline.
  2. If you tested with nn hotwords enabled and still got 8.77, perhaps one of your decoding parameters isn't being passed correctly. Check that context_bias_mode is set to deep_biasing; to reproduce the results in the table above, you also need to enable hotword filtering (context_filtering) and set context_filtering_threshold, which as I recall was -4.
  3. My B-WER scoring code is self-written, certainly not as good as theirs, and fairly messy. If you need it, I'll tidy it up tonight or tomorrow night and upload it to Hugging Face.
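For rough alignment in the meantime, here is a minimal self-contained sketch of U-WER/B-WER computation. It is a simplification, not the script mentioned above and not is21_deep_bias/score.py: all insertions are charged to U-WER here, and tie-breaking in the alignment may differ from either script.

```python
def align_and_score(ref, hyp, biased):
    """Word-level Levenshtein alignment; errors on reference words in
    `biased` count toward B-WER, all other errors toward U-WER.
    Insertions are charged to U-WER (a simplifying choice)."""
    R, H = len(ref), len(hyp)
    # Standard edit-distance DP table.
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        d[i][0] = i
    for j in range(H + 1):
        d[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                          d[i - 1][j] + 1,    # deletion
                          d[i][j - 1] + 1)    # insertion
    # Backtrace, splitting errors into biased / unbiased.
    b_err = u_err = 0
    i, j = R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:       # substitution
                if ref[i - 1] in biased:
                    b_err += 1
                else:
                    u_err += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:  # deletion
            if ref[i - 1] in biased:
                b_err += 1
            else:
                u_err += 1
            i -= 1
        else:                                   # insertion
            u_err += 1
            j -= 1
    n_b = sum(w in biased for w in ref)
    n_u = len(ref) - n_b
    return b_err / max(n_b, 1), u_err / max(n_u, 1)
```

Called per utterance with the reference words, hypothesis words, and the biasing-word set, then aggregated over the test set.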

kaixunhuang0 avatar Nov 28 '23 06:11 kaixunhuang0

  1. If you mean that both give 8.77 with hotwords disabled, that's because the original ASR model was frozen while I trained the nn hotword model, so with hotwords disabled my trained model behaves exactly like the baseline.
  2. If you tested with nn hotwords enabled and still got 8.77, perhaps one of your decoding parameters isn't being passed correctly. Check that context_bias_mode is set to deep_biasing; to reproduce the results in the table above, you also need to enable hotword filtering (context_filtering) and set context_filtering_threshold, which as I recall was -4.
  3. My B-WER scoring code is self-written, certainly not as good as theirs, and fairly messy. If you need it, I'll tidy it up tonight or tomorrow night and upload it to Hugging Face.
  1. OK, I indeed hadn't changed those parameters, so the model must have stayed frozen. I'll try adjusting context_bias_mode and context_filtering.
  2. OK, then please do upload the B-WER script; I mainly want to align my numbers with yours.

dahu1 avatar Nov 28 '23 06:11 dahu1

@kaixunhuang0 I see it now: when calling wenet/bin/recognize.py for decoding, you pass the context_bias_mode arguments to choose context_graph or deep_biasing, and passing both means both hotword methods are used together, right? One more question: the runtime decoder only integrates context_graph so far; deep_biasing hasn't been integrated yet, has it?

dahu1 avatar Nov 28 '23 08:11 dahu1

@kaixunhuang0 I see it now: when calling wenet/bin/recognize.py for decoding, you pass the context_bias_mode arguments to choose context_graph or deep_biasing, and passing both means both hotword methods are used together, right? One more question: the runtime decoder only integrates context_graph so far; deep_biasing hasn't been integrated yet, has it?

Yes, passing both means both are used at the same time. The runtime has not integrated deep_biasing yet.

kaixunhuang0 avatar Nov 28 '23 08:11 kaixunhuang0

@kaixunhuang0 I noticed you also integrated the two-stage filtering algorithm for nn bias, great work.

image

image

Now I have another question: deep biasing brings the improvement from 8.77 to 7.93, but how much of that comes from two_stage_filtering? Of course I can also measure it myself; I've started a run to check. Judging from the paper, the list reduction is very aggressive, which makes it especially suitable for hotword scenarios with large lists.

dahu1 avatar Nov 28 '23 08:11 dahu1

@kaixunhuang0 I noticed you also integrated the two-stage filtering algorithm for nn bias, great work.

image

image Now I have another question: deep biasing brings the improvement from 8.77 to 7.93, but how much of that comes from two_stage_filtering? Of course I can also measure it myself; I've started a run to check. Judging from the paper, the list reduction is very aggressive, which makes it especially suitable for hotword scenarios with large lists.

For nn bias, a hotword list of a few thousand entries already counts as very large. Without hotword filtering, nn bias may bring no improvement in overall WER at that scale, although B-WER should still improve. Basically, once the list reaches the order of a few hundred entries, it is best to enable hotword filtering.
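For illustration only, the shape of the filtering step: each hotword gets a log-domain confidence score from a first-pass acoustic estimate, and only entries above a threshold (such as the -4 value of context_filtering_threshold mentioned earlier in this thread) are passed on to the biasing module. The scoring itself is a stand-in here, not wenet's actual implementation.

```python
def filter_context_list(scores, threshold=-4.0):
    """Keep only hotwords whose log-domain confidence score exceeds
    the threshold. `scores` maps hotword -> score; in the real system
    the scores would come from a first-pass acoustic estimate."""
    return [word for word, score in scores.items() if score > threshold]
```

Shrinking a list of thousands down to the plausible few is what lets nn bias keep helping overall WER at large list sizes.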

kaixunhuang0 avatar Nov 28 '23 09:11 kaixunhuang0

image

@kaixunhuang0 I decoded with the Wenet_Librispeech_deep_biasing/nn_bias/bias_model model and the Wenet_Librispeech_deep_biasing/nn_bias/bias_model/test_other_context_list list. In the decoding config I only set --context_bias_mode "deep_biasing,context_graph" --context_list_path $context_path --context_filtering, leaving --context_graph_score, --deep_biasing_score, and --context_filtering_threshold at their defaults. The trend I see matches yours, with both methods enabled giving the best result, but my numbers are slightly worse than yours. What do you think could be the cause? Is there anything else I should change?

dahu1 avatar Nov 29 '23 06:11 dahu1

image @kaixunhuang0 I decoded with the Wenet_Librispeech_deep_biasing/nn_bias/bias_model model and the Wenet_Librispeech_deep_biasing/nn_bias/bias_model/test_other_context_list list. In the decoding config I only set --context_bias_mode "deep_biasing,context_graph" --context_list_path $context_path --context_filtering, leaving --context_graph_score, --deep_biasing_score, and --context_filtering_threshold at their defaults. The trend I see matches yours, with both methods enabled giving the best result, but my numbers are slightly worse than yours. What do you think could be the cause? Is there anything else I should change?

Try changing to the graph score and biasing score from my table. I used this same code for testing, so with all of those inputs aligned the results should match.

kaixunhuang0 avatar Nov 29 '23 07:11 kaixunhuang0