NiniAndy comments

Results: 4 comments of NiniAndy

[wenet] nn context biasing

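Hello, I see you wrote that the “context list for the test set is sourced from: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias”, but I am still not clear on how you obtained the 3838 or 100 hotwords. Were they randomly picked from words/all_rare_words.txt, or obtained some other way?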

[wenet] nn context biasing

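> > Hello, I see you wrote that the “context list for the test set is sourced from: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias”, but I am still not clear on how you obtained the 3838 or 100 hotwords. Were they randomly picked from words/all_rare_words.txt, or obtained some other way?
>
> The ref directory contains a fixed-size hotword list built for each utterance, made up of the true hotwords plus distractors. I used the 100.tsv file for the test_other set (to test this you need to change how the hotword list is read, so that a separate list is loaded for each utterance). The 3838-word list is the full list obtained by merging the true hotwords of all test_other utterances.

I understand now. May I ask whether there is an open-source hotword list for AISHELL?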
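The merging step described in the quoted reply could be sketched roughly as follows. This is only an illustration under stated assumptions: the column layout (utterance id, transcript, JSON list of true hotwords, JSON list of true hotwords plus distractors) is assumed and has not been checked against the actual files in is21_deep_bias, and the function names `load_biasing_lists` and `merge_true_hotwords` are hypothetical.

```python
import json


def load_biasing_lists(tsv_text, true_col=2, full_col=3):
    """Parse per-utterance biasing lists from TSV text.

    ASSUMED layout (not verified against the real ref/*.tsv files):
    utt_id <TAB> transcript <TAB> JSON true hotwords <TAB> JSON full list.
    Returns {utt_id: (true_hotwords, full_list_with_distractors)}.
    """
    per_utt = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")
        per_utt[cols[0]] = (json.loads(cols[true_col]), json.loads(cols[full_col]))
    return per_utt


def merge_true_hotwords(per_utt):
    """Union of the true hotwords over all utterances, sorted for stability."""
    merged = set()
    for true_words, _full_list in per_utt.values():
        merged.update(true_words)
    return sorted(merged)


# Tiny made-up sample, for illustration only.
sample = (
    'utt1\tTHE QUICK FOX\t["FOX"]\t["FOX", "ZEBRA", "QUARTZ"]\n'
    'utt2\tA LAZY HOUND\t["HOUND", "LAZY"]\t["HOUND", "LAZY", "FOX"]\n'
)
per_utt = load_biasing_lists(sample)
print(merge_true_hotwords(per_utt))  # ['FOX', 'HOUND', 'LAZY']
```

For per-utterance evaluation, the second element of each `per_utt` entry would be passed to the decoder as that utterance's own biasing list, instead of one global list.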

[wenet] nn context biasing

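> > > > Hello, I see you wrote that the “context list for the test set is sourced from: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias”, but I am still not clear on how you obtained the 3838 or 100 hotwords. Were they randomly picked from words/all_rare_words.txt, or obtained some other way?
> > >
> > > The ref directory contains a fixed-size hotword list built for each utterance, made up of the true hotwords plus distractors. I used the 100.tsv file for the test_other set (to test this you need to change how the hotword list is read, so that a separate list is loaded for each utterance). The 3838-word list is the full list obtained by merging the true hotwords of all test_other utterances.
> >
> > ...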

Found some minor issues in LLM-ASR inference

Found the problem: when training with whisper_qwen_linear.yaml, the prompt is "Transcribe speech to text." (see funasr/datasets/llm_datasets_qwenaudio/datasets.py, lines 46-47), whereas at inference time the default prompt is:

prompt_pre = "USER: \nINSTRUCTION: {}\nINPUT: ".format(prompt)

There is also a bug: the code at funasr/models/llm_asr/model.py, lines 301-303, looks like it was written incorrectly and should be changed to:

inputs_embeds = torch.cat((encoder_out, inputs_embeds[None, :, :]), dim=1) # [audio, prompt]
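To make the two points in the comment above concrete, here is a minimal sketch. The tensor shapes (`encoder_out` as `[1, T_audio, D]`, prompt embeddings as `[T_prompt, D]`) are assumptions inferred from the `[None, :, :]` indexing in the proposed fix, and the helper names `build_inference_prefix` and `concat_audio_then_prompt` are hypothetical and not part of FunASR.

```python
import torch

# Prompt text used at training time, as quoted in the comment.
TRAIN_PROMPT = "Transcribe speech to text."


def build_inference_prefix(prompt: str) -> str:
    """Default inference-time template quoted in the comment."""
    return "USER: \nINSTRUCTION: {}\nINPUT: ".format(prompt)


def concat_audio_then_prompt(encoder_out, prompt_embeds):
    """Concatenate along the time axis in [audio, prompt] order.

    ASSUMED shapes: encoder_out [1, T_audio, D], prompt_embeds [T_prompt, D].
    prompt_embeds[None, :, :] adds the missing batch dimension.
    """
    return torch.cat((encoder_out, prompt_embeds[None, :, :]), dim=1)


# The model sees different prompt strings in training and inference.
print(repr(TRAIN_PROMPT))
print(repr(build_inference_prefix(TRAIN_PROMPT)))

# Dummy tensors: 5 audio frames (zeros) and 3 prompt tokens (ones), D = 4.
encoder_out = torch.zeros(1, 5, 4)
prompt_embeds = torch.ones(3, 4)
merged = concat_audio_then_prompt(encoder_out, prompt_embeds)
print(merged.shape)  # torch.Size([1, 8, 4])
```

In the merged tensor the first 5 time steps come from the audio encoder and the last 3 from the prompt, which matches the `[audio, prompt]` order given in the proposed fix.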