
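paraformer-large-vocab8404: the provided seg_dict stores English words as BPE sequences. How are subword sequences generated for English words that are not in seg_dict? Is a bpe.model available?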

Open · yafuilee opened this issue 11 months ago · 1 comment


Excerpt from seg_dict:

<unk>	<unk>
.	. 
@	@ 
aaaaa	a@@ a@@ a@@ a@@ a
aaanthor	a@@ a@@ an@@ th@@ or
aabar	a@@ ab@@ ar
aace	a@@ ace
aachen	a@@ ach@@ en
aad	a@@ ad
aaden	a@@ ad@@ en
aadmi	a@@ ad@@ m@@ i
aaec's	a@@ a@@ e@@ c@@ 's
aaes	a@@ a@@ es
aaf	a@@ a@@ f
aafa	a@@ af@@ a
aafes	a@@ af@@ es
aafia	a@@ a@@ fi@@ a
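Without the original bpe.model, one possible stopgap is to segment an out-of-dictionary word using only the subword pieces that already occur in seg_dict. The sketch below does this with greedy longest-match segmentation (WordPiece-style), not by replaying BPE merges. Its assumptions: the unit inventory is collected from the seg_dict entries (not from the model's full token list), words are lowercased to match the excerpt, and `@@` marks a non-final piece as in the excerpt. All function names are hypothetical. The output can therefore differ from what the real bpe.model would produce.

```python
from typing import Dict, Iterable, List, Set

# A few entries copied from the seg_dict excerpt above (word<TAB>pieces).
SAMPLE_SEG_DICT = """\
<unk>\t<unk>
aaaaa\ta@@ a@@ a@@ a@@ a
aaanthor\ta@@ a@@ an@@ th@@ or
aabar\ta@@ ab@@ ar
aachen\ta@@ ach@@ en
aaden\ta@@ ad@@ en
aafa\ta@@ af@@ a
"""


def load_seg_dict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse `word<whitespace>piece piece ...` lines into a dict."""
    seg_dict: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            seg_dict[parts[0]] = parts[1:]
    return seg_dict


def collect_units(seg_dict: Dict[str, List[str]]) -> Set[str]:
    """Every piece seen in seg_dict, e.g. 'ach@@' (non-final) or 'en' (final)."""
    return {piece for pieces in seg_dict.values() for piece in pieces}


def greedy_segment(word: str, units: Set[str], unk: str = "<unk>") -> List[str]:
    """Greedy longest-match: non-final pieces need the '@@' form, the last piece
    needs the bare form. Returns [unk] if no known piece fits at some position."""
    pieces: List[str] = []
    i, n = 0, len(word)
    while i < n:
        for j in range(n, i, -1):  # try the longest substring first
            sub = word[i:j]
            candidate = sub if j == n else sub + "@@"
            if candidate in units:
                pieces.append(candidate)
                i = j
                break
        else:
            return [unk]  # nothing matched; greedy search does not backtrack
    return pieces


def tokenize_word(word: str, seg_dict: Dict[str, List[str]], units: Set[str]) -> List[str]:
    """Use the shipped segmentation when available, otherwise fall back."""
    word = word.lower()
    if word in seg_dict:
        return list(seg_dict[word])
    return greedy_segment(word, units)
```

For example, with `sd = load_seg_dict(SAMPLE_SEG_DICT.splitlines())` and `units = collect_units(sd)`, `tokenize_word("aden", sd, units)` yields `['ad@@', 'en']`, while `"than"` falls back to `['<unk>']` because greedy matching picks `th@@`, `a@@` and then finds no final piece `n`. Every emitted piece is guaranteed to exist in the collected unit set, which is the property that matters if the result is fed to a model with a fixed vocabulary.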

yafuilee · Feb 27 '25

Same question here: when fine-tuning, how should words that are not in seg_dict be handled? Is there a workable way to modify seg_dict yourself?
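For the fine-tuning case, a first step could be to list which English words in the training transcripts seg_dict does not cover, and then append entries for them. This is a minimal sketch under these assumptions: seg_dict is a UTF-8 text file with one `word<TAB>pieces` entry per line as in the excerpt; `vocab` is the set of tokens the model can output (the model name suggests 8404 of them; where that list lives in your model directory is not confirmed here); and the pieces for each new word are chosen by you, by hand or with any segmenter that only emits vocabulary tokens. New entries are rejected if they use a piece outside `vocab`, since the model has no output index for such a piece. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Lowercase Latin letters plus apostrophes (e.g. "aaec's"); other scripts are skipped.
ENGLISH_WORD = re.compile(r"[a-z']+")


def load_seg_dict(path: Path) -> Dict[str, List[str]]:
    """Read a seg_dict file: one `word<TAB>piece piece ...` entry per line."""
    seg_dict: Dict[str, List[str]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                seg_dict[parts[0]] = parts[1:]
    return seg_dict


def find_oov_words(transcripts: Iterable[str], seg_dict: Dict[str, List[str]]) -> Counter:
    """Count English words in the transcripts that seg_dict does not cover."""
    counts: Counter = Counter()
    for text in transcripts:
        for word in ENGLISH_WORD.findall(text.lower()):
            if word not in seg_dict:
                counts[word] += 1
    return counts


def add_entries(path: Path, new_entries: Dict[str, List[str]], vocab: Set[str]) -> List[str]:
    """Append entries for new words; existing entries are left untouched.
    Raises ValueError if a piece is not in the model vocabulary."""
    seg_dict = load_seg_dict(path)
    new_lines: List[str] = []
    for word, pieces in new_entries.items():
        if word in seg_dict:
            continue  # keep the shipped segmentation as-is
        missing = [p for p in pieces if p not in vocab]
        if missing:
            raise ValueError(f"{word!r}: pieces not in the model vocabulary: {missing}")
        new_lines.append(f"{word}\t{' '.join(pieces)}\n")
    if new_lines:
        text = path.read_text(encoding="utf-8")
        if text and not text.endswith("\n"):
            text += "\n"  # make sure the first new entry starts on its own line
        path.write_text(text + "".join(new_lines), encoding="utf-8")
    return [line.split("\t", 1)[0] for line in new_lines]
```

Working on a copy of seg_dict rather than the file shipped with the model makes it easy to compare results before and after the change.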

ChrisLauVI · Apr 10 '25

Desperately need the bpe.model, please!

WilliamZhangWD · Aug 01 '25

Sorry, I wasn't able to get it… (sent by email in reply to WilliamZhangWD's comment on modelscope/FunASR#2400, Aug 1, 2025, 17:48)


yafuilee · Aug 03 '25

from funasr_onnx import SenseVoiceSmall
model = SenseVoiceSmall(model_dir, batch_size=1, quantize=False)

Running ONNX inference directly, I also get an error saying the BPE model is missing...

chn_jpn_yue_eng_ko_spectok.bpe.model": No such file or directory Error
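To make this kind of failure easier to diagnose, the model directory can be checked before the model is constructed. This is a minimal sketch. The required file name is taken from the error message above. Whether `funasr_onnx` needs other files as well is not confirmed here, so the list is an assumption to extend as needed, and `missing_model_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# File name taken from the error message above; extend if other files are needed.
REQUIRED_FILES = ["chn_jpn_yue_eng_ko_spectok.bpe.model"]


def missing_model_files(model_dir: str) -> List[str]:
    """Return the required files that are absent from model_dir.
    Raises FileNotFoundError if model_dir itself is not a directory."""
    root = Path(model_dir)
    if not root.is_dir():
        # resolve() shows where a relative path actually points
        raise FileNotFoundError(f"model_dir is not a directory: {root.resolve()}")
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling it just before `SenseVoiceSmall(model_dir, batch_size=1, quantize=False)` reports the missing file together with the resolved directory. That helps distinguish a file that is genuinely absent from a relative `model_dir` that resolves against an unexpected working directory.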

QuarTerll · Dec 08 '25