
Model grows after fine-tuning Paraformer, and inference fails on a wav file that the base model handles

Open YouTwoMeToo opened this issue 1 year ago • 4 comments

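After fine-tuning the long-audio version of the Paraformer model, the saved .pt file grew from a little over 800 MB for the base model to nearly 2.6 GB. In addition, running inference on the same wav file raises the following error: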

Traceback (most recent call last):
  File "/wind/aispace/train/source/src/FunASR/examples/industrial_data_pretraining/paraformer-zh-spk/tasks_bin.py", line 220, in <module>
    results_left = asr_batch_infer(output_left_folder,paraformer_model)
  File "/wind/aispace/train/source/src/FunASR/examples/industrial_data_pretraining/paraformer-zh-spk/tasks_bin.py", line 124, in asr_batch_infer
    res = paraformer_model.generate(input=audio_binary,fs=8000)
  File "/wind/aispace/train/source/src/FunASR/funasr/auto/auto_model.py", line 300, in generate
    return self.inference(input, input_len=input_len, **cfg)
  File "/wind/aispace/train/source/src/FunASR/funasr/auto/auto_model.py", line 342, in inference
    res = model.inference(**batch, **kwargs)
  File "/wind/aispace/train/source/src/FunASR/funasr/models/bicif_paraformer/model.py", line 351, in inference
    postprocess_utils.sentence_postprocess(token, timestamp)
  File "/wind/aispace/train/source/src/FunASR/funasr/utils/postprocess_utils.py", line 235, in sentence_postprocess
    word_lists, ts_lists = abbr_dispose(word_lists, ts_lists)
  File "/wind/aispace/train/source/src/FunASR/funasr/utils/postprocess_utils.py", line 131, in abbr_dispose
    begin = time_stamp[ts_nums[num]][0]
IndexError: list index out of range
0%|

FunASR is the latest version. What is causing this? Could it be related to the fine-tuning data?

YouTwoMeToo avatar Nov 27 '24 09:11 YouTwoMeToo

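Hello, I tested again today. After adding some shorter segments to the training data, the results improved, but a small number of test cases still raise the error above.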
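Since only some test cases fail, it may help to let the batch run to completion and record which clips raise the exception. The following is a minimal sketch, not part of FunASR: `run_batch_safely` and its arguments are hypothetical names, and `infer_fn` stands for whatever call is made per clip (for example the `paraformer_model.generate(input=audio_binary, fs=8000)` call seen in the traceback).

```python
import traceback
from typing import Any, Callable, Dict, Iterable, List, Tuple


def run_batch_safely(
    infer_fn: Callable[[Any], Any],
    items: Iterable[Tuple[str, Any]],
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Run infer_fn on each (name, audio) pair and collect failures instead of aborting.

    Hypothetical helper for illustration; it makes no assumption about the model.
    """
    results: Dict[str, Any] = {}
    failures: List[Tuple[str, str]] = []
    for name, audio in items:
        try:
            results[name] = infer_fn(audio)
        except IndexError:
            # Same exception type as in the reported traceback. Keep the full
            # stack text so the failing clip can be attached to a bug report.
            failures.append((name, traceback.format_exc()))
    return results, failures


# Possible usage (names taken from the traceback, not verified here):
# results, failures = run_batch_safely(
#     lambda audio: paraformer_model.generate(input=audio, fs=8000),
#     [("clip_001.wav", audio_binary)],
# )
```

Comparing the duration and content of the failing clips with the passing ones could then show whether the problem is tied to particular inputs.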

YouTwoMeToo avatar Nov 28 '24 03:11 YouTwoMeToo

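I ran into the same problem. Is there a solution? Also, English recognition got worse: on the sample asr_example_en.wav, the original model outputs "He tried to think how it could be.", but the fine-tuned model outputs "Ie就宅IB."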

chentiejin1 avatar Jan 08 '25 07:01 chentiejin1




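Hello, the model is larger because the checkpoint stores the model parameters together with the gradient parameters and the optimizer parameters. Delete the latter two keys and the file returns to its normal size. As for the worse results after fine-tuning, I do not know the cause yet either.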
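To see which top-level entries account for the extra size, one option is to sum the tensor bytes under each key of the loaded checkpoint. This is a rough sketch under assumptions: the checkpoint is taken to be a plain dict, and `tensor_bytes`, `summarize_checkpoint`, and `summarize_file` are hypothetical helpers rather than FunASR functions. The actual key names are whatever the file contains.

```python
from typing import Any, Dict


def tensor_bytes(obj: Any) -> int:
    """Recursively sum bytes of tensor-like leaves (objects with numel/element_size)."""
    if hasattr(obj, "numel") and hasattr(obj, "element_size"):
        return obj.numel() * obj.element_size()
    if isinstance(obj, dict):
        return sum(tensor_bytes(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(tensor_bytes(v) for v in obj)
    return 0  # ints, floats, strings, etc. are ignored


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, int]:
    """Map each top-level key of the checkpoint dict to its tensor payload in bytes."""
    return {key: tensor_bytes(value) for key, value in ckpt.items()}


def summarize_file(path: str) -> None:
    """Load a .pt file on CPU and print the size of each top-level entry."""
    import torch  # imported here so the helpers above work without PyTorch

    # Depending on the PyTorch version and the file contents, a trusted
    # checkpoint may need weights_only=False to load.
    ckpt = torch.load(path, map_location="cpu")
    for key, size in summarize_checkpoint(ckpt).items():
        print(f"{key}: {size / 2**20:.1f} MiB")
```

If an Adam-style optimizer was used, its state typically holds two extra tensors per parameter, which would be consistent with a file roughly three times the size of the weights alone.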

    
        
    


  

    
                
        
            
                
                        
                            
                        
                        

YouTwoMeToo avatar Jan 08 '25 13:01 YouTwoMeToo

"...because the checkpoint stores the model param..."

Where exactly do you delete these parameters? In the config.yaml file?

lukeewin avatar Mar 27 '25 20:03 lukeewin

Same question: where do you delete the gradient parameters and optimizer parameters?

CryRobot avatar Jul 01 '25 06:07 CryRobot

"Same question: where do you delete the gradient parameters and optimizer parameters?"

Delete them in the .pt file.
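One way to do this is to load the .pt file, drop the training-only entries, and save the result under a new name. The sketch below is an illustration under assumptions, not the project's documented procedure: the key names `"optimizer"` and `"scheduler"` are guesses and should be replaced with the names actually present in the checkpoint, and `drop_training_state` and `strip_file` are hypothetical helpers.

```python
from typing import Any, Dict, Iterable


def drop_training_state(
    ckpt: Dict[str, Any],
    drop: Iterable[str] = ("optimizer", "scheduler"),  # assumed names, check your file
) -> Dict[str, Any]:
    """Return a copy of the checkpoint dict without the listed top-level keys."""
    drop_set = set(drop)
    return {key: value for key, value in ckpt.items() if key not in drop_set}


def strip_file(src: str, dst: str, drop: Iterable[str] = ("optimizer", "scheduler")) -> None:
    """Load src on CPU, remove training-only entries, and write the result to dst."""
    import torch  # imported here so drop_training_state works without PyTorch

    # A trusted checkpoint may need weights_only=False on newer PyTorch versions.
    ckpt = torch.load(src, map_location="cpu")
    torch.save(drop_training_state(ckpt, drop), dst)


# Possible usage (file names are placeholders):
# strip_file("model.pt", "model.stripped.pt")
```

Writing to a separate file keeps the original checkpoint intact, which matters if training needs to be resumed later.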

YouTwoMeToo avatar Jul 01 '25 06:07 YouTwoMeToo