SenseVoice timestamp error, possibly caused by the audio file

🐛 Bug

Running inference with timestamp output raises an error.

To Reproduce

funasr version: 1.2.0.
Downloading Model to directory: C:\Users\huowuge.cache\modelscope\hub\iic/SenseVoiceSmall
2024-12-21 22:49:58,457 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Downloading Model to directory: C:\Users\huowuge.cache\modelscope\hub\iic/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-12-21 22:50:01,588 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Downloading Model to directory: C:\Users\huowuge.cache\modelscope\hub\iic/punc_ct-transformer_cn-en-common-vocab471067-large
2024-12-21 22:50:02,127 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\huowuge\AppData\Local\Temp\jieba.cache
DEBUG:jieba:Loading model from cache C:\Users\huowuge\AppData\Local\Temp\jieba.cache
Loading model cost 0.591 seconds.
DEBUG:jieba:Loading model cost 0.591 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
rtf_avg: 6.220: 100%|██████████| 1/1 [00:12<00:00, 12.07s/it]
0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/9 [00:00<?, ?it/s]
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [1,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [2,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [3,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [4,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [5,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [6,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [7,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [8,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "D:\projects\funasr\asr_en\asr.py", line 19, in
    res = model.generate(
          ^^^^^^^^^^^^^^^
  File "D:\projects\funasr.venv\Lib\site-packages\funasr\auto\auto_model.py", line 304, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\funasr.venv\Lib\site-packages\funasr\auto\auto_model.py", line 458, in inference_with_vad
    results = self.inference(
              ^^^^^^^^^^^^^^^
  File "D:\projects\funasr.venv\Lib\site-packages\funasr\auto\auto_model.py", line 343, in inference
    res = model.inference(**batch, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\funasr.venv\Lib\site-packages\funasr\models\sense_voice\model.py", line 932, in inference
    pred = groupby(align[0, : encoder_out_lens[0]])
                   ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

0%| | 0/9 [00:02<?, ?it/s]
0%| | 0/1 [00:03<?, ?it/s]

Code sample

Expected behavior

Environment

OS (e.g., Linux): Windows 11
FunASR Version (e.g., 1.0.0): 1.2.0
ModelScope Version (e.g., 1.11.0): latest
PyTorch Version (e.g., 2.0.0): 2.31
How you installed funasr (pip, source): pip
Python version: 3.11
GPU (e.g., V100M32): 1080Ti
CUDA/cuDNN version (e.g., cuda11.7): 11.8+gpu
Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
Any other relevant information:

Additional context

The error may be caused by the audio. I split the input audio into short parts: the error still occurs when processing the first part, while the following parts are processed successfully. I uploaded the input audio to the DingDing group on 2024/12/13; the file name is Into+the+Uncut+Grass+-+Trevor+Noah.mp3.

Dec 21 '24 15:12 Huowuge

Installed funasr from the latest main branch. The environment is as follows:

torch                             2.4.0
funasr                            1.2.2                /mnt/workspace/FunASR

CUDA version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

Audio format of Into+the+Uncut+Grass+-+Trevor+Noah.mp3:

Input File     : 'Into+the+Uncut+Grass+-+Trevor+Noah.mp3'
Channels       : 2
Sample Rate    : 44100
Precision      : 16-bit
Duration       : 00:39:02.62 = 103309674 samples = 175697 CDDA sectors
File Size      : 18.8M
Bit Rate       : 64.1k
Sample Encoding: MPEG audio (layer I, II or III)
Comments       : 
Title=Into the Uncut Grass  
Artist=Trevor Noah
Album=Into the Uncut Grass
Tracknumber=01/01
Year=2024
Genre=Children's Audiobooks

Without timestamps, inference works normally. The test code is as follows:

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"


model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

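# English audio, no timestamps
res = model.generate(
    input="/mnt/workspace/Into+the+Uncut+Grass+-+Trevor+Noah.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  # merge short segments produced by the VAD model
    merge_length_s=15,  # target length of merged segments, in seconds
)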
text = rich_transcription_postprocess(res[0]["text"])
print(text)

With timestamps, the following error is raised:

rtf_avg: 4.414: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00,  8.56s/it]
  0%|                                                                                                                                                                           | 0/1 [00:00<?, ?it/s]
  0%|                                                                                                                                                                           | 0/9 [00:00<?, ?it/s]
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "/mnt/workspace/asr_sensevoice.py", line 29, in <module>
    res = model.generate(
  File "/mnt/workspace/FunASR/funasr/auto/auto_model.py", line 304, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/mnt/workspace/FunASR/funasr/auto/auto_model.py", line 458, in inference_with_vad
    results = self.inference(
  File "/mnt/workspace/FunASR/funasr/auto/auto_model.py", line 343, in inference
    res = model.inference(**batch, **kwargs)
  File "/mnt/workspace/FunASR/funasr/models/sense_voice/model.py", line 932, in inference
    pred = groupby(align[0, : encoder_out_lens[0]])
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

  0%|                                                                                                                                                                           | 0/9 [00:00<?, ?it/s]
  0%|                                                                                                                                                                           | 0/1 [00:03<?, ?it/s]

The test code is as follows:

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"


model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

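# English audio, with timestamps
res = model.generate(
    input="/mnt/workspace/Into+the+Uncut+Grass+-+Trevor+Noah.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  # merge short segments produced by the VAD model
    merge_length_s=15,  # target length of merged segments, in seconds
    output_timestamp=True,  # the only difference from the working call
)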
print(res)
text = rich_transcription_postprocess(res[0]["text"])
print(text)
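
The error text itself recommends CUDA_LAUNCH_BLOCKING=1, because CUDA reports kernel failures asynchronously and the traceback line may not be the call that actually failed. A minimal sketch of enabling it from Python, assuming it runs at the very top of the script before torch or funasr initialize CUDA; the helper name is hypothetical and not part of funasr:

```python
import os


def enable_blocking_cuda_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 so kernel errors surface at the launching call.

    Hypothetical helper; it only has an effect if it runs before the
    process initializes CUDA.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


enable_blocking_cuda_launches()
print(os.environ["CUDA_LAUNCH_BLOCKING"])

# Only after this point: import torch / funasr and call model.generate(...)
# exactly as in the script above.
```

Running the same script with device="cpu" may also help, since out-of-range indexing on CPU usually raises an ordinary Python exception at the offending line rather than a device-side assert.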

Dec 26 '24 03:12 slin000111

Thanks for the feedback, I'll work on a fix.

Dec 26 '24 06:12 R1ckShi

Hello, has this been fixed? Or is there a workaround?

Jan 02 '25 08:01 screw-44

@R1ckShi

res = model.generate(
   input=r"D:\AI\SenseVoice\example\第88章.mp3",  # raw string so backslashes are not treated as escapes
   cache={},
   language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
   use_itn=True,
   batch_size_s=60,
   merge_vad=False,  # do not merge VAD segments
   merge_length_s=15,
   output_timestamp=True,
)

The error occurs whenever output_timestamp=True. Has this been fixed?

Jan 03 '25 16:01 szytwo

Same issue. It seems to trigger reliably only when the input audio is long (in my case, 5 minutes or longer).
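
To probe the length dependence, one option is to cut the recording into fixed-length windows and transcribe each window separately, noting which ones fail. A minimal sketch of computing the window boundaries only; the function name and the 300-second window are assumptions for illustration, and the actual cutting of the audio is left to whatever tool is at hand:

```python
def chunk_ranges(total_seconds, chunk_seconds):
    """Return (start, end) offsets in seconds that cover [0, total_seconds]."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    ranges = []
    start = 0.0
    while start < total_seconds:
        # The last window is shortened so it never runs past the end.
        end = min(start + chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges


# 39:02.62 is the duration reported for the file earlier in this thread.
total = 39 * 60 + 2.62
for start, end in chunk_ranges(total, 300):
    print(f"{start:8.2f} -> {end:8.2f}")
```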

Jan 05 '25 13:01 Isuxiz

Has this issue been fixed?

Jan 20 '25 00:01 smengfei

Looking forward to a fix. If the model is loaded via sensevoicesmall.from_pretrained(), timestamps are printed normally.

Jan 22 '25 10:01 yqw1996

I ran into the same problem. The error appears whenever output_timestamp and VAD are both enabled.
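
To confirm that the combination is what matters, the four VAD/timestamp combinations can be run one by one. A minimal sketch that only builds the keyword arguments, reusing the parameter names and values from the scripts earlier in this thread; repro_configs is a hypothetical helper, and passing the dictionaries to AutoModel(...) and model.generate(...) is left to the reader:

```python
from itertools import product


def repro_configs(model_dir="iic/SenseVoiceSmall"):
    """Yield (label, model_kwargs, generate_kwargs) for each VAD/timestamp combination."""
    for use_vad, use_timestamp in product([False, True], repeat=2):
        model_kwargs = {"model": model_dir, "device": "cuda:0"}
        if use_vad:
            # Same VAD settings as the scripts posted above.
            model_kwargs["vad_model"] = "fsmn-vad"
            model_kwargs["vad_kwargs"] = {"max_single_segment_time": 30000}
        generate_kwargs = {"language": "auto", "use_itn": True}
        if use_timestamp:
            generate_kwargs["output_timestamp"] = True
        label = f"vad={use_vad}, timestamp={use_timestamp}"
        yield label, model_kwargs, generate_kwargs


for label, model_kwargs, generate_kwargs in repro_configs():
    print(label, model_kwargs, generate_kwargs)
```

If only the last combination fails, that would support the observation above; a failure in any other row would point elsewhere.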

Feb 22 '25 16:02 buliaoyin

I ran into the same problem as well and hope it is fixed soon.

  File "E:\workspace\funasr\asr_en\asr.py", line 19, in
    res = model.generate(
          ^^^^^^^^^^^^^^^
  File "E:\workspace\funasr.venv\Lib\site-packages\funasr\auto\auto_model.py", line 304, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\workspace\funasr.venv\Lib\site-packages\funasr\auto\auto_model.py", line 458, in inference_with_vad
    results = self.inference(
              ^^^^^^^^^^^^^^^
  File "E:\workspace\funasr.venv\Lib\site-packages\funasr\auto\auto_model.py", line 343, in inference
    res = model.inference(**batch, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\workspace\funasr.venv\Lib\site-packages\funasr\models\sense_voice\model.py", line 932, in inference
    pred = groupby(align[0, : encoder_out_lens[0]])

RuntimeError: CUDA error: device-side assert triggered
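
The assertion repeated in the logs earlier in this thread is a bounds check: an index into a dimension of size sizes[i] must satisfy -sizes[i] <= index < sizes[i]. A plain-Python mirror of that condition, for illustration only; the helper names are hypothetical, and which tensor in model.py carries the bad index is not established in this thread:

```python
def index_in_bounds(index, size):
    """Mirror of the asserted condition: -size <= index < size."""
    return -size <= index < size


def first_out_of_bounds(indices, size):
    """Return the first index that violates the bounds check, or None."""
    for index in indices:
        if not index_in_bounds(index, size):
            return index
    return None


# Negative indices down to -size are valid, as in Python sequences;
# an index equal to size is already out of range.
print(first_out_of_bounds([0, 3, -4, 4], size=4))
```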

Feb 27 '25 00:02 chengxl2016

@chengxl2016 @buliaoyin I ran into the same problem and solved it with guidance from an expert; sharing it here for reference.

Temporary workaround: directly edit the file <project_path>/.venv/lib/python3.8/site-packages/funasr/models/sense_voice/model.py.

Mar 07 '25 13:03 hohaiuhsx