Timestamp error may be caused by the audio file.
🐛 Bug
Running with timestamps enabled fails with an error.
To Reproduce
funasr version: 1.2.0.
Downloading Model to directory: C:\Users\huowuge.cache\modelscope\hub\iic/SenseVoiceSmall
2024-12-21 22:49:58,457 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Downloading Model to directory: C:\Users\huowuge.cache\modelscope\hub\iic/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-12-21 22:50:01,588 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Downloading Model to directory: C:\Users\huowuge.cache\modelscope\hub\iic/punc_ct-transformer_cn-en-common-vocab471067-large
2024-12-21 22:50:02,127 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\huowuge\AppData\Local\Temp\jieba.cache
DEBUG:jieba:Loading model from cache C:\Users\huowuge\AppData\Local\Temp\jieba.cache
Loading model cost 0.591 seconds.
DEBUG:jieba:Loading model cost 0.591 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
rtf_avg: 6.220: 100%|██████████| 1/1 [00:12<00:00, 12.07s/it]
0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/9 [00:00<?, ?it/s]C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [1,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [2,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [3,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [4,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [5,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [6,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [7,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [0,0,0], thread: [8,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
File "D:\projects\funasr\asr_en\asr.py", line 19, in <module>
...
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
0%| | 0/9 [00:02<?, ?it/s] 0%| | 0/1 [00:03<?, ?it/s]
Code sample
Expected behavior
Environment
- OS (e.g., Linux): windows11
- FunASR Version (e.g., 1.0.0): 1.2.0
- ModelScope Version (e.g., 1.11.0): latest
- PyTorch Version (e.g., 2.0.0): 2.3.1
- How you installed funasr (pip, source): pip
- Python version: 3.11
- GPU (e.g., V100M32): 1080Ti
- CUDA/cuDNN version (e.g., cuda11.7): 11.8+gpu
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
- Any other relevant information:
Additional context
The error may be caused by the audio itself. After I split the input audio into short parts, the error still occurs when processing the first part, while the remaining parts are processed successfully. I uploaded the input audio to the DingDing group on 2024/12/13; the file name is Into+the+Uncut+Grass+-+Trevor+Noah.mp3
Installed FunASR from the latest main branch. The environment is as follows:
torch 2.4.0
funasr 1.2.2 /mnt/workspace/FunASR
CUDA version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
Audio format of Into+the+Uncut+Grass+-+Trevor+Noah.mp3:
Input File : 'Into+the+Uncut+Grass+-+Trevor+Noah.mp3'
Channels : 2
Sample Rate : 44100
Precision : 16-bit
Duration : 00:39:02.62 = 103309674 samples = 175697 CDDA sectors
File Size : 18.8M
Bit Rate : 64.1k
Sample Encoding: MPEG audio (layer I, II or III)
Comments :
Title=Into the Uncut Grass
Artist=Trevor Noah
Album=Into the Uncut Grass
Tracknumber=01/01
Year=2024
Genre=Children's Audiobooks
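The duration line above can be sanity-checked from the sample count and sample rate. This is only a small arithmetic sketch; `format_duration` is a hypothetical helper, and it assumes the reported sample count is per channel.

```python
# Values copied from the audio info above.
SAMPLE_RATE_HZ = 44_100
TOTAL_SAMPLES = 103_309_674  # assumption: samples per channel


def format_duration(total_samples: int, sample_rate_hz: int) -> str:
    """Convert a sample count to an HH:MM:SS.ss string."""
    total_seconds = total_samples / sample_rate_hz
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds:05.2f}"


duration = format_duration(TOTAL_SAMPLES, SAMPLE_RATE_HZ)
print(duration)
```

The result agrees with the reported duration, so the file is roughly 39 minutes of 44.1 kHz stereo audio.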
Without timestamps, inference works normally. Test code:
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# en
res = model.generate(
    input="/mnt/workspace/Into+the+Uncut+Grass+-+Trevor+Noah.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)
With timestamps, the following error is raised:
rtf_avg: 4.414: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00, 8.56s/it]
0%| | 0/1 [00:00<?, ?it/s../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed. | 0/9 [00:00<?, ?it/s]
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
File "/mnt/workspace/asr_sensevoice.py", line 29, in <module>
res = model.generate(
File "/mnt/workspace/FunASR/funasr/auto/auto_model.py", line 304, in generate
return self.inference_with_vad(input, input_len=input_len, **cfg)
File "/mnt/workspace/FunASR/funasr/auto/auto_model.py", line 458, in inference_with_vad
results = self.inference(
File "/mnt/workspace/FunASR/funasr/auto/auto_model.py", line 343, in inference
res = model.inference(**batch, **kwargs)
File "/mnt/workspace/FunASR/funasr/models/sense_voice/model.py", line 932, in inference
pred = groupby(align[0, : encoder_out_lens[0]])
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
0%| | 0/9 [00:00<?, ?it/s]
0%| | 0/1 [00:03<?, ?it/s]
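The error message itself suggests `CUDA_LAUNCH_BLOCKING=1`, which makes kernel launches synchronous so the Python traceback points at the operation that actually failed. A minimal sketch of enabling it from Python, assuming it is placed at the very top of the test script, before `torch` or `funasr` is imported:

```python
import os

# Must be set before CUDA is initialized, i.e. before importing torch/funasr.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch / funasr only after this point, for example:
# from funasr import AutoModel
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Setting the variable in the shell before launching Python has the same effect. Expect inference to run slower while it is enabled.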
Test code:
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# en with timestamp
res = model.generate(
    input="/mnt/workspace/Into+the+Uncut+Grass+-+Trevor+Noah.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
    output_timestamp=True,
)
print(res)
text = rich_transcription_postprocess(res[0]["text"])
print(text)
Thanks for the feedback, I'll fix it.
Hello, has this been fixed? Or is there any workaround?
@R1ckShi
res = model.generate(
    # Raw string so the backslashes in the Windows path are not treated as escapes.
    input=r"D:\AI\SenseVoice\example\第88章.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=False,
    merge_length_s=15,
    output_timestamp=True,
)
It fails whenever output_timestamp=True is set. Has this been fixed?
Same issue. It seems to trigger reliably only when the input audio is long (in my case, 5 minutes or longer).
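Has this issue been fixed?
Looking forward to a fix. If the model is loaded via SenseVoiceSmall.from_pretrained(), timestamps are printed normally.
I ran into the same problem: the error appears whenever output_timestamp and VAD are both enabled.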
I ran into the same problem too and hope it gets fixed soon.
File "E:\workspace\funasr\asr_en\asr.py", line 19, in <module>
res = model.generate(
File "E:\workspace\funasr.venv\Lib\site-packages\funasr\auto\auto_model.py", line 304, in generate
return self.inference_with_vad(input, input_len=input_len, **cfg)
File "E:\workspace\funasr.venv\Lib\site-packages\funasr\auto\auto_model.py", line 458, in inference_with_vad
results = self.inference(
File "E:\workspace\funasr.venv\Lib\site-packages\funasr\auto\auto_model.py", line 343, in inference
res = model.inference(**batch, **kwargs)
File "E:\workspace\funasr.venv\Lib\site-packages\funasr\models\sense_voice\model.py", line 932, in inference
pred = groupby(align[0, : encoder_out_lens[0]])
RuntimeError: CUDA error: device-side assert triggered
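The tracebacks in this thread end at `groupby(align[0, : encoder_out_lens[0]])`. As a rough illustration of what a step like that does, here is a sketch that collapses a frame-level alignment into token time spans. It assumes `groupby` is `itertools.groupby`, that label 0 is the blank, and uses a 60 ms frame shift chosen for the example; `alignment_to_spans` and the sample data are hypothetical and are not FunASR code.

```python
from itertools import groupby

BLANK_ID = 0         # assumption: 0 marks frames with no token
FRAME_SHIFT_MS = 60  # assumption: each frame covers 60 ms


def alignment_to_spans(alignment, valid_len):
    """Collapse runs of equal frame labels into (token_id, start_ms, end_ms)."""
    spans = []
    start = 0
    # Only the first valid_len frames are real; anything after is padding.
    for token_id, run in groupby(alignment[:valid_len]):
        end = start + len(list(run))
        if token_id != BLANK_ID:
            spans.append((token_id, start * FRAME_SHIFT_MS, end * FRAME_SHIFT_MS))
        start = end
    return spans


frames = [0, 7, 7, 0, 0, 5, 5, 5, 0, 9]  # the last frame is padding
spans = alignment_to_spans(frames, valid_len=9)
print(spans)
```

On the GPU, any index that falls outside the tensor it is applied to triggers the "index out of bounds" assertion seen in the logs. Which index is wrong in the real code is not established in this thread.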
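@chengxl2016 @buliaoyin I ran into the same problem and solved it with guidance from an expert. Sharing it here for reference.
Temporary workaround: directly modify the file <project_path>/.venv/lib/python3.8/site-packages/funasr/models/sense_voice/model.py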