
CUDA ERROR

Open lukeewin opened this issue 3 months ago • 2 comments

Environment

  • OS (e.g., Linux): Ubuntu Server 22.04
  • FunASR Version (e.g., 1.0.0): 1.1.12
  • ModelScope Version (e.g., 1.11.0): 1.19.2
  • PyTorch Version (e.g., 2.0.0): 2.5
  • How you installed funasr (pip, source): pip install funasr
  • Python version: 3.9
  • GPU (e.g., V100M32): 2080Ti
  • CUDA/cuDNN version (e.g., cuda11.7): cuda 12.8
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
  • Any other relevant information:

Additional context

Running FunASR model inference on Ubuntu produces the error below.

Traceback (most recent call last):
  File "/root/miniconda3/envs/asr/lib/python3.9/site-packages/celery/app/trace.py", line 453, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/root/miniconda3/envs/asr/lib/python3.9/site-packages/celery/app/trace.py", line 736, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/src/new_asr_spk_api/top/lukeewin/asr/service/asr_service.py", line 56, in asr_task
    raise self.retry(exc=e, countdown=5, max_retries=2)
  File "/root/miniconda3/envs/asr/lib/python3.9/site-packages/celery/app/task.py", line 743, in retry
    raise_with_context(exc)
  File "/usr/local/src/new_asr_spk_api/top/lukeewin/asr/service/asr_service.py", line 32, in asr_task
    result = asr.model.generate(audio_path, is_final=True, batch_size_s=300)
  File "/root/miniconda3/envs/asr/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 303, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/root/miniconda3/envs/asr/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 376, in inference_with_vad
    res = self.inference(
  File "/root/miniconda3/envs/asr/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 342, in inference
    res = model.inference(**batch, **kwargs)
  File "/root/miniconda3/envs/asr/lib/python3.9/site-packages/funasr/models/fsmn_vad_streaming/model.py", line 712, in inference
    speech = speech.to(device=kwargs["device"])
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Has anyone run into this? How can it be fixed? Judging from the traceback, the error seems to come from the VAD model inference.

lukeewin avatar Oct 24 '25 18:10 lukeewin

The GPU id is set incorrectly.

LauraGPT avatar Oct 26 '25 14:10 LauraGPT

The GPU id is set incorrectly.

There are multiple GPUs in the machine, and I choose which card to use via CUDA_VISIBLE_DEVICES. Is that what you mean by a wrong GPU id? For example, to use the card with index 1, I prefix the python command with CUDA_VISIBLE_DEVICES=1. The CUDA kernel error above then occurs intermittently: when the audio in a transcription request is short and there are only a few files, it rarely triggers; when the audio is long (say, over 5 minutes) and there are many files to transcribe, the CUDA kernel error does get triggered.
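
My setup looks roughly like the sketch below (the model names and file path are placeholders, not my exact configuration):

import os

# Must be set before torch/funasr initialize CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

from funasr import AutoModel

# With CUDA_VISIBLE_DEVICES=1, the selected card is exposed to this process
# as index 0, so the model is created with device="cuda:0" (or just "cuda").
# Placeholder model names; the VAD model mirrors the fsmn_vad_streaming
# module seen in the traceback.
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    device="cuda:0",
)

result = model.generate("example.wav", is_final=True, batch_size_s=300)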

lukeewin avatar Oct 27 '25 03:10 lukeewin

This "CUDA error: an illegal memory access was encountered" typically indicates GPU memory corruption or invalid pointer access. Based on your environment (V100M32, CUDA 12.8, PyTorch 2.5), here are diagnostic steps:

Immediate Troubleshooting:

  1. Verify GPU health: Run nvidia-smi to check for GPU errors or ECC errors
  2. Check CUDA memory: Use torch.cuda.memory_summary() before the crash to spot memory pressure or leaks (see the sketch after this list)
  3. Reduce batch size: The 2080Ti has only 11 GB of memory, so try reducing batch_size_s to isolate whether the failure is memory-related
  4. Test with different precision: Try FP16 or mixed precision to reduce memory footprint
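
For steps 2 and 3, a minimal sketch (the audio path and model names are placeholders) that logs the allocator state around the call and runs with a smaller batch_size_s:

import torch
from funasr import AutoModel

model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", device="cuda:0")

print(torch.cuda.memory_summary(device=0))  # allocator state before inference

# Reduced from batch_size_s=300; if the error disappears, the crash is load-dependent.
res = model.generate("long_call_recording.wav", is_final=True, batch_size_s=60)

print(torch.cuda.memory_summary(device=0))  # allocator state after inference

Note that once an illegal memory access has been raised, the CUDA context in that process is usually unusable, so the reduced batch size should be tested in a fresh process rather than in a retry.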

Common Causes for FunASR:

  1. Audio tensor size mismatch: Ensure input audio tensors are correctly shaped and on the right device (see the input check sketched after this list)
  2. Model checkpoint corruption: Re-download the model weights
  3. CUDA/PyTorch version mismatch: PyTorch 2.5 with CUDA 12.8 should work, but verify compatibility
  4. Docker memory constraints: Check if Docker has GPU memory limits that conflict with CUDA allocation
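
For cause 1, a quick input sanity check (assumed file path; torchaudio is used here only for inspection):

import torchaudio

waveform, sample_rate = torchaudio.load("example.wav")
print(waveform.shape, waveform.dtype, sample_rate)  # expect (channels, samples), float32

assert waveform.numel() > 0, "empty audio file"
if sample_rate != 16000:
    # The Chinese Paraformer / FSMN-VAD models expect 16 kHz input.
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)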

Debug Command:

CUDA_LAUNCH_BLOCKING=1 python your_script.py

With kernel launches forced to be synchronous, the stack trace points at the call that actually triggered the illegal memory access rather than a later API call.

If the error persists, share the output of torch.cuda.get_device_properties(0) and the specific FunASR model you're using.
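
A quick snippet to collect that information:

import torch

print(torch.__version__, torch.version.cuda)
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_properties(0))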

shanto12 avatar Nov 15 '25 17:11 shanto12