SenseVoice: fine-tuning per the documentation fails with: styles = torch.LongTensor([[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]).to(speech.device) IndexError: index 3 is out of bounds for dimension 1 with size 1

Notice: In order to resolve issues more efficiently, please raise issues following the template and include the relevant details.

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

Run cmd 'bash finetune.sh'
See error

Traceback (most recent call last):

[2024-11-01 20:33:48,933][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 5, after: 5
[2024-11-01 20:33:48,963][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 5, after: 5
[2024-11-01 20:33:48,989][root][ERROR] - ERROR: data is empty!
[2024-11-01 20:33:51,222][root][ERROR] - ERROR: data is empty!
Error executing job with overrides: ['++model=/mnt/home/sensevoice/SenseVoiceSmall', '++trust_remote_code=true', '++train_data_set_list=/mnt/home/sensevoice/train_data/datasets/asr_dataset.jsonl', '++valid_data_set_list=/mnt/home/sensevoice/train_data/datasets/asr_val.jsonl', '++dataset_conf.data_split_num=1', '++dataset_conf.batch_sampler=BatchSampler', '++dataset_conf.batch_size=10', '++dataset_conf.sort_size=1024', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=1', '++train_conf.max_epoch=50', '++train_conf.log_interval=1', '++train_conf.resume=true', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=20', '++train_conf.avg_nbest_model=10', '++train_conf.use_deepspeed=false', '++train_conf.deepspeed_config=/mnt/home/sensevoice/SenseVoice-main/deepspeed_conf/ds_stage1.json', '++optim_conf.lr=0.0002', '++output_dir=./outputs']
Traceback (most recent call last):
  File "/mnt/home/sensevoice/FunASR-main/funasr/bin/train_ds.py", line 225, in <module>
    main_hydra()
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/mnt/home/sensevoice/FunASR-main/funasr/bin/train_ds.py", line 56, in main_hydra
    main(**kwargs)
  File "/mnt/home/sensevoice/FunASR-main/funasr/bin/train_ds.py", line 173, in main
    trainer.train_epoch(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/funasr/train_utils/trainer_ds.py", line 603, in train_epoch
    self.forward_step(model, batch, loss_dict=loss_dict)
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/funasr/train_utils/trainer_ds.py", line 670, in forward_step
    retval = model(**batch)
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/home/sensevoice/SenseVoice-main/./model.py", line 680, in forward
    encoder_out, encoder_out_lens = self.encode(speech, speech_lengths, text)
  File "/mnt/home/sensevoice/SenseVoice-main/./model.py", line 733, in encode
    styles = torch.LongTensor([[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]).to(speech.device)
IndexError: index 3 is out of bounds for dimension 1 with size 1
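
The failing expression reads column 3 of `text`, so every row needs at least four columns, while the message reports that dimension 1 has size 1. The following is a minimal pure-Python analog of that indexing, not the model code: nested lists stand in for the tensor, `column` is a hypothetical helper, and all token ids are made-up placeholders.

```python
# Pure-Python analog of `text[:, 3]`; nested lists stand in for a 2-D tensor.
# All token ids below are made-up placeholders, not real SenseVoice ids.

def column(rows, index):
    """Return the analog of rows[:, index], raising IndexError if a row is too short."""
    for row_number, row in enumerate(rows):
        if index >= len(row):
            raise IndexError(
                f"index {index} is out of bounds for row {row_number} with size {len(row)}"
            )
    return [row[index] for row in rows]


# Shape (2, 1): matches the error message (dimension 1 has size 1).
too_short = [[-1], [-1]]

# Shape (2, 6): four leading slots followed by two transcript tokens.
long_enough = [[11, 12, 13, 14, 501, 502], [11, 12, 13, 15, 601, 602]]

try:
    column(too_short, 3)
    failed = False
except IndexError as err:
    failed = True
    print("reproduced:", err)

styles = column(long_enough, 3)
print("styles:", styles)
```

Printing the shape of `text` just before the failing line in `model.py` would show which case a given run falls into.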

The dataset is the example data, shown here after conversion with sensevoice2jsonl:
{"key": "BAC009S0764W0121", "source": "/mnt/home/sensevoice/data_example/voice/BAC009S0764W0121.wav", "source_len": 420, "target": "甚至出现交易几乎停滞的情况", "target_len": 13, "with_or_wo_itn": "<|woitn|>", "text_language": "<|zh|>", "emo_target": "<|NEUTRAL|>", "event_target": "<|Speech|>"}
{"key": "BAC009S0916W0489", "source": "/mnt/home/sensevoice/data_example/voice/BAC009S0916W0489.wav", "source_len": 573, "target": "湖北一公司以员工名义贷款数十员工负债千万", "target_len": 20, "with_or_wo_itn": "<|woitn|>", "text_language": "<|zh|>", "emo_target": "<|NEUTRAL|>", "event_target": "<|Speech|>"}
{"key": "asr_example_cn_en", "source": "/mnt/home/sensevoice/data_example/voice/asr_example_cn_en.wav", "source_len": 1474, "target": "所有只要处理 data 不管你是做 machine learning 做 deep learning 做 data analytics 做 data science 也好 scientist 也好通通都要都做的基本功啊那 again 先先对有一些也许对", "target_len": 19, "with_or_wo_itn": "<|woitn|>", "text_language": "<|zh|>", "emo_target": "<|NEUTRAL|>", "event_target": "<|Speech|>"}
{"key": "ID0012W0014", "source": "/mnt/home/sensevoice/data_example/voice/asr_example_en.wav", "source_len": 222, "target": "he tried to think how it could be", "target_len": 8, "with_or_wo_itn": "<|woitn|>", "text_language": "<|en|>", "emo_target": "<|EMO_UNKNOWN|>", "event_target": "<|Speech|>"}
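
Before looking at the training code, it may be worth ruling out malformed records in the JSONL file. The sketch below checks that each line parses and carries the fields seen in the sample records above. Treating all nine fields as required is an assumption, and `find_bad_records` is a hypothetical helper, not part of FunASR.

```python
import json

# Field names copied from the sample records. Treating all of them as
# required is an assumption made for this check.
EXPECTED_FIELDS = (
    "key", "source", "source_len", "target", "target_len",
    "with_or_wo_itn", "text_language", "emo_target", "event_target",
)


def find_bad_records(lines):
    """Return (line_number, problem) pairs for records that look malformed."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((line_number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "not a JSON object"))
            continue
        missing = [name for name in EXPECTED_FIELDS if name not in record]
        if missing:
            problems.append((line_number, "missing: " + ", ".join(missing)))
    return problems


sample = [
    '{"key": "ID0012W0014", "source": "/mnt/home/sensevoice/data_example/voice/asr_example_en.wav", '
    '"source_len": 222, "target": "he tried to think how it could be", "target_len": 8, '
    '"with_or_wo_itn": "<|woitn|>", "text_language": "<|en|>", '
    '"emo_target": "<|EMO_UNKNOWN|>", "event_target": "<|Speech|>"}',
    '{"key": "broken", "source": "x.wav", "target": "hello"}',  # hypothetical bad record
]
problems = find_bad_records(sample)
print(problems)
```

To check a real file, pass an open file handle instead of the `sample` list, for example `find_bad_records(open(path, encoding="utf-8"))`.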

Code sample

Expected behavior

Environment

OS (e.g., Linux):
FunASR Version (e.g., 1.0.0):
ModelScope Version (e.g., 1.11.0):
PyTorch Version (e.g., 2.0.0):
How you installed funasr (pip, source):
Python version:
GPU (e.g., V100M32)
CUDA/cuDNN version (e.g., cuda11.7):
Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
Any other relevant information:

Additional context

Nov 01 '24 13:11 eatoncys

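I ran into this problem too. The code appears to expect text to be padded, but so far I haven't found where that padding is done; it looks like we have to add it ourselves.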

Nov 18 '24 02:11 JonneryR

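In my case, turning off the batch sampler in the script fixed it. To debug, I first ran with a single data entry and found that the same entry was being sampled inconsistently within the batch; after commenting out the batchsampler line, it ran.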

Nov 18 '24 02:11 qiuqiu-879

Problem solved. It was the dataset selection: you need to choose SenseVoiceCTCDataset, because only that dataset contains the code that pads the front of text.
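
To illustrate what padding the front of text means here: the failing line reads column 3 of `text`, so each target needs at least four leading entries before the transcript tokens. The sketch below is not the SenseVoiceCTCDataset code. `add_prefix_slots`, `pad_batch`, the slot count of four (inferred only from the `text[:, 3]` index), and all ids are assumptions for illustration.

```python
# Illustration only: not the SenseVoiceCTCDataset implementation.
# PREFIX_SLOTS = 4 is inferred from the failing `text[:, 3]` (needs >= 4 columns);
# the ids are hypothetical placeholders.
PREFIX_SLOTS = 4
PAD_ID = -1


def add_prefix_slots(token_ids, prefix_ids):
    """Prepend the prefix ids so the transcript starts at column PREFIX_SLOTS."""
    if len(prefix_ids) != PREFIX_SLOTS:
        raise ValueError(f"expected {PREFIX_SLOTS} prefix ids, got {len(prefix_ids)}")
    return list(prefix_ids) + list(token_ids)


def pad_batch(rows, pad_id=PAD_ID):
    """Right-pad rows to a common length so they form a rectangular batch."""
    width = max(len(row) for row in rows)
    return [row + [pad_id] * (width - len(row)) for row in rows]


batch = pad_batch([
    add_prefix_slots([501, 502, 503], prefix_ids=[11, 12, 13, 14]),
    add_prefix_slots([601], prefix_ids=[11, 12, 13, 15]),
])
style_column = [row[3] for row in batch]  # analog of text[:, 3]
print(batch)
print(style_column)
```

With the prefix in place, column 3 exists for every row regardless of how short the transcript is.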

Nov 18 '24 05:11 JonneryR

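Problem solved. It was the dataset selection: you need to choose SenseVoiceCTCDataset, because only that dataset contains the code that pads the front of text.

Below are my parameters. It looks like mine is already SenseVoiceCTCDataset, but I still get the error.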

[2024-11-28 19:33:47,484][root][INFO] - kwargs: {'encoder': 'SenseVoiceEncoderSmall', 'encoder_conf': {'output_size': 512, 'attention_heads': 4, 'linear_units': 2048, 'num_blocks': 50, 'tp_blocks': 20, 'dropout_rate': 0.1, 'positional_dropout_rate': 0.1, 'attention_dropout_rate': 0.1, 'input_layer': 'pe', 'pos_enc_class': 'SinusoidalPositionEncoder', 'normalize_before': True, 'kernel_size': 11, 'sanm_shfit': 0, 'selfattention_layer_type': 'sanm'}, 'model': 'SenseVoiceSmall', 'model_conf': {'length_normalized_loss': True, 'sos': 1, 'eos': 2, 'ignore_id': -1}, 'tokenizer': 'SentencepiecesTokenizer', 'tokenizer_conf': {'bpemodel': '/home/zhaogengs/.cache/modelscope/hub/iic/SenseVoiceSmall/chn_jpn_yue_eng_ko_spectok.bpe.model', 'unk_symbol': '<unk>', 'split_with_space': True}, 'frontend': 'WavFrontend', 'frontend_conf': {'fs': 16000, 'window': 'hamming', 'n_mels': 80, 'frame_length': 25, 'frame_shift': 10, 'lfr_m': 7, 'lfr_n': 6, 'cmvn_file': '/home/zhaogengs/.cache/modelscope/hub/iic/SenseVoiceSmall/am.mvn'}, 'dataset': 'SenseVoiceCTCDataset', 'dataset_conf': {'index_ds': 'IndexDSJsonl', 'batch_sampler': 'BatchSampler', 'data_split_num': 1, 'batch_type': 'token', 'batch_size': 100, 'max_token_length': 2000, 'min_token_length': 60, 'max_source_length': 2000, 'min_source_length': 60, 'max_target_length': 200, 'min_target_length': 0, 'shuffle': True, 'num_workers': 4, 'sos': 1, 'eos': 2, 'IndexDSJsonl': 'IndexDSJsonl', 'retry': 20, 'sort_size': 1024}, 'train_conf': {'accum_grad': 1, 'grad_clip': 5, 'max_epoch': 50, 'keep_nbest_models': 20, 'avg_nbest_model': 10, 'log_interval': 1, 'resume': True, 'validate_interval': 2000, 'save_checkpoint_interval': 2000, 'use_deepspeed': False, 'deepspeed_config': '/home/zhaogengs/workspace/SenseVoice/deepspeed_conf/ds_stage1.json'}, 'optim': 'adamw', 'optim_conf': {'lr': 0.0002}, 'scheduler': 'warmuplr', 'scheduler_conf': {'warmup_steps': 25000}, 'specaug': 'SpecAugLFR', 'specaug_conf': {'apply_time_warp': False, 'time_warp_window': 5, 'time_warp_mode': 'bicubic', 'apply_freq_mask': True, 'freq_mask_width_range': [0, 30], 'lfr_rate': 6, 'num_freq_mask': 1, 'apply_time_mask': True, 'time_mask_width_range': [0, 12], 'num_time_mask': 1}, 'init_param': '/home/zhaogengs/.cache/modelscope/hub/iic/SenseVoiceSmall/model.pt', 'config': '/home/zhaogengs/.cache/modelscope/hub/iic/SenseVoiceSmall/config.yaml', 'is_training': True, 'trust_remote_code': True, 'train_data_set_list': '/home/zhaogengs/workspace/SenseVoice/train_data/output/train.jsonl', 'valid_data_set_list': '/home/zhaogengs/workspace/SenseVoice/train_data/output/val.jsonl', 'output_dir': './outputs', 'model_path': '/home/zhaogengs/.cache/modelscope/hub/iic/SenseVoiceSmall', 'device': 'cpu'}
[2024-11-28 19:33:47,484][root][INFO] - config.yaml is saved to: ./outputs/config.yaml

The error output is as follows:


[2024-11-28 19:33:47,801][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 2, after: 2
[2024-11-28 19:33:47,858][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 2, after: 2
[2024-11-28 19:33:50,194][root][ERROR] - ERROR: data is empty!
[2024-11-28 19:33:50,328][root][ERROR] - ERROR: data is empty!
Error executing job with overrides: ['++model=iic/SenseVoiceSmall', '++trust_remote_code=true', '++train_data_set_list=/home/zhaogengs/workspace/SenseVoice/train_data/output/train.jsonl', '++valid_data_set_list=/home/zhaogengs/workspace/SenseVoice/train_data/output/val.jsonl', '++dataset_conf.data_split_num=1', '++dataset_conf.batch_sampler=BatchSampler', '++dataset_conf.batch_size=100', '++dataset_conf.sort_size=1024', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=4', '++train_conf.max_epoch=50', '++train_conf.log_interval=1', '++train_conf.resume=true', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=20', '++train_conf.avg_nbest_model=10', '++train_conf.use_deepspeed=false', '++train_conf.deepspeed_config=/home/zhaogengs/workspace/SenseVoice/deepspeed_conf/ds_stage1.json', '++optim_conf.lr=0.0002', '++output_dir=./outputs']
Traceback (most recent call last):
  File "/home/zhaogengs/workspace/SenseVoice/FunASR/funasr/bin/train_ds.py", line 225, in <module>
    main_hydra()
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
            ^^^^^^^^^^
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
        ^^^^^^^^^^^^^^^^
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhaogengs/workspace/SenseVoice/FunASR/funasr/bin/train_ds.py", line 56, in main_hydra
    main(**kwargs)
  File "/home/zhaogengs/workspace/SenseVoice/FunASR/funasr/bin/train_ds.py", line 173, in main
    trainer.train_epoch(
  File "/home/zhaogengs/workspace/SenseVoice/FunASR/funasr/train_utils/trainer_ds.py", line 603, in train_epoch
    self.forward_step(model, batch, loss_dict=loss_dict)
  File "/home/zhaogengs/workspace/SenseVoice/FunASR/funasr/train_utils/trainer_ds.py", line 670, in forward_step
    retval = model(**batch)
             ^^^^^^^^^^^^^^
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhaogengs/workspace/SenseVoice/model.py", line 680, in forward
    encoder_out, encoder_out_lens = self.encode(speech, speech_lengths, text)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhaogengs/workspace/SenseVoice/model.py", line 733, in encode
    styles = torch.LongTensor([[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]).to(speech.device)
                                                                                 ~~~~^^^^^^
IndexError: index 3 is out of bounds for dimension 1 with size 1
E1128 19:33:55.141000 140491117365056 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 2095732) of binary: /home/zhaogengs/miniconda3/envs/SenseVoice/bin/python
Traceback (most recent call last):
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhaogengs/miniconda3/envs/SenseVoice/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/home/zhaogengs/workspace/SenseVoice/FunASR/funasr/bin/train_ds.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-11-28_19:33:55
  host      : 172-16-158-67-Debian-22
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2095732)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Nov 28 '24 12:11 sadlay

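Problem solved. It was the dataset selection: you need to choose SenseVoiceCTCDataset, because only that dataset contains the code that pads the front of text.

How exactly did you solve it?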

Dec 02 '24 07:12 sadlay

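In my case, turning off the batch sampler in the script fixed it. To debug, I first ran with a single data entry and found that the same entry was being sampled inconsistently within the batch; after commenting out the batchsampler line, it ran. …

This approach fixed it! Training runs normally now.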

Feb 19 '25 16:02 Reddxxxxxx

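In my case, turning off the batch sampler in the script fixed it. To debug, I first ran with a single data entry and found that the same entry was being sampled inconsistently within the batch; after commenting out the batchsampler line, it ran. ……

This approach fixed it! Training runs normally now.

Something is odd on my side. If I delete the batchsampler line, I still get the same error. If I only comment it out without deleting it, training runs on a small amount of data, but once the dataset gets large I hit CUDA out of memory.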

Mar 17 '25 08:03 jollyfish-cjy

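In my case, turning off the batch sampler in the script fixed it. To debug, I first ran with a single data entry and found that the same entry was being sampled inconsistently within the batch; after commenting out the batchsampler line, it ran. ……

This approach fixed it! Training runs normally now.

Something is odd on my side. If I delete the batchsampler line, I still get the same error. If I only comment it out without deleting it, training runs on a small amount of data, but once the dataset gets large I hit CUDA out of memory.

Try the environment from the official ModelScope image: https://modelscope.cn/docs/intro/environment-setup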

Sep 15 '25 02:09 slin000111

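After commenting out batchsampler, the error went away. I then found that all the parameters after that line had fallen back to their defaults, and the default batch_size is 14000. So I tried setting batch_size to a larger value, and training finally ran.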
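
One possible reading of the "data is empty!" log lines, offered as an assumption rather than a description of FunASR's BatchSampler: with `batch_type=token`, `batch_size` acts as a per-batch token budget, so a budget of 10 or 100 is smaller than every `source_len` in the sample data (222 to 1474) and no sample fits. The generic batcher below uses a hypothetical function name and a simple padded-size rule to show that effect with the numbers from this thread.

```python
def token_budget_batches(lengths, budget):
    """Greedily group sample indices so each batch's padded size stays within budget.

    Padded size = (number of samples) * (longest sample in the batch).
    Samples longer than the budget on their own are skipped.
    This is a generic illustration, not FunASR's BatchSampler.
    """
    batches, current, longest = [], [], 0
    for index, length in enumerate(lengths):
        if length > budget:
            continue  # cannot fit even alone
        new_longest = max(longest, length)
        if current and new_longest * (len(current) + 1) > budget:
            batches.append(current)  # flush the full batch and start a new one
            current, new_longest = [], length
        current.append(index)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


# source_len values from the sample records in the original report.
source_lens = [420, 573, 1474, 222]

small = token_budget_batches(source_lens, budget=10)     # batch_size in the first report
medium = token_budget_batches(source_lens, budget=100)   # batch_size in the second report
large = token_budget_batches(source_lens, budget=14000)  # default value mentioned above

print(small, medium, large)
```

Under this reading, raising `batch_size` while keeping `batch_type=token` would be the direct fix, and commenting out the batchsampler line only helps because the later overrides, including the small `batch_size`, stop being applied.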

Oct 10 '25 02:10 falshchen