FunASR
KeyError: 6491 when training on dialect data based on SenseVoiceSmall
Environment: Ubuntu Server 22.04, Python 3.11, CUDA 11.8. Running training fails with the following error:
[2025-05-01 18:32:21,169][root][INFO] - Validate epoch: 1, rank: 0
[2025-05-01 18:32:21,172][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 1, after: 1
[2025-05-01 18:32:21,291][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 1, after: 1
Error executing job with overrides: ['++model=iic/SenseVoiceSmall', '++train_data_set_list=/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/data/train_test.jsonl', '++valid_data_set_list=/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/data/val_test.jsonl', '++dataset_conf.data_split_num=1', '++dataset_conf.batch_sampler=BatchSampler', '++dataset_conf.batch_size=6000', '++dataset_conf.sort_size=1024', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=8', '++train_conf.max_epoch=150', '++train_conf.log_interval=1', '++train_conf.resume=true', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=10', '++train_conf.avg_nbest_model=10', '++train_conf.use_deepspeed=false', '++train_conf.deepspeed_config=/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../ds_stage1.json', '++optim_conf.lr=0.0002', '++output_dir=./outputs']
Traceback (most recent call last):
File "/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../../funasr/bin/train_ds.py", line 228, in <module>
main_hydra()
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
^^^^^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../../funasr/bin/train_ds.py", line 56, in main_hydra
main(**kwargs)
File "/usr/local/src/FunASR/examples/industrial_data_pretraining/sense_voice/../../../funasr/bin/train_ds.py", line 201, in main
trainer.validate_epoch(model=model, dataloader_val=dataloader_val, epoch=epoch + 1)
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/train_utils/trainer_ds.py", line 771, in validate_epoch
self.forward_step(model, batch, loss_dict=loss_dict)
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/train_utils/trainer_ds.py", line 670, in forward_step
retval = model(**batch)
^^^^^^^^^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 697, in forward
encoder_out, encoder_out_lens = self.encode(speech, speech_lengths, text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 759, in encode
[[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 759, in <listcomp>
[[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 6491
E0501 18:32:25.772000 140269714682944 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 77200) of binary: /root/miniconda3/envs/funasr/bin/python3.11
Traceback (most recent call last):
File "/root/miniconda3/envs/funasr/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/run.py", line 879, in main
run(args)
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/funasr/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
../../../funasr/bin/train_ds.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-05-01_18:32:25
host : localhost.localdomain
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 77200)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
^C
Does anyone know how to fix this?
Hi, have you solved this yet?
No.
Has anyone figured out how to solve this yet?
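The last frame of the traceback looks up `self.textnorm_int_dict[int(style)]` for each `style` in `text[:, 3]`, so 6491 is the fourth token id of a target sequence and it is not a key of that dictionary. One possible cause (an assumption, not confirmed in this thread) is that some records in the train or validation JSONL lack the tag fields that are prepended to the target, so the fourth position holds an ordinary text token instead of the ITN tag. Below is a minimal sketch for checking a manifest before training. The field names `text_language`, `emo_target`, `event_target`, `with_or_wo_itn` and the values `<|withitn|>` / `<|woitn|>` are assumed from the SenseVoice fine-tuning data format and should be verified against your FunASR version; `find_bad_records` and `check_manifest` are hypothetical helper names.

```python
import json

# Assumption: field names and allowed values follow the SenseVoice
# fine-tuning data format; verify them against your FunASR version.
TAG_FIELDS = ("text_language", "emo_target", "event_target", "with_or_wo_itn")
ITN_VALUES = {"<|withitn|>", "<|woitn|>"}


def find_bad_records(lines):
    """Return (line_number, problem) pairs for records lacking valid tag fields."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "record is not a JSON object"))
            continue
        missing = [name for name in TAG_FIELDS if name not in record]
        if missing:
            problems.append((line_number, "missing fields: " + ", ".join(missing)))
        elif record["with_or_wo_itn"] not in ITN_VALUES:
            value = record["with_or_wo_itn"]
            problems.append((line_number, f"unexpected with_or_wo_itn: {value!r}"))
    return problems


def check_manifest(path):
    """Run find_bad_records over a JSONL manifest file."""
    with open(path, encoding="utf-8") as handle:
        return find_bad_records(handle)
```

Running `check_manifest` on both files passed via `++train_data_set_list` and `++valid_data_set_list` lists the offending lines. The crash in the log happens in `validate_epoch`, so the validation file is the one to check first. An empty result does not rule out other causes, such as a tokenizer or dataset class that differs from the one the model expects.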