SenseVoice

SenseVoice fine-tuning: error when loading data

Open yangppde opened this issue 1 year ago • 7 comments

Notice: To help us resolve issues more efficiently, please follow the template when raising an issue and fill in the details.

❓ Questions and Help

[image] I generated the data with sensevoice2jsonl as described in the documentation, but loading it during fine-tuning fails with "data is empty!"

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Code

What have you tried?

What's your environment?

  • OS (e.g., Linux):
  • FunASR Version (e.g., 1.0.0):
  • ModelScope Version (e.g., 1.11.0):
  • PyTorch Version (e.g., 2.0.0):
  • How you installed funasr (pip, source):
  • Python version:
  • GPU (e.g., V100M32)
  • CUDA/cuDNN version (e.g., cuda11.7):
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
  • Any other relevant information:

yangppde, Aug 19 '24 04:08

Have you solved this?

wntg, Oct 26 '24 05:10

Please tell me how to solve this; I ran into the same problem. @LauraGPT

sadlay, Nov 28 '24 12:11

Hi, have you solved this?

klevin-ken, Dec 30 '24 12:12

Has anyone solved this problem?

jollyfish-cjy, Mar 17 '25 08:03

Mark. I ran into the same problem.

3202275278, Apr 23 '25 08:04

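Both the data I generated with sensevoice2jsonl and my own mock data triggered the "data is empty" problem. After checking the configuration, I found that I had overlooked the dataset_conf.batch_type="token" and dataset_conf.batch_size settings under data_conf in finetune.sh, and had wrongly assumed that batches were formed by number of samples. Once I increased the batch size, the problem was solved.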
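
To illustrate why a small batch_size can leave nothing to train on when batch_type="token", here is a minimal sketch. It assumes the sampler treats batch_size as a token budget per batch and skips any sample that exceeds that budget by itself. This is not FunASR's actual sampler code: the function name `batch_by_token_budget` and the sample lengths are hypothetical, and the real skipping rule may differ.

```python
from typing import List


def batch_by_token_budget(lengths: List[int], batch_size: int) -> List[List[int]]:
    """Group sample indices so that (longest sample * batch count) <= batch_size.

    Hypothetical illustration of token-based batching, not FunASR's implementation.
    A sample whose own length exceeds the budget is skipped entirely.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    max_len = 0
    for idx, length in enumerate(lengths):
        if length > batch_size:
            continue  # assumed behaviour: a sample larger than the budget is dropped
        new_max = max(max_len, length)
        # Padded cost of the batch if this sample were added.
        if current and new_max * (len(current) + 1) > batch_size:
            batches.append(current)
            current, new_max = [], length
        current.append(idx)
        max_len = new_max
    if current:
        batches.append(current)
    return batches


lengths = [320, 410, 275, 530]  # hypothetical per-sample lengths in tokens

# A value chosen as if it were a sample count: every sample exceeds the budget.
small = batch_by_token_budget(lengths, batch_size=16)

# A value chosen as a token budget: samples are packed into batches.
large = batch_by_token_budget(lengths, batch_size=1000)

print("batch_size=16   ->", small)
print("batch_size=1000 ->", large)
```

Under these assumptions, a batch_size picked as if it were a number of samples produces no batches at all, which matches the "data is empty" symptom, while a value sized in tokens packs the same samples normally.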

3202275278, Apr 23 '25 09:04

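Both the data I generated with sensevoice2jsonl and my own mock data triggered the "data is empty" problem. After checking the configuration, I found that I had overlooked the dataset_conf.batch_type="token" and dataset_conf.batch_size settings under data_conf in finetune.sh, and had wrongly assumed that batches were formed by number of samples. Once I increased the batch size, the problem was solved.

This worked!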

qy1026, Jul 30 '25 10:07