ms-swift

Fine-tuned qwen2-audio-7b-instruct

Open farmer21cn opened this issue 1 year ago • 7 comments

I fine-tuned the audio_tower layers of qwen2-audio-7b-instruct on my own data, which is spoken English. A sample of the training data:

{"query": "

Fine-tuning command:

NPROC_PER_NODE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
            --model_type qwen2-audio-7b-instruct \
            --model_cache_dir "/root/dev/qwen2_audio/Qwen2-Audio-7B-Instruct" \
            --sft_type full \
            --freeze_parameters 0.999 \
            --additional_trainable_parameters audio_tower \
            --dtype AUTO \
            --template_type AUTO \
            --output_dir "/root/dev/qwen2_audio/output" \
            --dataset "/root/dev/qwen2_audio/train_en.jsonl" \
            --dataset_test_ratio 0.01 \
            --num_train_epochs 3 \
            --max_length 1024 \
            --check_dataset_strategy warning \
            --gradient_checkpointing true \
            --batch_size 6 \
            --weight_decay 0.01 \
            --learning_rate 1e-5 \
            --gradient_accumulation_steps 32 \
            --max_grad_norm 0.5 \
            --warmup_ratio 0.03 \
            --eval_steps 100 \
            --save_steps 1000 \
            --train_dataset_sample -1 \
            --save_total_limit 10 \
            --report_to tensorboard \
            --logging_steps 10 \
            --lazy_tokenize true

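After this fine-tuning, WER on my own test set dropped from 13.23% to 9.37%. Starting from the newly produced model, I then ran LoRA fine-tuning on the same data with the following command: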

OMP_NUM_THREADS=4 NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
            --model_type qwen2-audio-7b-instruct \
            --model_cache_dir "/root/dev/qwen2_audio/output/qwen2-audio-7b-instruct/v0-20241127-170841/checkpoint-7344" \
            --output_dir "/root/dev/qwen2_audio/output_epochs_6_peft" \
            --dataset "/root/dev/qwen2_audio/train_en.jsonl" \
            --sft_type lora \
            --tuner_backend peft \
            --template_type AUTO \
            --dtype AUTO \
            --num_train_epochs 3 \
            --max_length 2048 \
            --check_dataset_strategy warning \
            --lora_rank 8 \
            --lora_alpha 32 \
            --lora_dropout_p 0.05 \
            --lora_target_modules DEFAULT \
            --gradient_checkpointing true \
            --batch_size 2 \
            --weight_decay 0.1 \
            --learning_rate 1e-4 \
            --gradient_accumulation_steps 16 \
            --max_grad_norm 0.5 \
            --warmup_ratio 0.03 \
            --eval_steps 100 \
            --save_steps 100 \
            --save_total_limit 500 \
            --logging_steps 10 \
            --use_flash_attn false \
            --lazy_tokenize true
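To make the difference between the two runs' optimization settings concrete, here is a small worked calculation of the samples consumed per optimizer step. It assumes standard data-parallel semantics: each of the NPROC_PER_NODE processes takes --batch_size samples per micro-step, and gradients are accumulated over --gradient_accumulation_steps micro-steps. The helper name effective_batch is hypothetical.

```shell
#!/usr/bin/env bash
# Samples per optimizer step, ASSUMING standard data-parallel semantics:
#   nproc_per_node * per-device batch_size * gradient_accumulation_steps
effective_batch() {
  echo $(( $1 * $2 * $3 ))
}

# First run (full fine-tune of audio_tower):
#   NPROC_PER_NODE=1, --batch_size 6, --gradient_accumulation_steps 32
full_ft=$(effective_batch 1 6 32)

# Second run (LoRA):
#   NPROC_PER_NODE=4, --batch_size 2, --gradient_accumulation_steps 16
lora=$(effective_batch 4 2 16)

echo "full fine-tune: ${full_ft} samples/step at lr 1e-5"
echo "LoRA:           ${lora} samples/step at lr 1e-4"
```

Under that assumption, this prints 192 samples per step for the first run and 128 for the LoRA run. The LoRA run therefore used a smaller effective batch and a 10x higher learning rate.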

After this second round, WER on my own test set rose from 9.37% to 10.21%. Could someone take a look and help me figure out what went wrong?

Originally posted by @farmer21cn in https://github.com/modelscope/ms-swift/issues/1653#issuecomment-2530687501

farmer21cn, Dec 12 '24

Do you know why, after I built my data from the example template and trained, the model doesn't seem to receive any audio input? When I printed the input of one example, it contained the audio path itself rather than "Audio 1: <|audio_bos|><|AUDIO|><|audio_eos|>". The example template:

[
{"conversations": [{"from": "user", "value": "11111"}, {"from": "assistant", "value": "22222"}]},
{"conversations": [{"from": "user", "value": "aaaaa"}, {"from": "assistant", "value": "bbbbb"}, {"from": "user", "value": "ccccc"}, {"from": "assistant", "value": "ddddd"}]},
{"conversations": [{"from": "user", "value": "AAAAA"}, {"from": "assistant", "value": "BBBBB"}, {"from": "user", "value": "CCCCC"}, {"from": "assistant", "value": "DDDDD"}]}
]

YuiGao, Apr 09 '25

The <audio>audio_path</audio> syntax has no effect in swift3. Pass the audio files through audios=['audio_path1', 'audio_path2'] instead.

Jintao-Huang, Apr 09 '25

Is there a data-format example for this that I can refer to?

YuiGao, Apr 17 '25

In swift3, does the data format have to be {"query", "response", "audios": []}?

YuiGao, Apr 17 '25

The Custom Dataset documentation covers this.

Jintao-Huang, Apr 17 '25

The format looks like this:

{"query": "<audio>what did this voice say", "response": "which is really quite long enough", "audios": ["/root/fyj20220812/sent/k12/cz200/CRK100/CRK00206/CRK0096268218.wav"]}
{"query": "<audio>what did this voice say", "response": "daniel is my good friend he is always kind to others he is friendly and helpful", "audios": ["/root/fyj20220812/sent/k12/md100_5000_snt/2dc2c32679bbc5c9d55ccadecc34bc80_1_0_6240.wav"]}
{"query": "<audio>what did this voice say", "response": "they are not fat", "audios": ["/root/fyj20220812/sent/k12/cz200/CRK100/CRK0023/CRK0074359396.wav"]}
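Here is a minimal sketch of producing records in the format above from a tab-separated file of "<wav path><TAB><transcript>" lines. The record shape copies the example records. The input layout, the file names and the helper names (json_escape, make_record, tsv_to_jsonl) are hypothetical. The script only escapes backslashes and double quotes, so it assumes transcripts contain no tabs, newlines or other control characters.

```shell
#!/usr/bin/env bash
# Sketch: write JSONL records of the form
#   {"query": "<audio>...", "response": "...", "audios": ["..."]}
# Helper names and file names are hypothetical placeholders.

# Escape backslashes, then double quotes, so the value is a valid JSON string.
# ASSUMPTION: input contains no tabs, newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print one record: <audio> tag in the query, transcript as the response,
# and the audio file path in the "audios" list.
make_record() {
  local wav text
  wav=$(json_escape "$1")
  text=$(json_escape "$2")
  printf '{"query": "<audio>what did this voice say", "response": "%s", "audios": ["%s"]}\n' \
    "$text" "$wav"
}

# Convert every "<wav path>\t<transcript>" line of $1 into a record in $2.
tsv_to_jsonl() {
  local in=$1 out=$2 wav text
  : > "$out"   # truncate/create the output file
  # "|| [ -n "$wav" ]" also processes a final line that lacks a trailing newline
  while IFS=$'\t' read -r wav text || [ -n "$wav" ]; do
    [ -n "$wav" ] || continue   # skip blank lines
    make_record "$wav" "$text" >> "$out"
  done < "$in"
}
```

For example, tsv_to_jsonl transcripts.tsv train_en.jsonl would write one record per line to train_en.jsonl.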

farmer21cn, Apr 21 '25