How is the number of steps calculated?
I trained with the following script. The dataset has about 33,000 samples, with per_device_train_batch_size=16, gradient_accumulation_steps=32, epochs=3, and 4 GPUs.
nproc_per_node=4
NPROC_PER_NODE=$nproc_per_node \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift pt \
--model Qwen/Qwen2.5-7B \
--train_type full \
--dataset $CUSTOM_DATASET \
--torch_dtype bfloat16 \
--num_train_epochs 3 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-5 \
--gradient_accumulation_steps $(expr 128 / $nproc_per_node) \
--packing true \
--eval_steps 10 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 5 \
--deepspeed zero3 \
--max_length 8192 \
--warmup_ratio 0.05 \
--save_only_model true \
--output_dir XXXXX
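By the usual calculation the total should be 33000*3/16/32/4 ≈ 48 steps, but the progress bar actually shows 193 steps. How does ms_swift compute the number of steps automatically?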
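For reference, the manual estimate can be written out as a short Python sketch. It uses only the numbers quoted in the post (the dataset size is approximate) and assumes that each of the 4 GPUs acts as a separate data-parallel rank and that no packing or other sample regrouping takes place. It does not reproduce any rounding the trainer may apply per epoch.

```python
# Values taken from the post; the dataset size is approximate.
num_samples = 33000
num_epochs = 3
per_device_batch_size = 16
gradient_accumulation_steps = 32
num_gpus = 4

# Samples consumed per optimizer step, assuming every GPU is its own
# data-parallel rank (an assumption of this sketch).
global_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Naive step estimate, ignoring per-epoch rounding.
expected_steps = num_samples * num_epochs / global_batch_size

print(global_batch_size, round(expected_steps, 2))
```

Under these assumptions the estimate comes out near 48, which is the figure the post compares against the 193 steps shown by the progress bar.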
You have packing enabled.
Alternatively, check whether NPROC_PER_NODE is set correctly.
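Re "packing is enabled": thanks for pointing that out. I just rechecked the launch script, and it did not actually use packing; it used sequence_parallel for training. I also verified that NPROC_PER_NODE is set correctly, and the log shows world_size is 4. However, the step count still differs from the manually calculated value.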
nproc_per_node=4
NPROC_PER_NODE=$nproc_per_node \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift pt \
--model Qwen/Qwen2.5-7B \
--train_type full \
--dataset $CUSTOM_DATASET \
--torch_dtype bfloat16 \
--num_train_epochs 3 \
--sequence_parallel 4 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-5 \
--gradient_accumulation_steps $(expr 128 / $nproc_per_node) \
--eval_steps 10 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 5 \
--deepspeed zero3 \
--max_length 8192 \
--warmup_ratio 0.05 \
--save_only_model true \
--output_dir XXXXX
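One possible explanation, not confirmed anywhere in this thread, is that ranks in the same sequence-parallel group work on the same samples, so the data-parallel size becomes world_size / sequence_parallel rather than world_size. The Python sketch below tests that hypothesis against the numbers from the post. The function `estimate_steps` and its parameter names are hypothetical helpers written for illustration, not part of ms_swift, and the sketch ignores any per-epoch rounding the trainer applies.

```python
def estimate_steps(num_samples, num_epochs, per_device_batch_size,
                   gradient_accumulation_steps, world_size,
                   sequence_parallel_size=1):
    """Rough step estimate (hypothetical helper, not an ms_swift API)."""
    # Assumption: GPUs in one sequence-parallel group split the same
    # sequences, so only world_size // sequence_parallel_size groups
    # consume distinct samples.
    data_parallel_size = world_size // sequence_parallel_size
    samples_per_step = (per_device_batch_size
                        * gradient_accumulation_steps
                        * data_parallel_size)
    return num_samples * num_epochs / samples_per_step


# Numbers from the post; the dataset size is approximate.
without_sp = estimate_steps(33000, 3, 16, 32, world_size=4)
with_sp = estimate_steps(33000, 3, 16, 32, world_size=4,
                         sequence_parallel_size=4)
print(round(without_sp, 1), round(with_sp, 1))
```

With sequence_parallel set to 4 on 4 GPUs, this estimate lands close to the 193 steps reported by the progress bar, which is consistent with the hypothesis but does not prove it.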
This issue has been inactive for over 3 months and will be automatically closed in 7 days. If this issue is still relevant, please reply to this message.
This issue has been automatically closed due to inactivity. If needed, it can be reopened.