How does ms-swift calculate the number of training steps?

I trained with the script below. The dataset has roughly 33,000 samples, with per_device_train_batch_size=16, gradient_accumulation_steps=32, epochs=3, on 4 GPUs (nproc_per_node=4).

nproc_per_node=4
NPROC_PER_NODE=$nproc_per_node \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift pt \
    --model Qwen/Qwen2.5-7B \
    --train_type full \
    --dataset $CUSTOM_DATASET \
    --torch_dtype bfloat16 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps $(expr 128 / $nproc_per_node) \
    --packing true \
    --eval_steps 10 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --deepspeed zero3 \
    --max_length 8192 \
    --warmup_ratio 0.05 \
    --save_only_model true \
    --output_dir XXXXX

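By the usual calculation the total should be 33000*3/16/32/4 ≈ 48 steps, but the progress bar actually shows 193 steps. How does ms-swift compute the number of steps automatically?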
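The manual calculation above can be reproduced with integer shell arithmetic. This is a minimal sketch that assumes a Hugging Face Trainer-style rule (batches per epoch rounded up, optimizer steps per epoch rounded down) and treats every GPU as an independent data-parallel rank; `naive_steps` is a hypothetical helper, not an ms-swift command. The last line shows that the same formula without the factor of 4 GPUs gives 193, the number on the progress bar.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ms-swift). Assumption: every GPU is an
# independent data-parallel rank reading different samples.
naive_steps() {
  local samples=$1 epochs=$2 per_device_bs=$3 grad_accum=$4 gpus=$5
  local micro_global=$(( per_device_bs * gpus ))                   # samples per forward pass across all GPUs
  local batches=$(( (samples + micro_global - 1) / micro_global )) # batches per epoch, rounded up
  local steps=$(( batches / grad_accum ))                          # optimizer steps per epoch, rounded down
  if (( steps < 1 )); then steps=1; fi
  echo $(( steps * epochs ))
}

naive_steps 33000 3 16 32 4          # prints 48
echo $(( 33000 * 3 / (16 * 32) ))    # prints 193: same formula without the 4-GPU factor
```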

Apr 22 '25 08:04 toufunao

Packing is enabled in your script.

Apr 23 '25 06:04 Jintao-Huang

Alternatively, check whether NPROC_PER_NODE is set correctly.
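A quick way to check the values the launch script actually passes is to echo them before starting training. This sketch only inspects the shell variables used in the script; it does not query ms-swift. The nominal global batch it prints assumes all 4 GPUs act as data-parallel ranks.

```shell
#!/usr/bin/env bash
nproc_per_node=4
# Print what the launch script will pass to swift pt
echo "NPROC_PER_NODE=${nproc_per_node}"                              # NPROC_PER_NODE=4
echo "gradient_accumulation_steps=$(expr 128 / "$nproc_per_node")"  # gradient_accumulation_steps=32
# Nominal global batch if every GPU is a data-parallel rank: 16 * 32 * 4
echo "global_batch=$(( 16 * $(expr 128 / "$nproc_per_node") * nproc_per_node ))"  # global_batch=2048
```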

Apr 23 '25 06:04 Jintao-Huang

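Regarding packing: thanks for the correction. I rechecked the actual launch script, and it did not use packing; it trained with sequence_parallel instead. I also verified that NPROC_PER_NODE is set correctly, and world_size in the log is 4. The step count still differs from my manual calculation. This is the command I actually ran: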

nproc_per_node=4
NPROC_PER_NODE=$nproc_per_node \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift pt \
    --model Qwen/Qwen2.5-7B \
    --train_type full \
    --dataset $CUSTOM_DATASET \
    --torch_dtype bfloat16 \
    --num_train_epochs 3 \
    --sequence_parallel 4 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps $(expr 128 / $nproc_per_node) \
    --eval_steps 10 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --deepspeed zero3 \
    --max_length 8192 \
    --warmup_ratio 0.05 \
    --save_only_model true \
    --output_dir XXXXX
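A plausible explanation for the gap, not confirmed anywhere in this thread: with sequence parallelism, the GPUs in one sequence-parallel group split a single sequence between them instead of reading different samples, so the data-parallel size would be world_size / sequence_parallel_size. With --sequence_parallel 4 on 4 GPUs that is 1, and the factor of 4 drops out of the calculation. The sketch below encodes that assumption together with Trainer-style rounding; `sp_steps` is a hypothetical helper, not an ms-swift command. Rounding rules differ between library versions and the dataset size is only approximate, so it lands near 193 rather than exactly on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ms-swift). Assumption: GPUs inside one
# sequence-parallel group share each batch, so dp_size = world_size / sp_size.
sp_steps() {
  local samples=$1 epochs=$2 per_device_bs=$3 grad_accum=$4 world_size=$5 sp_size=$6
  local dp_size=$(( world_size / sp_size ))                        # data-parallel ranks
  local micro_global=$(( per_device_bs * dp_size ))                # samples per forward pass
  local batches=$(( (samples + micro_global - 1) / micro_global )) # batches per epoch, rounded up
  local steps=$(( batches / grad_accum ))                          # rounded down (version-dependent)
  if (( steps < 1 )); then steps=1; fi
  echo $(( steps * epochs ))
}

sp_steps 33000 3 16 32 4 1   # prints 48  (no sequence parallelism)
sp_steps 33000 3 16 32 4 4   # prints 192 (sequence_parallel 4 on 4 GPUs)
```

If this assumption holds, the effective global batch in this run is 16 × 32 × 1 = 512 sequences rather than 2048.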

Apr 23 '25 07:04 toufunao

This issue has been inactive for over 3 months and will be automatically closed in 7 days. If this issue is still relevant, please reply to this message.

Jul 23 '25 00:07 github-actions[bot]

This issue has been automatically closed due to inactivity. If needed, it can be reopened.

Aug 03 '25 00:08 github-actions[bot]