Low GPU Utilization and Slow Checkpoint Saving During Full Fine-Tuning of Qwen-Image-Edit-2509
Hi,
Thanks for your excellent framework!
I'm fine-tuning Qwen-Image-Edit-2509 on a self-built image-editing dataset, but training is extremely slow.
- Environment: single node with 8 GPUs
- Dataset size: ~1M samples
- Estimated training time: ~1,300 hours (very slow)
As shown in the attached nvitop screenshot, GPU utilization is very low across all devices.
Is this expected for this model, or could there be an inefficiency in data loading / communication? Any suggestions or optimization tips to improve multi-GPU utilization would be greatly appreciated.
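In case it helps frame the question, here is a minimal sketch of the loader-side knobs I understand are the usual suspects for this kind of bottleneck, plus a crude way to time how long each step waits on data. The dataset class is a dummy stand-in for my real dataset, and all the numbers (`num_workers`, `prefetch_factor`, batch size) are guesses rather than recommendations:

```python
import time
import torch
from torch.utils.data import DataLoader, Dataset

class DummyEditDataset(Dataset):
    """Stand-in for the real editing dataset; replace with your own."""
    def __len__(self):
        return 1_000

    def __getitem__(self, idx):
        # Random tensors in place of a decoded (source, target) image pair.
        return torch.randn(3, 512, 512), torch.randn(3, 512, 512)

loader = DataLoader(
    DummyEditDataset(),
    batch_size=4,
    num_workers=8,            # too few workers can starve 8 GPUs
    pin_memory=True,          # speeds up host-to-device copies
    persistent_workers=True,  # avoid re-forking workers every epoch
    prefetch_factor=4,        # overlap CPU preprocessing with GPU compute
)

t0 = time.perf_counter()
for step, (src, tgt) in enumerate(loader):
    data_wait = time.perf_counter() - t0  # time spent blocked on the loader
    # ... forward / backward / optimizer step would go here ...
    t0 = time.perf_counter()
    if step % 50 == 0:
        print(f"step {step}: data wait {data_wait:.3f}s")
```

If `data_wait` stays large relative to the compute time per step, that would point at data loading rather than the model itself, but I'm not sure whether the framework already pipelines this internally.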
Additionally, when performing full fine-tuning, saving model weights is extremely slow: only one GPU shows high utilization during the saving stage, and the process often triggers NCCL timeout errors that block training.
Is there a recommended way to handle or accelerate checkpoint saving in multi-GPU fine-tuning?
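From what I've gathered so far, two things seem worth trying, and I'd appreciate confirmation that this is the intended approach: raising the NCCL timeout at process-group init (the default is 10 to 30 minutes depending on the PyTorch version, which one rank serializing a multi-GB checkpoint can exceed), and sharded saving via `torch.distributed.checkpoint` so every rank writes its own shard instead of gathering the full state dict onto one GPU. A minimal sketch under those assumptions; the model is a stand-in and the timeout and path are arbitrary:

```python
import datetime
import torch
import torch.distributed as dist
import torch.distributed.checkpoint as dcp
from torch import nn

# Longer NCCL timeout at init, so a slow checkpoint write on one rank
# doesn't trip the watchdog while the other ranks sit in a collective.
dist.init_process_group("nccl", timeout=datetime.timedelta(minutes=60))
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Linear(8, 8).cuda()  # stand-in for the fine-tuned transformer

# Sharded save: each rank writes its own shard in parallel, so no single
# GPU has to gather and serialize the entire state dict.
dcp.save(
    {"model": model.state_dict()},
    storage_writer=dcp.FileSystemWriter("checkpoints/step_1000"),
)

dist.destroy_process_group()
```

Is something like this compatible with the framework's saving path, or is there a built-in option I'm missing?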