
about batch size in Wan I2V training

Open Steven-Xiong opened this issue 9 months ago • 10 comments

Hi, when training the Wan I2V model with LoRA, the default batch size is currently 1. When I try to increase it here:

dataloader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=1, num_workers=args.dataloader_num_workers)

I get the following error:

[rank1]: RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 4 but got size 1 for tensor number 1 in the list.

I have not found anywhere in the argument parser to adjust it. Could you help me figure out how to increase the batch size? Thank you.

Steven-Xiong avatar Mar 30 '25 23:03 Steven-Xiong
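For reference, if the script exposed the batch size through its argument parser, the change would look roughly like the sketch below. This is only an illustration: the --batch_size flag is an assumption (the example training scripts do not currently define it), and as the maintainers note later in this thread, only batch_size=1 is known to work with the current training step.

```python
# Hypothetical sketch: exposing the per-GPU batch size as a CLI flag instead of
# the hard-coded value 1. The --batch_size flag is an assumption, not an
# existing option of the DiffSynth-Studio training scripts.
import argparse

import torch
from torch.utils.data import DataLoader, TensorDataset

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--dataloader_num_workers", type=int, default=4)
args = parser.parse_args()

# Placeholder dataset; in the real script this is the video/text dataset.
dataset = TensorDataset(torch.randn(16, 8), torch.randn(16, 8))

dataloader = DataLoader(
    dataset,
    shuffle=True,
    batch_size=args.batch_size,
    num_workers=args.dataloader_num_workers,
)
```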

@Steven-Xiong Hi, How long does generating a 5-second video with Wan2.1-I2V-14B-480P on your GPU setup take?

ZhouQianang avatar Mar 31 '25 03:03 ZhouQianang

I also encountered this problem.

njzxj avatar Mar 31 '25 11:03 njzxj

I guess only the batch size = 1 case has been implemented so far. For example, in this code: https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/wanvideo/train_wan_t2v.py#L254 only index 0 of the prompt embedding is used.

Feynman1999 avatar Apr 01 '25 11:04 Feynman1999
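A rough sketch of what batching would additionally require, assuming the cached per-sample prompt embeddings have different sequence lengths (the key name "prompt_emb" and the batch layout below are assumptions about the cached format, not the repository's actual structure):

```python
# Hypothetical padding collate_fn: stacks variable-length prompt embeddings
# into a single padded batch tensor plus a mask. This alone is not a fix; the
# training_step would also have to stop indexing [0], and the model forward
# pass would have to respect the mask.
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_prompt_embeddings(samples):
    # Each sample: a dict holding a (seq_len, dim) prompt embedding, with
    # seq_len varying from sample to sample.
    embs = [s["prompt_emb"] for s in samples]
    padded = pad_sequence(embs, batch_first=True)                    # (B, max_len, dim)
    mask = pad_sequence([torch.ones(e.shape[0]) for e in embs],
                        batch_first=True)                            # (B, max_len)
    return {"prompt_emb": padded, "prompt_mask": mask}
```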

@Feynman1999 Understood. So has anyone figured out how to solve this? Just remove all the [0] indexing in the training_step function?

Steven-Xiong avatar Apr 01 '25 15:04 Steven-Xiong

@Artiprocher Hi, could you have a look at this issue? Thanks

Steven-Xiong avatar Apr 02 '25 21:04 Steven-Xiong

@Steven-Xiong Due to the large size of this model and the limited memory of most GPUs, we do not plan to support a batch size greater than 1 for now. However, you can achieve similar functionality using gradient accumulation.

Artiprocher avatar Apr 03 '25 02:04 Artiprocher
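For reference, a minimal sketch of gradient accumulation in plain PyTorch is shown below; the toy model and data stand in for the real Wan training objects. If the training script is built on PyTorch Lightning, the equivalent is the Trainer's accumulate_grad_batches argument.

```python
# Minimal sketch of gradient accumulation: effective batch size 4 while each
# forward pass still sees batch_size=1. Toy model/data, not the Wan pipeline.
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = [(torch.randn(1, 8), torch.randn(1, 1)) for _ in range(16)]  # batch_size=1

accumulation_steps = 4  # effective batch size = 1 * accumulation_steps

optimizer.zero_grad()
for step, (x, y) in enumerate(dataloader):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()   # scale so the accumulated gradient is an average
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```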

@Artiprocher Does that mean the current batch_size=1 per GPU is a good fit for an 80 GB GPU? I find the memory consumption per GPU is around 66 GB.

Steven-Xiong avatar Apr 04 '25 15:04 Steven-Xiong

@Steven-Xiong Yes

Artiprocher avatar Apr 08 '25 11:04 Artiprocher

I would like to kindly follow up with another question: does data_processing also only allow batch_size=1? Currently the data processing stage does not seem to support multi-GPU, so it is very slow. Is there anything I can do to accelerate it, either by running it on multiple GPUs or by increasing the batch size? @Artiprocher

Steven-Xiong avatar May 07 '25 20:05 Steven-Xiong
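Not an official answer, but one common workaround for a slow single-process preprocessing stage is to shard the input file list and launch one preprocessing process per GPU. A rough sketch under that assumption is below; preprocess_one() is a placeholder for whatever data_processing does for a single video, not an API of DiffSynth-Studio.

```python
# Hypothetical sketch: shard preprocessing across N independent processes, e.g.
#   CUDA_VISIBLE_DEVICES=0 python preprocess.py --data_dir videos --rank 0 --world_size 8
#   CUDA_VISIBLE_DEVICES=1 python preprocess.py --data_dir videos --rank 1 --world_size 8
import argparse
from pathlib import Path

def preprocess_one(video_path: Path) -> None:
    # Placeholder: e.g. VAE-encode the video and cache text embeddings.
    print(f"processing {video_path}")

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", type=str, required=True)
    parser.add_argument("--rank", type=int, default=0)
    parser.add_argument("--world_size", type=int, default=1)
    args = parser.parse_args()

    videos = sorted(Path(args.data_dir).glob("*.mp4"))
    # Each process handles every world_size-th file, so shards are disjoint.
    for video in videos[args.rank::args.world_size]:
        preprocess_one(video)

if __name__ == "__main__":
    main()
```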

> I would like to kindly follow up with another question: does data_processing also only allow batch_size=1? Currently the data processing stage does not seem to support multi-GPU, so it is very slow. Is there anything I can do to accelerate it, either by running it on multiple GPUs or by increasing the batch size? @Artiprocher

I have the same confusion. If the batch size is increased, the generated tensors have mismatched sizes. Do you have a better solution? Thank you.

zfw-cv avatar May 16 '25 03:05 zfw-cv