
Parallel dataloader for unconditional diffusion example

Open Flova opened this issue 3 years ago • 1 comments

Is your feature request related to a problem? Please describe. In the train_unconditional.py script, samples are loaded sequentially and without prefetching. This underutilizes the GPU when large images and/or slow I/O are involved.

Describe the solution you'd like Add a parameter that sets the num_workers argument of the PyTorch DataLoader, so samples are loaded in parallel by multiple worker processes.
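A minimal sketch of the proposed change (the flag name `--dataloader_num_workers` and the dummy dataset are illustrative assumptions, not the actual script):

```python
import argparse

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical CLI flag exposing the DataLoader's num_workers setting.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--dataloader_num_workers",
    type=int,
    default=0,
    help="Number of subprocesses used for data loading (0 = load in the main process).",
)
args = parser.parse_args([])  # in the real script this would parse sys.argv

# Stand-in for the image dataset used by the training script.
dataset = TensorDataset(torch.randn(8, 3, 64, 64))

# Passing num_workers > 0 makes PyTorch load and prefetch batches in
# parallel worker processes instead of blocking the training loop.
train_dataloader = DataLoader(
    dataset,
    batch_size=4,
    shuffle=True,
    num_workers=args.dataloader_num_workers,
)

for (batch,) in train_dataloader:
    print(tuple(batch.shape))
```

With the default of 0 the behavior is unchanged (loading happens in the main process), so the flag is backward compatible; users with slow storage can raise it to overlap I/O with GPU compute.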

Describe alternatives you've considered

  • Leave it as it is and accept slower training.
  • I am not that familiar with accelerate, but perhaps the number of workers could be obtained from there instead of introducing a new parameter.

Additional context

  • Tested locally (1x RTX 2080 Ti + 32-core Threadripper) on our own dataset (https://github.com/bit-bots/TORSO_21_dataset). This resulted in a 140% speedup and higher GPU utilization as reported by nvtop.

Flova avatar Oct 04 '22 09:10 Flova

cc @anton-l

patrickvonplaten avatar Oct 04 '22 13:10 patrickvonplaten

Any updates here @anton-l ?

patrickvonplaten avatar Oct 27 '22 08:10 patrickvonplaten

Thanks for the feedback @Flova! Added the parameter in https://github.com/huggingface/diffusers/pull/1027

anton-l avatar Oct 27 '22 13:10 anton-l