BrushNet

Errors when training with dataloader_num_workers > 0

Open · huangjun12 opened this issue 1 year ago · 2 comments

Error message:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

huangjun12 · May 23 '24 09:05

I have the same issue. Is there a guide for this, or are updates coming for distributed training?

jayhxmo · Sep 19 '24 05:09

Based on some brief tests, adding torch.multiprocessing.set_start_method('spawn', force=True) at the beginning of the training script seems to be sufficient. However, I still suspect that, in theory, this could introduce other bugs.
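A minimal sketch of where that call goes, assuming a standard PyTorch training script. `ToyDataset` and `run_epoch` are hypothetical stand-ins, not BrushNet code; the real dataset and loader setup will differ. The key point is that the start method is set once, inside the `if __name__ == "__main__":` guard, before any DataLoader workers are created.

```python
from typing import List

import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for the real training dataset."""

    def __init__(self, n: int = 8):
        self.n = n

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, idx: int) -> torch.Tensor:
        # If the parent process has already initialized CUDA, any CUDA work
        # here fails in a *forked* worker. With 'spawn', each worker starts
        # a fresh interpreter, so CUDA can be initialized there.
        return torch.full((2,), float(idx))


def run_epoch(num_workers: int = 2) -> List[float]:
    # Hypothetical helper: iterate once over the data and sum each batch.
    loader = DataLoader(ToyDataset(), batch_size=4, num_workers=num_workers)
    return [batch.sum().item() for batch in loader]


if __name__ == "__main__":
    # Must run before any worker processes are started. The __main__ guard
    # is required with 'spawn', because each worker re-imports this script.
    mp.set_start_method("spawn", force=True)
    print(run_epoch(num_workers=2))
```

If changing the start method globally is a concern, `torch.utils.data.DataLoader` also accepts `multiprocessing_context="spawn"`, which applies spawn only to that loader's workers. Either way, spawned workers start more slowly than forked ones, and the dataset (plus anything it references) must be picklable.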

santisy · May 20 '25 03:05