BrushNet
Errors when training with dataloader_num_workers > 0
Error message:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
I have the same issue. Is there a guide for distributed training, or are any updates planned?
Adding torch.multiprocessing.set_start_method('spawn', force=True) at the beginning of the training script seems to be sufficient, based on some brief tests. However, I have not tested this thoroughly, and in theory there could still be latent bugs.
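For anyone who wants to try this, here is a minimal sketch of the workaround, not the repository's actual training code. It uses the standard library's multiprocessing module, whose set_start_method is the same function that torch.multiprocessing re-exports. The helper names ensure_spawn_start_method and build_loader are hypothetical, and the batch size is an arbitrary placeholder.

```python
import multiprocessing as mp  # torch.multiprocessing re-exports this same API


def ensure_spawn_start_method() -> str:
    """Select the 'spawn' start method and return the method now in effect.

    Call this once, early in the entry point, before any DataLoader
    workers are created.
    """
    if mp.get_start_method(allow_none=True) != "spawn":
        # force=True overrides a start method that was already chosen.
        mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


def build_loader(dataset, num_workers: int):
    """Hypothetical helper: scope 'spawn' to a single DataLoader instead.

    torch is imported lazily so this file can be imported without it.
    """
    from torch.utils.data import DataLoader

    return DataLoader(
        dataset,
        batch_size=4,  # placeholder value
        num_workers=num_workers,
        # multiprocessing_context is only accepted when num_workers > 0.
        multiprocessing_context="spawn" if num_workers > 0 else None,
    )


if __name__ == "__main__":
    # With 'spawn', each worker re-imports this file, so the training
    # logic has to stay behind this guard.
    ensure_spawn_start_method()
    # ... parse arguments, build the model and dataloaders, train ...
```

Two things to watch for with 'spawn': the dataset and any collate function are pickled and sent to the workers, so they must be picklable and defined at module level, and worker startup is slower than with 'fork'. Passing multiprocessing_context to the DataLoader is a narrower alternative to changing the global start method with force=True.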