FDM
Error in distributed training (DDP)
When I run:
torchrun --standalone --nproc_per_node=4 train.py --outdir=training-output \
--data=datasets/ffhq-64x64.zip --cond=0 --arch=ddpmpp \
--batch=256 --cres=1,2,2,2 --lr=2e-4 --dropout=0.05 --augment=0.15 \
--precond=fdm_edm --warmup_ite=800 --fdm_multiplier=1
the following error appears:
RuntimeError: params[127] in this process with sizes [256, 256, 1, 1] appears not to match strides of the same param in process 0.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 91129 closing signal SIGTERM
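To narrow down which rank disagrees with rank 0, one option is to dump every parameter's shape and strides per rank before DDP wrapping and compare them. The sketch below is only an illustrative debugging aid (not part of train.py); it assumes torch.distributed is already initialized and that `model` is the network each rank constructs.

import torch
import torch.distributed as dist

def report_param_layouts(model: torch.nn.Module) -> None:
    """Gather (name, shape, stride) of every parameter from all ranks and
    print, on rank 0, any parameter whose layout differs from rank 0's."""
    rank = dist.get_rank()
    layouts = [
        (name, tuple(p.shape), tuple(p.stride()), p.is_contiguous())
        for name, p in model.named_parameters()
    ]
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, layouts)
    if rank == 0:
        ref = gathered[0]
        for other_rank, other in enumerate(gathered[1:], start=1):
            for (name, shape, stride, _), (_, shape2, stride2, _) in zip(ref, other):
                if (shape, stride) != (shape2, stride2):
                    print(f"rank {other_rank}: {name} differs "
                          f"(rank 0: {shape}/{stride}, rank {other_rank}: {shape2}/{stride2})")

# report_param_layouts(model)   # call on every rank before wrapping in DDP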