tikhonlavrev
> > 1. Run training script with torchrun instead of srun
> > (e.g., torchrun --standalone --nnodes 1 --nproc_per_node 2 train/~~~~)
> > 2. Exchange SLURM_LOCALID, SLURM_PROCID, SLURM_NNODES to LOCAL_RANK,...
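
A minimal sketch of the variable exchange described in the quoted steps, assuming the script reads its rank from the environment. `LOCAL_RANK` and `RANK` are variables torchrun exports to each worker; the helper name `read_rank_env` and the fallback to the SLURM names are illustrative and not taken from the training script.

```python
import os


def read_rank_env(env=None):
    """Return (local_rank, global_rank) from torchrun-style variables.

    Hypothetical helper: falls back to the SLURM names so the same
    script can still be started with srun.
    """
    env = os.environ if env is None else env
    # torchrun exports LOCAL_RANK / RANK; srun exports SLURM_LOCALID / SLURM_PROCID.
    local_rank = int(env.get("LOCAL_RANK", env.get("SLURM_LOCALID", 0)))
    global_rank = int(env.get("RANK", env.get("SLURM_PROCID", 0)))
    return local_rank, global_rank
```

Passing a plain dict instead of `os.environ` makes the lookup easy to check without launching a distributed job.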
Adding `sys.path.append(os.getcwd())` seems to work for me. So the module import should look like:
```
import os
import sys

sys.path.append(os.getcwd())
from gdf import GDF, EpsilonTarget, CosineSchedule
```
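To show why this helps, here is a small self-contained sketch. It assumes the script is launched from the directory that contains the package; `demo_pkg` is a stand-in module created only for this demonstration, not the real `gdf` package.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a stand-in module (hypothetical name).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "demo_pkg.py"), "w") as fh:
    fh.write("VALUE = 42\n")

previous_cwd = os.getcwd()
os.chdir(workdir)
try:
    # Same fix as in the comment: make the current working directory importable.
    sys.path.append(os.getcwd())
    importlib.invalidate_caches()
    demo_pkg = importlib.import_module("demo_pkg")
finally:
    # Restore the original working directory; the appended path is absolute.
    os.chdir(previous_cwd)
```

The import only resolves because the working directory was on `sys.path` at import time, so the fix depends on where the training script is started from.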
Great, thanks for the insight. I've temporarily modified your train.py at line 969 from:
```
    )
    lr_scheduler = get_lr_scheduler(
        args, optimizer, accelerator, logger, use_deepspeed_scheduler=False
    )
    if hasattr(lr_scheduler, "num_update_steps_per_epoch"):
        lr_scheduler.num_update_steps_per_epoch = num_update_steps_per_epoch...
Okay, I'm back. When training reached the validation phase, I got this error:
```
Keyword arguments {'safety_checker': None} are not expected by FluxPipeline and will be ignored.
Loaded...
Sorry, I had left my machine, so I can't be sure how long it took. I'd estimate between 30 minutes and 1 hour, which is the interval during which I left my...
It's actually possible to get 4.76 GB in bf16, but only without the text encoder included in the safetensors file.
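As a rough sanity check on that figure: bf16 stores each parameter in 2 bytes, so the file size maps directly to a parameter count. The sketch below assumes decimal gigabytes (10^9 bytes) and ignores the small safetensors header; `params_from_size` is an illustrative helper, not part of any library.

```python
BYTES_PER_BF16 = 2  # bfloat16 uses 16 bits per value


def params_from_size(size_gb, bytes_per_param=BYTES_PER_BF16):
    """Approximate parameter count for a checkpoint of `size_gb` decimal GB."""
    return size_gb * 1e9 / bytes_per_param


approx_params = params_from_size(4.76)
print(f"{approx_params / 1e9:.2f}B parameters")
```

Under those assumptions, 4.76 GB corresponds to about 2.38 billion bf16 parameters.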