
Not working

TingTingin opened this issue 3 years ago · 3 comments

I ran this command with 100 regularization images in ./data/1girl-reg and 56 instance images in ./data/Power:

    accelerate launch src/diffuser_training.py --pretrained_model_name_or_path 'C:\Users\jhon\StableTuner\models\Anything-V3' --instance_data_dir ./data/Power --class_data_dir ./data/1girl-reg --output_dir ./logs/power --with_prior_preservation --prior_loss_weight 1.0 --instance_prompt "illustration of a <new1> 1girl" --class_prompt "1girl" --resolution 512 --revision fp16 --mixed_precision fp16 --gradient_checkpointing --gradient_accumulation_steps 1 --use_8bit_adam --train_batch_size 1 --learning_rate 5e-6 --lr_warmup_steps 0 --max_train_steps 250 --scale_lr --modifier_token "<new1>"

After training, I sample like this:

    python src/sample_diffuser.py --delta_ckpt logs/power/delta.bin --ckpt "C:\Users\jhon\StableTuner\models\Anything-V3" --prompt "illustration of <new1> 1girl"

I also tried

    python src/sample_diffuser.py --delta_ckpt logs/power/delta.bin --ckpt "C:\Users\jhon\StableTuner\models\Anything-V3" --prompt "<new1> 1girl"

The generated images do not look anything like my instance character. Below are some of the instance character images that I used for training:

[screenshot: instance character images]

Below are some of the regularization images that I used for training:

[screenshot: regularization images]

And this is the sampled image:

[screenshot: sampled image]

As you can see, the images look nothing like the instance character. Not sure what I am doing wrong.

TingTingin · Jan 08 '23 17:01

Fixed it. The problem was that I was getting this error before:

    attempting to unscale fp16 gradients

which I initially worked around by changing this:

                    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()

to

                #accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
            #optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()

But that change disabled a critical part of the process (with optimizer.step() commented out, the weights never update at all). The actual fix was editing

"C:\Users\{your username}\anaconda3\envs\ST\Lib\site-packages\torch\cuda\amp\grad_scaler.py"

and changing

        with torch.no_grad():
            for group in optimizer.param_groups:
                for param in group["params"]:
                    if param.grad is None:
                        continue

to

        with torch.no_grad():
            for group in optimizer.param_groups:
                for param in group["params"]:
                    if param.grad is None:
                        continue
                    allow_fp16 = True

Apparently there's an issue when using mixed precision, and you need to explicitly enable this. I'm not sure of a better solution that could be added to the main training script's code, as opposed to editing torch's files directly.
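A less invasive workaround (a sketch under assumptions, not this repo's actual code) would avoid patching torch entirely: the "unscale fp16 gradients" error comes from GradScaler refusing to unscale gradients that are themselves stored in fp16, so upcasting only the trainable parameters to fp32 before building the optimizer sidesteps it, while frozen weights can stay in fp16. The helper name below is hypothetical.

```python
# Hypothetical helper, not part of diffuser_training.py: keep only the
# parameters that require gradients in fp32 so GradScaler.unscale_ does
# not see fp16 grads. Frozen (requires_grad=False) weights are untouched.
import torch


def cast_trainable_params_to_fp32(model: torch.nn.Module) -> torch.nn.Module:
    """Upcast parameters that require gradients to float32 in place."""
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.data.float()
    return model
```

The idea would be to call this on the UNet (and text encoder, if its embeddings are trained) after freezing but before the optimizer is created, instead of editing grad_scaler.py.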

TingTingin · Jan 08 '23 18:01

Also, as a side note: it seems that the gradient clipping is there to focus the training.

Is it possible to use this method to fine-tune the entire model with this repo? I'm only asking because I can't fine-tune on an 8 GB GPU with any other method, and it would be interesting if this could technically fine-tune the entire model.

TingTingin · Jan 08 '23 18:01

Hi, thanks a lot for pointing out the error with mixed-precision training. I will look into it more.

Regarding enabling full fine-tuning in the same code: it should be possible by adding another type to the --freeze_model flag and enabling gradients for all params in the create_custom_diffusion function. Also, in the case of full fine-tuning, calling save_progress during training and load_model during inference is not required. Let me know if you need more details. I will see if I can update the code to enable this as well.
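The change described above can be sketched roughly as follows. This mirrors the idea of create_custom_diffusion, not its actual code; the "all" value and the "crossattn_kv" mode name here are illustrative assumptions.

```python
# Illustrative sketch, not the repo's create_custom_diffusion: decide which
# UNet parameters receive gradients based on a freeze_model setting.
import torch.nn as nn


def set_trainable_params(unet: nn.Module, freeze_model: str) -> nn.Module:
    for name, param in unet.named_parameters():
        if freeze_model == "all":
            # Hypothetical new option: full fine-tuning, every parameter
            # gets gradients.
            param.requires_grad = True
        elif freeze_model == "crossattn_kv":
            # Custom Diffusion's default idea: train only the cross-attention
            # key/value projections (diffusers names them attn2.to_k/to_v).
            param.requires_grad = ("attn2.to_k" in name) or ("attn2.to_v" in name)
        else:
            param.requires_grad = False
    return unet
```

With freeze_model="all", the delta checkpoint machinery (save_progress / load_model) would indeed be unnecessary, since the whole model is updated and can be saved normally.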

Thanks.

nupurkmr9 · Jan 09 '23 05:01