text_to_image and image-to-image
I fine-tuned a Stable Diffusion model with my own dataset. The training script is like https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py
The text_to_image pipeline outputs images with styles outside my expectations. But if I use the image-to-image pipeline and feed it both text and an image as input, it is more likely to output images that meet my expectations. I think this may be caused by the pre-training dataset, since it is different from my own dataset and my dataset is relatively small, less than 23,000 samples.
Is there any way to train the model so that it prefers to output images similar to the training set? Should I give the net some guiding info? @patrickvonplaten @patil-suraj @pcuenca
I don't really know here sadly. Maybe someone from the community has seen such a use case? Otherwise, maybe Discord might help: https://discord.gg/G7tWnz98XR
If I understand correctly, what you want is for the model to be able to generate images in the style of the training dataset after fine-tuning. I think you should try training for longer; for example, for the pokemon example listed in the readme, we need around 100 epochs to get good generations.
Hi @patil-suraj, your understanding is almost correct. I trained for 10+ epochs and got a relatively good model A. After that, I tried training for more epochs and got a model B. But model B gave even worse test results. So I guess that even if I train for more epochs, it won't produce good generations.
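Since model A (~10 epochs) beat model B trained for longer, one option is to evaluate the intermediate checkpoints and keep the best one rather than the last. A minimal sketch, assuming some scoring function on a held-out prompt set (CLIP similarity, FID, or manual ratings); `pick_best_checkpoint`, the toy scores, and the checkpoint paths below are hypothetical illustrations, not part of the training script:

```python
# Sketch: pick the best of several saved checkpoints by a validation score,
# instead of assuming the final checkpoint is the best one.
def pick_best_checkpoint(checkpoint_dirs, score_fn):
    """Return the checkpoint directory with the highest validation score."""
    return max(checkpoint_dirs, key=score_fn)

# Toy usage: these scores stand in for a real metric (e.g. CLIP score or
# FID) computed on held-out prompts after each checkpoint.
toy_scores = {
    "ckpts/step-500": 0.61,   # early, still underfit
    "ckpts/step-1000": 0.74,  # best so far
    "ckpts/step-1500": 0.70,  # quality starts to drop (overfitting)
}
best = pick_best_checkpoint(list(toy_scores), toy_scores.get)
print(best)  # ckpts/step-1000
```

With checkpoints saved every few hundred steps (as in the command below), this turns "train longer" into "train longer, but keep the checkpoint that scores best."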
I think you'll have to experiment with hyperparameters, like different learning rates, number of epochs, number of training images etc. Also, it would be nice if you could post the command that you are using so we could see if there's any issue with the script.
Sorry for late reply. The following is my command:
```bash
export MODEL_NAME="path/to/pretrained/model"
export data_path="/path/to/training/data"

accelerate launch train.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$data_path \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=150000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --mixed_precision="fp16" \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --ckpt_dir="ckpts" \
  --ckpt_steps=500 \
  --enable_xformers_memory_efficient_attention \
  --output_dir="model"
```
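For context, the command's own numbers imply roughly the following training length, assuming the ~23,000-sample dataset mentioned earlier (a back-of-the-envelope sketch, not output of the script):

```python
# How long does the command above actually train, in epochs?
# Assumption: ~23,000 training samples, as stated earlier in the thread.
num_samples = 23_000
train_batch_size = 1
gradient_accumulation_steps = 4
max_train_steps = 150_000

effective_batch = train_batch_size * gradient_accumulation_steps  # 4
steps_per_epoch = num_samples // effective_batch                  # 5750
epochs = max_train_steps / steps_per_epoch                        # ~26

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"epochs covered by max_train_steps: {epochs:.1f}")
```

So 150,000 steps is only about 26 epochs at this effective batch size, which is useful to keep in mind when comparing against the ~100 epochs suggested for the pokemon example.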
Hard to say what the issue is; the command looks good to me. As I said above, I think you'll need to play a bit with different hyperparameters.
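As a concrete starting point for that kind of sweep, here is a minimal sketch that enumerates runs over a small grid; the specific value ranges are illustrative assumptions, not tested recommendations:

```python
# Sketch: enumerate a small hyperparameter grid for the fine-tuning
# script above. The candidate values are assumptions for illustration.
import itertools

learning_rates = [1e-5, 5e-6, 1e-6]
train_steps = [15_000, 50_000, 150_000]

runs = list(itertools.product(learning_rates, train_steps))
print(f"{len(runs)} runs to launch")  # 9 runs to launch
for lr, steps in runs:
    print(f"--learning_rate={lr} --max_train_steps={steps}")
```

Each printed line can be appended to the `accelerate launch` command above, with a distinct `--output_dir` per run so the resulting models can be compared side by side.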
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.