
Is it possible to use ViT-B/32 instead of ViT-L/14 for SD fine-tuning with DreamBooth?

Open · stpg06 opened this issue 3 years ago · 3 comments

I am wondering whether I can change the default CLIP model used for my training, and if so, how?

stpg06 · Feb 13 '23 22:02

@yiyixuxu could you take a look here? :-)

patrickvonplaten · Feb 14 '23 21:02

Hi @stpg06:

If you want to experiment with a different text encoder, you could modify this line of the training script: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py#L605

```python
text_encoder = text_encoder_cls.from_pretrained(...)
```

YiYi

yiyixuxu · Feb 15 '23 02:02
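For illustration, here is a minimal sketch of what such a modification might look like. It assumes you want to load both the text encoder and the tokenizer from the public `openai/clip-vit-base-patch32` checkpoint instead of from the Stable Diffusion model folder. The helper name `load_replacement_text_encoder` is hypothetical and is not part of `train_dreambooth.py`. The `transformers` import is deferred so the function can be defined without the library installed.

```python
def load_replacement_text_encoder(clip_name: str = "openai/clip-vit-base-patch32"):
    """Hypothetical helper: load a CLIP text encoder and its tokenizer from a
    standalone CLIP checkpoint instead of the `text_encoder` / `tokenizer`
    subfolders of a Stable Diffusion checkpoint."""
    # Deferred import: transformers is only needed when the function is called.
    from transformers import CLIPTextModel, CLIPTokenizer

    # Load the tokenizer from the same checkpoint so token ids match the encoder.
    tokenizer = CLIPTokenizer.from_pretrained(clip_name)
    # CLIPTextModel keeps only the text tower of the full CLIP checkpoint;
    # transformers warns that the vision weights are unused.
    text_encoder = CLIPTextModel.from_pretrained(clip_name)
    return tokenizer, text_encoder
```

In the training script, the returned objects would replace the ones loaded from the SD checkpoint. Whether the resulting model trains well is a separate question.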

Just to add one more comment here: @stpg06, note that DreamBooth fine-tunes an already-trained checkpoint. If that checkpoint was trained with a ViT-L/14 text encoder, you will probably get poor results when swapping in a different one (ViT-B/32), because the UNet was never trained on that encoder's embeddings.

Long story short, I don't think it makes much sense to swap text encoders for DreamBooth; for text-to-image training, however, it could make a lot of sense :-)

patrickvonplaten · Feb 15 '23 10:02
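There is also a plain shape problem with the swap. In the Hugging Face configs, the ViT-L/14 text encoder (`openai/clip-vit-large-patch14`) produces 768-dimensional hidden states, which matches the `cross_attention_dim` of 768 used by Stable Diffusion 1.x UNets. The ViT-B/32 text encoder (`openai/clip-vit-base-patch32`) produces 512-dimensional hidden states. The sketch below hard-codes those config values; double-check them against the checkpoint you actually use. The table and function names are hypothetical.

```python
# Hidden size of the CLIP text tower (text_config.hidden_size in the public
# Hugging Face configs). Hypothetical lookup table for illustration only.
TEXT_ENCODER_HIDDEN_SIZE = {
    "openai/clip-vit-large-patch14": 768,  # ViT-L/14, used by Stable Diffusion 1.x
    "openai/clip-vit-base-patch32": 512,   # ViT-B/32
}

# cross_attention_dim of the UNet in Stable Diffusion 1.x checkpoints.
SD1_UNET_CROSS_ATTENTION_DIM = 768


def check_text_encoder_compatibility(clip_name: str, cross_attention_dim: int) -> bool:
    """Return True if the encoder's hidden states can be fed directly into the
    UNet's cross-attention layers without an extra projection."""
    return TEXT_ENCODER_HIDDEN_SIZE[clip_name] == cross_attention_dim


for name in TEXT_ENCODER_HIDDEN_SIZE:
    ok = check_text_encoder_compatibility(name, SD1_UNET_CROSS_ATTENTION_DIM)
    print(f"{name}: {'compatible' if ok else 'shape mismatch'}")
```

With loaded models, you would compare `text_encoder.config.hidden_size` against `unet.config.cross_attention_dim` instead of hard-coding the values. Even if a learned projection fixed the shapes, the UNet's cross-attention weights would still have been trained on ViT-L/14 embeddings only.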

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Mar 16 '23 15:03