[Experiment] Transfer Control to Other SD1.X Models
Discussed in https://github.com/lllyasviel/ControlNet/discussions/12
Originally posted by lllyasviel, February 11, 2023. This is a guideline for transferring a ControlNet to any other community model in a relatively “correct” way.
This post is prepared for SD experts. You need some understanding of the neural-network architecture of Stable Diffusion to perform this experiment.
Let us say we want to use OpenPose to control Anything V3. The overall method is
AnythingV3_control_openpose = AnythingV3 + SD15_control_openpose - SD15
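This is plain per-tensor arithmetic on the checkpoint weights. The sketch below only illustrates the idea (it assumes the three checkpoints share key names and uses the same file paths as the tool configuration further down); the bundled tool_transfer_control.py is what you should actually run.
import torch
from safetensors.torch import load_file

# Illustration only; the real merge is done by tool_transfer_control.py.
sd15 = torch.load('./models/v1-5-pruned.ckpt', map_location='cpu')['state_dict']
sd15_control = torch.load('./models/control_sd15_openpose.pth', map_location='cpu')
any3 = load_file('./models/anything-v3-full.safetensors', device='cpu')

merged = {}
for k, w in sd15_control.items():
    if k in sd15 and k in any3:
        # base-model weight: carry the control offset over to the new base
        merged[k] = any3[k] + (w - sd15[k])
    else:
        # ControlNet-only weight with no counterpart in the base checkpoints
        merged[k] = w

torch.save(merged, './models/control_any3_openpose.pth')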
You can download the necessary files here:
- AnythingV3: https://huggingface.co/Linaqruf/anything-v3.0
- SD1.5: https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main
- ControlNet: https://huggingface.co/lllyasviel/ControlNet/tree/main/models
Important things to keep in mind:
- Replacing the base model inside a ControlNet checkpoint MAY work but is WRONG. This is because the ControlNet may have been trained with some SD layers unlocked; see the ending part of “SD_locked” in the official training guideline. You need to compute the offset even for the base diffusion model.
- The difference between the CLIP text encoders must be considered. Because of that well-known reason, a dominant majority of anime models need “clip_skip=2” and a 3x longer token length. Note that this also changes the softmax averaging over the prompt, because the sequence length is different (see the sketch after this list).
- In some applications like human pose, your input image should not be an anime image. It should be a photo of a real person, because that image is only read by the OpenPose human-pose detector; it is never seen by SD/ControlNet. Also, OpenPose is bad at processing anime images.
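For reference, here is a minimal standalone sketch (not code from this repo) of what “clip_skip=2” means on the CLIP ViT-L/14 text encoder used by SD1.x; hack_everything(clip_skip=2), used later in this post, applies the equivalent change inside the model for you.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained('openai/clip-vit-large-patch14')
encoder = CLIPTextModel.from_pretrained('openai/clip-vit-large-patch14')

tokens = tokenizer(['1girl, masterpiece'], padding='max_length',
                   max_length=77, return_tensors='pt')
with torch.no_grad():
    out = encoder(**tokens, output_hidden_states=True)

# clip_skip=2: use the penultimate hidden layer instead of the last one,
# then re-apply the final layer norm.
cond = encoder.text_model.final_layer_norm(out.hidden_states[-2])

# "3x token length" means encoding three 77-token chunks like this and
# concatenating them along the sequence axis, which is why the softmax
# averaging over the prompt changes.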
I have done all these preparations for you.
You may open "tool_transfer_control.py" and edit the file paths:
path_sd15 = './models/v1-5-pruned.ckpt'
path_sd15_with_control = './models/control_sd15_openpose.pth'
path_input = './models/anything-v3-full.safetensors'
path_output = './models/control_any3_openpose.pth'
You can choose the output filename with "path_output". Make sure the other three paths are correct and the files exist. Then run
python tool_transfer_control.py
Then you will get the file
models/control_any3_openpose.pth
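If you want a quick sanity check before wiring it into Gradio, you can simply open the file and count the tensors (hypothetical snippet, not part of the repo's tooling):
import torch

sd = torch.load('./models/control_any3_openpose.pth', map_location='cpu')
sd = sd.get('state_dict', sd)  # unwrap if the checkpoint is wrapped
print(f'{len(sd)} tensors, {sum(v.numel() for v in sd.values()) / 1e9:.2f}B parameters')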
Then you need to hack the Gradio scripts to load your new model, and to hack the CLIP encoder with “clip_skip=2” and 3x token length.
Taking OpenPose as an example, you can hack "gradio_pose2image.py" in this way:
from share import *
from cldm.hack import hack_everything
hack_everything(clip_skip=2)
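# the two lines above are the anime-model hack described above: they patch the
# CLIP text encoder (clip_skip=2 and 3x token length) before the model is built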
import config
import cv2
import einops
import gradio as gr
import numpy as np
import torch
from pytorch_lightning import seed_everything
from annotator.util import resize_image, HWC3
from annotator.openpose import apply_openpose
from cldm.model import create_model, load_state_dict
from ldm.models.diffusion.ddim import DDIMSampler
model = create_model('./models/cldm_v15.yaml').cpu()
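# point load_state_dict at the merged checkpoint produced by tool_transfer_control.py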
model.load_state_dict(load_state_dict('./models/control_any3_openpose.pth', location='cpu'))
model = model.cuda()
ddim_sampler = DDIMSampler(model)
def process ...
Then the results will look like this:
("1girl")

("1girl, masterpiece, garden")

And other controls like Canny edge:
("1girl, garden, flowers, sunshine, masterpiece, best quality, ultra-detailed, illustration, disheveled hair")

This is a re-post. Please go to the discussion thread linked above for discussion.