How is the latent image (64x64) used in the tutorial?

Open YAOYI626 opened this issue 2 years ago • 1 comments

I appreciate the author's great work.

My question is about the latent image (compressed from the original 512x512 image). The paper mentions that ControlNet trains a new lite CNN to compress the original conditioning image (such as a depth map, segmentation map, or edge map?) into a latent image. Is that step crucial when training on a custom dataset? I ask because I don't see anything about training the lite CNN in the example shown in the tutorial. Should we retrain a new CNN to obtain the latent image for a new dataset?

YAOYI626 avatar Feb 17 '23 16:02 YAOYI626

This is the model used for encoding new control information into latent space. https://github.com/lllyasviel/ControlNet/blob/main/cldm/cldm.py#L47-L304

And this is where that control information is used in the larger ControlLDM: https://github.com/lllyasviel/ControlNet/blob/main/cldm/cldm.py#L311 https://github.com/lllyasviel/ControlNet/blob/main/cldm/cldm.py#L333

This lite CNN is part of the ControlNet module, so it is trained together with the rest of ControlNet during training.
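To make the 512x512 to 64x64 step concrete, here is a rough sketch of the size arithmetic only, not the actual model code. It assumes the hint encoder is a stack of 3x3 convolutions with padding 1, three of which use stride 2; the stride list and the function names below are illustrative assumptions, so check them against the linked `cldm.py` before relying on them.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed stride schedule for the hint encoder (hypothetical, for illustration):
# three stride-2 convolutions, each halving the resolution.
HINT_STRIDES = [1, 1, 2, 1, 2, 1, 2, 1]


def hint_output_size(size, strides=HINT_STRIDES):
    """Apply each assumed convolution in turn and return the final spatial size."""
    for stride in strides:
        size = conv_out_size(size, stride=stride)
    return size


if __name__ == "__main__":
    # A 512x512 hint image ends up at the same 64x64 resolution as the latent.
    print(hint_output_size(512))
```

Under these assumptions each stride-2 layer halves the resolution (512, 256, 128, 64), which is how the control image ends up at the same spatial size as the 64x64 latent.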

xiankgx avatar Feb 20 '23 01:02 xiankgx

Thanks @xiankgx for your detailed answer! That's super helpful.

YAOYI626 avatar Apr 03 '23 09:04 YAOYI626