How is the latent image (64x64) used in the tutorial?
I appreciate the author's great work.
My question is about the latent image (compressed from the 512x512 original image). The paper mentions that ControlNet trains a new lite CNN to compress the conditioning image (e.g., a depth map, segmentation map, or edge map) into a latent-resolution image. Is that step crucial when training on a custom dataset? I don't see any information about training this lite CNN in the example shown in the tutorial. Should we retrain a new CNN to obtain the latent image on a new dataset?
This is the model used for encoding new control information into latent space. https://github.com/lllyasviel/ControlNet/blob/main/cldm/cldm.py#L47-L304
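For intuition, here is a minimal PyTorch sketch of that "lite CNN" hint encoder. This is a paraphrase, not the exact repo code: in the repo it is the `input_hint_block` inside the `ControlNet` class (built with `conv_nd`/`zero_module` helpers), and the channel widths below follow the repo's defaults, but treat the details as an approximation. The key point is that a few strided convolutions map the 512x512 control image down to the 64x64 latent resolution:

```python
import torch
import torch.nn as nn

class HintEncoder(nn.Module):
    """Sketch of ControlNet's input_hint_block (not the exact repo code)."""
    def __init__(self, hint_channels=3, model_channels=320):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(hint_channels, 16, 3, padding=1), nn.SiLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.SiLU(),
            nn.Conv2d(16, 32, 3, padding=1, stride=2), nn.SiLU(),   # 512 -> 256
            nn.Conv2d(32, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 96, 3, padding=1, stride=2), nn.SiLU(),   # 256 -> 128
            nn.Conv2d(96, 96, 3, padding=1), nn.SiLU(),
            nn.Conv2d(96, 256, 3, padding=1, stride=2), nn.SiLU(),  # 128 -> 64
            nn.Conv2d(256, model_channels, 3, padding=1),
        )
        # The last conv is zero-initialized (a "zero convolution" in the
        # paper's terms), so the control branch starts as a no-op.
        nn.init.zeros_(self.blocks[-1].weight)
        nn.init.zeros_(self.blocks[-1].bias)

    def forward(self, hint):
        return self.blocks(hint)

enc = HintEncoder()
hint = torch.randn(1, 3, 512, 512)  # e.g. a normalized canny/depth/seg map
print(enc(hint).shape)              # torch.Size([1, 320, 64, 64])
```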
And this is where that control information is consumed in the bigger ControlLDM: https://github.com/lllyasviel/ControlNet/blob/main/cldm/cldm.py#L311 https://github.com/lllyasviel/ControlNet/blob/main/cldm/cldm.py#L333
This lite CNN is trained jointly with the rest of the ControlNet during training; you don't need to pretrain it separately or retrain it in its own stage for a new dataset.
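To make "trained during training" concrete: the hint encoder's weights are just part of `control_model`'s parameters, so the same optimizer that trains the ControlNet copy also trains the lite CNN end-to-end. Below is a hedged paraphrase of what `configure_optimizers` in cldm/cldm.py does; the attribute names follow the repo, but `build_optimizer` itself is a hypothetical wrapper for illustration, so double-check the source:

```python
import torch

def build_optimizer(cldm, lr=1e-5, sd_locked=True):
    """Paraphrase of ControlLDM.configure_optimizers (cldm/cldm.py).

    cldm.control_model.parameters() already includes input_hint_block
    (the lite CNN), so the hint encoder is learned together with the
    trainable ControlNet copy -- no separate pretraining stage.
    """
    params = list(cldm.control_model.parameters())
    if not sd_locked:
        # Optionally also unlock the decoder half of the frozen SD U-Net.
        params += list(cldm.model.diffusion_model.output_blocks.parameters())
        params += list(cldm.model.diffusion_model.out.parameters())
    return torch.optim.AdamW(params, lr=lr)
```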
Thanks @xiankgx for your detailed answer! That's super helpful.