
Freezing layers

Open Jerry-Master opened this issue 2 years ago • 2 comments

Problem

The current code does not really distinguish between trainable and non-trainable weights. Every layer has its own get_weights method, and later the trainable weights are defined simply in terms of that method.

Possible solution

Adding an extra line of code to each model should suffice:

self.trainable_weights = [x for x in self.trainable_weights if x.trainable]
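For concreteness, here is a minimal sketch of what that filtering could look like in a leras-style model class. Only get_weights comes from the issue text; ModelBase, the _frozen flag, set_frozen and get_trainable_weights are hypothetical names, and the weights are assumed to be tf.Variable objects, which expose a read-only trainable attribute.

class ModelBase:
    # Hypothetical sketch of a leras-style base class; only get_weights()
    # is mentioned in the issue, everything else is illustrative.

    def __init__(self):
        self._frozen = False  # hypothetical per-model freeze flag

    def get_weights(self):
        # existing behaviour (per the issue): return every variable the model owns
        raise NotImplementedError

    def set_frozen(self, frozen=True):
        # hypothetical helper to freeze/unfreeze this whole (sub)model
        self._frozen = frozen

    def get_trainable_weights(self):
        # the extra filtering proposed above: drop variables created with
        # trainable=False, and drop everything when the model itself is frozen
        if self._frozen:
            return []
        return [w for w in self.get_weights() if w.trainable]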

Suggestion

It would be interesting to add support for freezing layers. This could be useful if, for instance, the decoder_src is undertrained with respect to the decoder_dst. Freezing the encoder and only training the decoder_src can help in that situation. Or if you have a very powerful pretrained model and only want to fine-tune the last layers, making use of the features the other model has already learned. I propose this thread as a starting point for discussing how to design the interface for freezing layers.
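As one possible shape for that interface, here is a hedged sketch in plain TF1-style Python (not DeepFaceLab's actual leras training code): the trainer simply omits the frozen parts when it collects the weights handed to the optimizer. The collect_trainable_weights helper and the frozen option are hypothetical; the part names mirror the SAEHD encoder/inter/decoder naming.

def collect_trainable_weights(models, frozen=()):
    # Hypothetical helper: gather weights from every model part except the frozen ones.
    weights = []
    for name, model in models.items():
        if name in frozen:
            continue  # frozen parts never reach the optimizer
        weights += model.get_trainable_weights()
    return weights

# usage sketch (part names mirror SAEHD; the frozen option is hypothetical):
# models = {'encoder': encoder, 'inter': inter,
#           'decoder_src': decoder_src, 'decoder_dst': decoder_dst}
# trainable = collect_trainable_weights(models, frozen=('encoder',))
# grads = tf.gradients(loss, trainable)                  # TF1 graph-mode style
# train_op = optimizer.apply_gradients(zip(grads, trainable))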

Jerry-Master avatar Feb 07 '23 15:02 Jerry-Master

reusing a model - freezing encoder/decoder, letting interAB train, then unfreeze?

zabique avatar Feb 08 '23 22:02 zabique

What you mention would not work: if a layer is unfrozen, all the layers after it should be unfrozen as well. The reason is that neural network layers build on top of each other. If you change the inter, you change the latent space the decoder is used to, so the decoder stops working properly.
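That constraint (frozen parts must form a prefix of the pipeline) could even be checked automatically. A small illustrative helper, with hypothetical names:

def check_freeze_plan(parts, frozen):
    # Hypothetical check: once a part is trainable, every later part must be trainable too,
    # because later layers have to adapt to changes in earlier ones.
    seen_trainable = False
    for name in parts:  # parts is the ordered pipeline, e.g. ['encoder', 'inter', 'decoder_src']
        trainable = name not in frozen
        if seen_trainable and not trainable:
            raise ValueError(f"'{name}' is frozen but an earlier part is trainable")
        seen_trainable = seen_trainable or trainable

# check_freeze_plan(['encoder', 'inter', 'decoder_src'], frozen={'encoder'})                 # fine
# check_freeze_plan(['encoder', 'inter', 'decoder_src'], frozen={'encoder', 'decoder_src'})  # raises:
# interAB would train while the decoder that reads its output stays frozen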

Freezing and unfreezing can be used to make the decoder better adapt to an inter, or to make the inter and decoder adapt to the encoder. The encoder and inter encode information about faces into vectors, and the decoder recovers the face from them. However, those vectors can, in theory, describe any face. So, with a powerful enough encoder, just tuning the decoder may suffice to generate any desired face. It is the same idea behind LLMs: you pretrain a big model on loads of data and then fine-tune a head on top with only a few samples for a downstream task. In computer vision it is more difficult, since models do not generalise as well, but the concept can be useful in some cases.
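To make the adapt-the-decoder idea concrete, a rough two-phase schedule could look like the sketch below, reusing the hypothetical set_frozen helper from above; warmup_iters is also a made-up parameter. Note that in DeepFaceLab's graph-mode training the set of variables the optimizer updates is fixed when the graph is built, so actually switching phases mid-run would likely mean rebuilding the optimizer or building both train ops up front; the sketch only shows the intent.

def apply_freeze_schedule(iter_num, encoder, inter, decoder_src, warmup_iters=100_000):
    # Hypothetical schedule: phase 1 trains everything, phase 2 freezes encoder+inter
    # and fine-tunes only the decoder against the now-fixed latent space.
    fine_tune = iter_num >= warmup_iters
    encoder.set_frozen(fine_tune)
    inter.set_frozen(fine_tune)
    decoder_src.set_frozen(False)  # the decoder always stays trainable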

Jerry-Master avatar Feb 09 '23 07:02 Jerry-Master