ml-aim
Using AIMV2 as the encoder, unfreezing it, and setting the learning rate to 1e-6 results in the LLaVA model reaching a loss of 0 and a grad_norm of NaN.
When using AIMV2 as the encoder, unfreezing it and setting the learning rate to 1e-6 leads to the LLaVA model reaching a loss of 0 after 5000 steps. The original paper kept the encoder frozen. Why is it not recommended to unfreeze it for training? And if I decide to unfreeze it anyway, what should I do?
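For context, a common mitigation when unfreezing a pretrained vision tower is to put it in its own optimizer parameter group with a much smaller learning rate than the rest of the model, and to clip gradients before each step. Below is a minimal PyTorch sketch of that idea; the attribute name `vision_tower` and the learning-rate values are assumptions for illustration, not the repo's actual API or recommended settings:

```python
import torch

# Sketch: differential learning rates when unfreezing a pretrained vision
# encoder in a LLaVA-style model. "vision_tower" is a hypothetical parameter
# name prefix; adjust it to match the actual model's module names.

def build_optimizer(model, encoder_lr=2e-6, base_lr=2e-5, weight_decay=0.0):
    encoder_params, other_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Route the (now unfrozen) encoder into its own group so it
        # trains with a much smaller learning rate than the projector/LLM.
        if name.startswith("vision_tower"):
            encoder_params.append(param)
        else:
            other_params.append(param)
    return torch.optim.AdamW(
        [
            {"params": encoder_params, "lr": encoder_lr},
            {"params": other_params, "lr": base_lr},
        ],
        weight_decay=weight_decay,
    )

# In the training loop, clip gradients before optimizer.step(); a NaN
# grad_norm typically means the gradients already contain NaN/Inf values,
# so clipping plus a lower encoder LR helps catch divergence early:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

Pairing this with a learning-rate warmup for the encoder group is another commonly used safeguard, since a freshly unfrozen encoder can destabilize training in the first steps.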