The pre-training loss does not converge.
Dear Author,
I tried to pre-train the model on the book-corpus data only, using a training config that is exactly the same as yours. However, the training loss is still around 0.76 after 3 epochs, so it seems that the model is not converging, and the reconstructed images are poor. Could you tell me what might be wrong?
Thanks, Chawdoe
Hi Chawdoe,
Could you paste as much information as possible about the config you're using and your environment setup (torch version, Python version, libraries, etc.)? Otherwise it's very hard for me to diagnose what might be wrong. Also, have you tried taking a small subset of the data (e.g. 1000 examples) to see whether the model is able to overfit it?
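In case it helps with gathering the environment details, here is a minimal sketch that prints the Python version, the platform, and the installed versions of a few packages using only the standard library. The package list (`torch`, `transformers`, `datasets`, `numpy`) and the function names are assumptions for illustration, so adjust them to whatever your setup actually uses.

```python
import platform
import sys
from importlib import metadata


def collect_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not present in this environment.
            versions[name] = "not installed"
    return versions


def environment_report(packages):
    """Build a plain-text report that can be pasted into the issue."""
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
    ]
    for name, version in collect_versions(packages).items():
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical package list; replace with the libraries you rely on.
    print(environment_report(["torch", "transformers", "datasets", "numpy"]))
```

Pasting the output together with the full training config should make it much easier to spot a version or setting mismatch.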