What are the effects of overfitting for downstream tasks?
I was trying to adapt the sentence-transformers/multi-qa-mpnet-base-dot-v1 model to the financial domain with GPL, using SEC data.
I trained the model with the following hyperparameters:
```json
{
  "learning_rate": 0.00002,
  "num_examples_eval": 1000,
  "num_examples_train": 20000,
  "num_epochs": 15
}
```
My loss curves were as follows:
It seems like the model itself is overfitting, but the trained model's performance is not up to the mark even when I use early stopping: I trained one for only 3 epochs, and the unadapted model still performs better than the adapted ones. I was wondering if I could get some insight into why this happens. I don't really know where to ask this question; if there is a better place for it, please let me know and I will take it there, especially since this is more of a theoretical question than something tied to this library.
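For reference, this is roughly how I'm thinking about early stopping: stop once the eval loss hasn't improved for a few consecutive epochs. This is a hypothetical helper I wrote to illustrate, not GPL's API:

```python
def should_stop(eval_losses, patience=3, min_delta=0.0):
    """Return True if the eval loss has not improved by at least
    `min_delta` in the last `patience` epochs.

    eval_losses: list of per-epoch eval losses, oldest first.
    """
    if len(eval_losses) <= patience:
        # Not enough history to judge improvement yet.
        return False
    best_before = min(eval_losses[:-patience])
    recent_best = min(eval_losses[-patience:])
    # Stop if the recent best never beat the earlier best.
    return recent_best > best_before - min_delta


# Example: loss bottoms out at 0.70 and then creeps back up.
history = [1.00, 0.80, 0.70, 0.71, 0.72, 0.73]
print(should_stop(history, patience=3))  # → True
```

In my runs I checkpointed at the epoch with the lowest eval loss, but even that checkpoint underperforms the unadapted model.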
I am relatively new to training models, so please let me know if I am making any obvious mistakes here (or if any other information is required).