How to change parameters when building our own RE model?
We'd like to build a specific RE model to meet our own requirements. The performance is quite good. However, we identified some overfitting problems, and we'd like to address them by adjusting some hyper-parameters like the number of hidden layers, dropout, or the learning rate. Is it possible to change these settings? Or is that not possible because our model is trained from a pre-trained model? Thanks!
Hi @transpurs, even if you start from our pre-trained model, you should be able to address overfitting to your own RE dataset with careful finetuning. I recommend first trying different learning rates / batch sizes and controlling the number of finetuning gradient updates you make, using early stopping and/or training for different numbers of epochs while monitoring performance on a dev set.
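All of these live in the iterator/trainer sections of the allennlp config. A minimal sketch of the relevant keys (values are purely illustrative, so adapt them to your dataset; allennlp parses configs with jsonnet, so comments like these are allowed):

```jsonnet
{
  iterator: {
    type: 'basic',
    batch_size: 16,  // try e.g. 16 or 32
  },
  trainer: {
    num_epochs: 10,
    patience: 3,  // early stopping: halt after 3 epochs with no dev improvement
    validation_metric: '+accuracy',  // or whichever dev metric your config tracks
    optimizer: {
      type: 'adam',
      lr: 2e-5,  // sweep a few values, e.g. 5e-6 to 5e-5
    },
  },
}
```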
Warmup is also quite important; I'd try setting warmup steps to between 10% and 20% of your total number of gradient updates.
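One way to get that (not the only one) is allennlp's slanted_triangular scheduler, where cut_frac is the fraction of updates spent warming up. A sketch with made-up step counts:

```jsonnet
// hypothetical numbers; replace with your own dataset size / batch size
local num_epochs = 10;
local steps_per_epoch = 300;  // roughly num_training_examples / batch_size

{
  trainer: {
    learning_rate_scheduler: {
      type: 'slanted_triangular',
      num_epochs: num_epochs,
      num_steps_per_epoch: steps_per_epoch,
      cut_frac: 0.1,  // 10% of all updates are warmup; try values in [0.1, 0.2]
    },
  },
}
```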
Thanks @kyleclo! Finetuning with different learning rates and early stopping really helps. Another question: is it possible to adjust the "dropout" parameter in fine-tuning mode? I did not see such a parameter in the configuration JSON file.
@transpurs glad to hear that it's helping! Unfortunately you'll have to modify the allennlp configuration file to do this. As you can see here in the model definition: https://github.com/allenai/scibert/blob/7598219a8d80b9c2fe1323a141e4a9e40ec044cb/scibert/models/bert_text_classifier.py#L28
Dropout is a configurable parameter. You should be able to add a new key to the allennlp configuration dictionary and set dropout to something else. Another option is simply changing the default dropout value in the model file itself (since the configuration file leaves it unspecified, the model is currently using the default value).
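Concretely, the first option would look something like this in your config's model block (a sketch only: keep your existing keys, the type name should match whatever your config already uses, and 0.5 is just an illustrative value):

```jsonnet
{
  model: {
    type: 'bert_text_classifier',  // keep whatever type your config already specifies
    // ...leave your existing keys (text_field_embedder, etc.) unchanged...
    dropout: 0.5,  // illustrative value; when omitted, the default from the model file is used
  },
}
```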
Thanks for your quick reply. I am wondering where to change the default dropout value in the model file. Could you please give me some hints? Another thought for solving the overfitting problem is to simplify the model. Is it possible to decrease the number of hidden layers or units?