BERT4doc-Classification
Questions about discriminative_fine_tuning
In Section 5.4.3 you write: "We find that assign a lower learning rate to the lower layer is effective to fine-tuning BERT, and an appropriate setting is ξ=0.95 and lr=2.0e-5." Comparing this with the code at https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier.py#L812, it seems that you divide the BERT encoder layers into 3 parts (4 layers per part) and set a different learning rate for each part (a small sketch of the grouping I mean follows after my questions). Some questions about this:
- How does the decay factor 0.95 correspond to the number 2.6 in the code?
- Also, the final classifier layer does not seem to be included; is there no need to set a learning rate for it?
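For context, here is a hypothetical illustration of the grouping I am describing. The value `base_lr = 2.0e-5` and the group boundaries are my reading of run_classifier.py, not the repository's exact code:

```python
# Hypothetical illustration of the 3-group split: the 12 BERT encoder layers are
# divided into 3 groups of 4, and each lower group's learning rate is divided by
# a further factor of 2.6 (lower layers train with smaller learning rates).
base_lr = 2.0e-5
ratio = 2.6

group_lrs = [base_lr / ratio ** 2, base_lr / ratio, base_lr]  # bottom, middle, top

for layer_idx in range(12):
    group = layer_idx // 4  # layers 0-3 -> group 0, 4-7 -> group 1, 8-11 -> group 2
    print(f"encoder layer {layer_idx:2d}: lr = {group_lrs[group]:.2e}")
```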
Thank you for your issue!
- The number 2.6 was used in our early experiments; after that, we used run_classifier_discriminative.py for discriminative fine-tuning.
- The link to run_classifier_discriminative.py is https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier_discriminative.py
- The classifier layer is included in run_classifier_discriminative.py (a minimal sketch of per-layer decay that also covers the classifier head follows below).
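For illustration, here is a minimal sketch of per-layer discriminative fine-tuning with ξ=0.95 and lr=2.0e-5, including the classifier head. It uses the modern `transformers` and `torch` APIs rather than the repository's pytorch-pretrained-BERT code, so the names and details are assumptions rather than the exact implementation in run_classifier_discriminative.py:

```python
import torch
from transformers import BertForSequenceClassification

# Minimal sketch (NOT the repository's exact code): assign each BERT encoder
# layer its own learning rate, decayed by xi = 0.95 per layer going downward,
# while the pooler and classifier head keep the full base learning rate.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

xi = 0.95          # layer-wise decay factor from the paper
base_lr = 2.0e-5   # learning rate for the topmost layer and the classifier
num_layers = model.config.num_hidden_layers  # 12 for bert-base

param_groups = []
for name, param in model.named_parameters():
    if "bert.embeddings" in name:
        lr = base_lr * xi ** num_layers                       # lowest rate for embeddings
    elif "bert.encoder.layer." in name:
        layer_idx = int(name.split("bert.encoder.layer.")[1].split(".")[0])
        lr = base_lr * xi ** (num_layers - 1 - layer_idx)     # decay toward lower layers
    else:
        lr = base_lr                                          # pooler and classifier head
    param_groups.append({"params": [param], "lr": lr})

optimizer = torch.optim.AdamW(param_groups, lr=base_lr)
```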
Thanks for your reply; I will try it!