BERT4doc-Classification
Questions about discriminative_fine_tuning
In Section 5.4.3 you write: "We find that assign a lower learning rate to the lower layer is effective to fine-tuning BERT, and an appropriate setting is ξ=0.95 and lr=2.0e-5." Comparing this with the code at https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier.py#L812, it seems that you divide the BERT encoder layers into 3 parts (4 layers per part) and set a different learning rate for each part (a small sketch of the grouping I mean follows after my questions). Some questions about this:
- How does the decay factor 0.95 correspond to the number 2.6 in the code?
- Also, the final classifier layer does not seem to be included; is there no need to set a learning rate for it?
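For context, here is a hypothetical illustration of the grouping I am describing. The value `base_lr = 2.0e-5` and the group boundaries are my reading of run_classifier.py, not the repository's exact code:

```python
# Hypothetical illustration of the 3-group split: the 12 BERT encoder layers are
# divided into 3 groups of 4, and each lower group's learning rate is divided by
# a further factor of 2.6 (lower layers train with smaller learning rates).
base_lr = 2.0e-5
ratio = 2.6

group_lrs = [base_lr / ratio ** 2, base_lr / ratio, base_lr]  # bottom, middle, top

for layer_idx in range(12):
    group = layer_idx // 4  # layers 0-3 -> group 0, 4-7 -> group 1, 8-11 -> group 2
    print(f"encoder layer {layer_idx:2d}: lr = {group_lrs[group]:.2e}")
```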
Thank you for your issue!
- The number 2.6 was used in our early experiments; after that, we used run_classifier_discriminative.py for discriminative fine-tuning.
- The link to run_classifier_discriminative.py is https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier_discriminative.py
- The classifier layer is included in run_classifier_discriminative.py (a minimal sketch of per-layer decay that also covers the classifier head follows below).
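For illustration, here is a minimal sketch of per-layer discriminative fine-tuning with ξ=0.95 and lr=2.0e-5, including the classifier head. It uses the modern `transformers` and `torch` APIs rather than the repository's pytorch-pretrained-BERT code, so the names and details are assumptions rather than the exact implementation in run_classifier_discriminative.py:

```python
import torch
from transformers import BertForSequenceClassification

# Minimal sketch (NOT the repository's exact code): assign each BERT encoder
# layer its own learning rate, decayed by xi = 0.95 per layer going downward,
# while the pooler and classifier head keep the full base learning rate.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

xi = 0.95          # layer-wise decay factor from the paper
base_lr = 2.0e-5   # learning rate for the topmost layer and the classifier
num_layers = model.config.num_hidden_layers  # 12 for bert-base

param_groups = []
for name, param in model.named_parameters():
    if "bert.embeddings" in name:
        lr = base_lr * xi ** num_layers                       # lowest rate for embeddings
    elif "bert.encoder.layer." in name:
        layer_idx = int(name.split("bert.encoder.layer.")[1].split(".")[0])
        lr = base_lr * xi ** (num_layers - 1 - layer_idx)     # decay toward lower layers
    else:
        lr = base_lr                                          # pooler and classifier head
    param_groups.append({"params": [param], "lr": lr})

optimizer = torch.optim.AdamW(param_groups, lr=base_lr)
```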
Thanks for your reply; I will try it!