BERT4doc-Classification

For Layer-wise Decreasing Layer Rate

Open zhaogangthu opened this issue 5 years ago • 1 comments

Thanks for your hard work! I have two questions. First, for Layer-wise Decreasing Layer Rate, did you also use warm-up or polynomial_decay? That is, are the warm-up schedule and the Layer-wise Decreasing Layer Rate applied simultaneously? Second, for BERT-large, how did you set the learning rate and decay factor, which the paper doesn't give?

zhaogangthu, Dec 17 '20 01:12

Sorry for the late answer.

  1. We also use warm-up with the layer-wise decreasing layer rate, so the two are applied simultaneously.
  2. We did not run learning-rate experiments on BERT-large, but we empirically observed that BERT-large gives results similar to BERT-base.
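
To illustrate point 1, here is a minimal sketch of how a per-layer decay and a warm-up schedule can be combined multiplicatively. It is not the repository's actual code. The function names, the linear warm-up followed by linear decay, and the default values (`base_lr=2e-5`, `decay=0.95`, 12 layers, warm-up proportion 0.1) are assumptions chosen for illustration.

```python
def layer_lr(base_lr, decay, layer, num_layers):
    """Peak learning rate for one encoder layer.

    layer 0 is the lowest layer, num_layers - 1 is the top layer.
    The top layer gets base_lr; each layer below is scaled by `decay`.
    """
    return base_lr * decay ** (num_layers - 1 - layer)


def warmup_linear(step, total_steps, warmup_proportion=0.1):
    """Schedule multiplier: linear warm-up to 1.0, then linear decay to 0.0.

    This particular schedule shape is an assumption for the sketch.
    """
    warmup_steps = int(total_steps * warmup_proportion)
    if warmup_steps > 0 and step < warmup_steps:
        return step / warmup_steps
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def lr_at(step, layer, base_lr=2e-5, decay=0.95, num_layers=12, total_steps=1000):
    """Effective learning rate: per-layer peak times the shared schedule."""
    return layer_lr(base_lr, decay, layer, num_layers) * warmup_linear(step, total_steps)


if __name__ == "__main__":
    # Print the effective rate for the lowest, a middle, and the top layer
    # at the start, mid warm-up, end of warm-up, and mid decay.
    for step in (0, 50, 100, 550):
        rates = [lr_at(step, layer) for layer in (0, 6, 11)]
        print(step, ["%.3e" % r for r in rates])
```

Because the schedule multiplier is shared across layers, the ratio between adjacent layers stays equal to the decay factor at every step. Only the overall scale changes during warm-up and decay.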

xuyige, Feb 19 '21 19:02