PKD-for-BERT-Model-Compression
Why do you set [fix_pooler=True] for KD.Full?
Hi,
Thank you for your interesting work! I am just wondering why you don't use the pooler only for KD.Full. And if you do use the pooler, did you initialize it with the BERT_teacher weight and bias?
Thank you, Sincerely,