latent-diffusion
Why use only the pre-trained BERT tokenizer and not the entire pre-trained BERT model (including the pre-trained encoder)?
I am not sure why the implementation only uses the tokenizer from Hugging Face but not the pre-trained encoder. Why does the BERT-like transformer need to be retrained at all? Are the text embeddings from the original BERT model not good enough? And why not fine-tune it instead of training from scratch?
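For concreteness, here is a minimal sketch (using the Hugging Face `transformers` API) of what I mean by using the full pre-trained BERT encoder as the text conditioner, rather than only tokenizing and learning a transformer from scratch. The model name and the 77-token context length are just illustrative assumptions, not values taken from the repo:

```python
# Hypothetical alternative: feed prompts through a frozen (or fine-tuned)
# pre-trained BERT encoder and use its hidden states as conditioning.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()  # freeze; alternatively keep it trainable to fine-tune

prompt = "a painting of a virus monster playing guitar"
tokens = tokenizer(prompt, truncation=True, max_length=77,
                   padding="max_length", return_tensors="pt")

with torch.no_grad():
    # (1, 77, 768) contextual embeddings that could in principle be passed
    # to the diffusion model's cross-attention layers as the text context.
    context = encoder(**tokens).last_hidden_state
```

Is there a reason this kind of setup was not used, e.g. the pre-trained embeddings being poorly suited for conditioning image generation?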