Is the vocabulary of BERT the same as the vocabulary of BERT-joint?
As mentioned in the technical report, special markup tokens such as "[Paragraph=N]" and "[Table=N]" were introduced. I don't think such tokens exist in the original BERT vocabulary, so the embedding table in the first layer of the transformer encoder seems to differ between BERT and BERT-joint. Yet BERT-joint was initialized from a pre-trained BERT model. I had a hard time understanding this part. Any ideas?
BERT-joint uses a different vocabulary from the original BERT. The vocab list used for BERT-joint replaces the [unusedN] placeholder tokens in the original BERT vocab with the special markup tokens, keeping the total size unchanged (the original vocab.txt includes ~1k [unusedN] entries). BERT-joint is initialized from a pre-trained BERT checkpoint, and as long as the total vocabulary size stays the same, initializing from pre-trained BERT works, because the embedding table keeps the same shape and the replaced entries simply reuse the (essentially untrained) [unusedN] embeddings.
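For concreteness, here is a minimal Python sketch of that idea: overwrite the [unusedN] lines in vocab.txt with markup tokens, leaving the file length (and therefore the embedding matrix shape) unchanged. The exact token names and counts below are illustrative assumptions, not taken from the released preprocessing code.

```python
# Sketch: swap BERT's [unusedN] placeholder tokens for special markup
# tokens while keeping the vocabulary size fixed, so the pre-trained
# embedding table (vocab_size x hidden_size) still loads unchanged.

# Assumed token names/counts for illustration only.
special_tokens = (
    [f"[Paragraph={i}]" for i in range(50)]
    + [f"[Table={i}]" for i in range(50)]
    + [f"[List={i}]" for i in range(50)]
)

with open("vocab.txt", encoding="utf-8") as f:
    vocab = [line.rstrip("\n") for line in f]

original_size = len(vocab)

# Indices of the placeholder tokens we are allowed to overwrite.
unused_ids = [i for i, tok in enumerate(vocab) if tok.startswith("[unused")]
assert len(special_tokens) <= len(unused_ids), "not enough [unusedN] slots"

for slot, tok in zip(unused_ids, special_tokens):
    vocab[slot] = tok

# Same number of entries as before, so the checkpoint's word-embedding
# matrix can be restored without any resizing.
assert len(vocab) == original_size

with open("vocab_bert_joint.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(vocab) + "\n")
```

Since the [unusedN] embeddings were never trained on anything during pre-training, repurposing them this way costs nothing, and the new markup tokens get useful embeddings during fine-tuning.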