
This model is a waste of time for MLM; why did you even make it if it cannot be used?

Oxi84 opened this issue 3 years ago • 5 comments

I do not get why you would make a model that is even worse than a unigram model. I read that it is one of the best on the GLUE tasks, but I do not see how, because it predicts: The capital of France is plunge.

 from transformers import pipeline

 # 'microsoft/deberta-base' is the checkpoint id on the Hugging Face Hub
 unmasker = pipeline('fill-mask', model='microsoft/deberta-base')
 the_out = unmasker("The capital of France is [MASK].")
 print("the_out", the_out)

As you can see, the DeBERTa results are completely wrong; there must be some big error in porting it to transformers.

the_out [{'score': 0.001861382625065744, 'token': 18929, 'token_str': 'ABC', 'sequence': 'The capital of France isABC.'},
 {'score': 0.0012871784856542945, 'token': 15804, 'token_str': ' plunge', 'sequence': 'The capital of France is plunge.'},
 {'score': 0.001228992477990687, 'token': 47366, 'token_str': 'amaru', 'sequence': 'The capital of France isamaru.'},
 {'score': 0.0010126306442543864, 'token': 46703, 'token_str': 'bians', 'sequence': 'The capital of France isbians.'},
 {'score': 0.0008897537481971085, 'token': 43107, 'token_str': 'insured', 'sequence': 'The capital of France isinsured.'}]
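
For comparison, a checkpoint that actually was pretrained with MLM behaves as expected on the same prompt. Here is a minimal sketch using roberta-base purely as an example model (RoBERTa's mask token is <mask>, so the tokenizer's own mask token is used instead of hard-coding [MASK]):

 from transformers import pipeline

 # roberta-base was pretrained with the MLM objective, so fill-mask is meaningful for it
 unmasker = pipeline('fill-mask', model='roberta-base')
 prompt = f"The capital of France is {unmasker.tokenizer.mask_token}."
 print(unmasker(prompt))  # top prediction is ' Paris'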

Oxi84 avatar Apr 02 '22 12:04 Oxi84

In my opinion, DeBERTa's MLM is meant to be used with EMD (the enhanced mask decoder), but the transformers pipeline does not use the EMD code for [MASK] token prediction. So you cannot use the results produced by the transformers fill-mask pipeline.
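
One way to check this from the transformers side (a minimal sketch; it assumes a reasonably recent transformers version) is to load the checkpoint with output_loading_info=True and look at which parameters had to be freshly initialized:

 from transformers import AutoModelForMaskedLM

 # returns the model plus a dict describing how the checkpoint was loaded
 model, info = AutoModelForMaskedLM.from_pretrained(
     'microsoft/deberta-base', output_loading_info=True
 )

 # parameters listed here were not found in the checkpoint and are randomly initialized,
 # so any fill-mask scores that depend on them are meaningless
 print(info['missing_keys'])

If the MLM prediction head shows up in that list, the pipeline output above comes from untrained weights.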

thanhlt998 avatar Apr 05 '22 18:04 thanhlt998

Might there be some mistake? Even continued pretraining of deberta-v3-large with the basic Hugging Face MLM pipeline works. Tested on the Kaggle NBME dataset.

chenghuige avatar Apr 06 '22 10:04 chenghuige

If you use the basic Hugging Face MLM pipeline to continue pretraining deberta-v3-large, the pretrained encoder weights are trained together with the prediction head weights, which are newly initialized. So it works if you fine-tune with the basic Hugging Face MLM pipeline.
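
A minimal sketch of that continued-pretraining setup (the corpus file, sequence length, and hyperparameters below are placeholders, not the exact NBME configuration):

 from datasets import load_dataset
 from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                           DataCollatorForLanguageModeling, Trainer, TrainingArguments)

 model_name = 'microsoft/deberta-v3-large'
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForMaskedLM.from_pretrained(model_name)  # MLM head is newly initialized

 # 'corpus.txt' is a placeholder for the domain text, one document per line
 dataset = load_dataset('text', data_files={'train': 'corpus.txt'})['train']
 dataset = dataset.map(lambda batch: tokenizer(batch['text'], truncation=True, max_length=512),
                       batched=True, remove_columns=['text'])

 # dynamic masking of 15% of the tokens on the fly
 collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

 trainer = Trainer(
     model=model,
     args=TrainingArguments(output_dir='tmp/deberta-v3-mlm',
                            per_device_train_batch_size=6,
                            learning_rate=2e-5,
                            num_train_epochs=1,
                            fp16=True),
     train_dataset=dataset,
     data_collator=collator,
 )
 trainer.train()

Both the pretrained encoder and the freshly initialized head are updated during this training, which is why the approach works even though the released checkpoint ships no usable MLM head.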

thanhlt998 avatar Apr 06 '22 14:04 thanhlt998

@chenweizhu

Hi, may I ask what your settings are for continued pretraining of deberta-v3-large? I always get an accuracy of about 0.04 for that model, but deberta-v3-base and deberta-base work fine. I used:

 python /content/drive/MyDrive/NBME/run_mlm.py \
     --model_name_or_path microsoft/deberta-v3-large \
     --train_file /content/drive/MyDrive/NBME/tapt_val.txt \
     --per_device_train_batch_size 6 \
     --per_device_eval_batch_size 6 \
     --max_seq_length 951 \
     --do_train \
     --do_eval \
     --fp16 \
     --save_total_limit 5 \
     --save_steps 5000 \
     --learning_rate 2e-4 \
     --line_by_line \
     --overwrite_output_dir \
     --output_dir tmp/test-mlm

CopyNinja1999 avatar Apr 18 '22 16:04 CopyNinja1999

DeBERTa-v3 was trained with the replaced-token-detection objective and has never learned the MLM objective. So it's absolutely to be expected that it does not perform well on MLM, because that's not the purpose of v3. Read the paper before you complain: https://arxiv.org/abs/2111.09543
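
For intuition: replaced token detection is an ELECTRA-style per-token binary classification (original vs. replaced by a small generator), not a vocabulary-sized softmax over masked positions. A rough conceptual sketch in plain PyTorch (not the actual pretraining code):

 from torch import nn

 class RTDHead(nn.Module):
     """Binary 'was this token replaced?' classifier on top of encoder hidden states."""
     def __init__(self, hidden_size):
         super().__init__()
         self.classifier = nn.Linear(hidden_size, 1)

     def forward(self, hidden_states, replaced_labels):
         # hidden_states: (batch, seq_len, hidden_size)
         # replaced_labels: (batch, seq_len), 1 where the generator swapped the token, else 0
         logits = self.classifier(hidden_states).squeeze(-1)
         loss = nn.functional.binary_cross_entropy_with_logits(logits, replaced_labels.float())
         return loss, logits

Because the discriminator only outputs one logit per token, the v3 checkpoints carry no trained distribution over the vocabulary for fill-mask to read out.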

MoritzLaurer avatar Jul 29 '22 10:07 MoritzLaurer

Our code for pre-training V3 has been updated.

BigBird01 avatar Mar 19 '23 06:03 BigBird01