DeBERTa
AttributeError: 'DebertaV2Tokenizer' object has no attribute 'get_vocab_size'
I am trying to create a tokenizer for the model 'microsoft/deberta-v2-xlarge'.
Initially I got a "no offset mapping" error when setting return_offsets_mapping=True (the slow tokenizer does not support offset mapping).
I then tried to create a fast tokenizer with PreTrainedTokenizerFast:
tokenizer = AutoTokenizer.from_pretrained(CFG.model)
tokenizer.save_pretrained(OUTPUT_DIR+'tokenizer/')
CFG.tokenizer = tokenizer
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
and I get the error below:
/usr/local/lib/python3.6/dist-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()
/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in pretty(self, obj)
392 if cls is not object \
393 and callable(cls.__dict__.get('__repr__')):
--> 394 return _repr_pprint(obj, self, cycle)
395
396 return _default_pprint(obj, self, cycle)
/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
698 """A pprint that just redirects to the normal repr function."""
699 # Find newlines and replace them with p.break_()
--> 700 output = repr(obj)
701 lines = output.splitlines()
702 with p.group():
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_base.py in __repr__(self)
1517 def __repr__(self) -> str:
1518 return (
-> 1519 f"{'PreTrainedTokenizerFast' if self.is_fast else 'PreTrainedTokenizer'}(name_or_path='{self.name_or_path}', "
1520 f"vocab_size={self.vocab_size}, model_max_len={self.model_max_length}, is_fast={self.is_fast}, "
1521 f"padding_side='{self.padding_side}', truncation_side='{self.truncation_side}', special_tokens={self.special_tokens_map_extended})"
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_fast.py in vocab_size(self)
142 `int`: Size of the base vocabulary (without the added tokens).
143 """
--> 144 return self._tokenizer.get_vocab_size(with_added_tokens=False)
145
146 def get_vocab(self) -> Dict[str, int]:
AttributeError: 'DebertaV2Tokenizer' object has no attribute 'get_vocab_size'
Can anyone suggest how to correct it?
Thanks.
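For reference, the traceback suggests that the `tokenizer_object` argument of `PreTrainedTokenizerFast` expects a Rust-backed `tokenizers.Tokenizer` instance, not a transformers slow tokenizer such as `DebertaV2Tokenizer` (which has no `get_vocab_size` method, hence the AttributeError). A minimal sketch of the expected usage, assuming `transformers` and `tokenizers` are installed, using a toy word-level vocabulary rather than the real DeBERTa one:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from transformers import PreTrainedTokenizerFast

# tokenizer_object must be a tokenizers.Tokenizer (the Rust backend),
# not a transformers tokenizer like DebertaV2Tokenizer.
backend = Tokenizer(WordLevel({"hello": 0, "world": 1, "[UNK]": 2}, unk_token="[UNK]"))

fast = PreTrainedTokenizerFast(tokenizer_object=backend)
print(fast.vocab_size)  # delegates to backend.get_vocab_size(with_added_tokens=False)
```

A simpler route may be to request the fast tokenizer directly, e.g. `AutoTokenizer.from_pretrained(CFG.model, use_fast=True)`; on recent transformers versions this returns `DebertaV2TokenizerFast`, which supports `return_offsets_mapping=True`.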