
AttributeError: 'DebertaV2Tokenizer' object has no attribute 'get_vocab_size'

pn12 opened this issue 3 years ago · 0 comments

I am trying to create a tokenizer from the model 'microsoft/deberta-v2-xlarge'.

Initially, I got a no-offset-mapping error when setting return_offsets_mapping=True (offset mapping is only available with fast tokenizers).

Later, I tried to create a fast tokenizer using PreTrainedTokenizerFast:

tokenizer = AutoTokenizer.from_pretrained(CFG.model)
tokenizer.save_pretrained(OUTPUT_DIR+'tokenizer/')
CFG.tokenizer = tokenizer
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)

and I get the error below:

/usr/local/lib/python3.6/dist-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in pretty(self, obj)
    392                         if cls is not object \
    393                                 and callable(cls.__dict__.get('__repr__')):
--> 394                             return _repr_pprint(obj, self, cycle)
    395 
    396             return _default_pprint(obj, self, cycle)

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    698     """A pprint that just redirects to the normal repr function."""
    699     # Find newlines and replace them with p.break_()
--> 700     output = repr(obj)
    701     lines = output.splitlines()
    702     with p.group():

/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_base.py in __repr__(self)
   1517     def __repr__(self) -> str:
   1518         return (
-> 1519             f"{'PreTrainedTokenizerFast' if self.is_fast else 'PreTrainedTokenizer'}(name_or_path='{self.name_or_path}', "
   1520             f"vocab_size={self.vocab_size}, model_max_len={self.model_max_length}, is_fast={self.is_fast}, "
   1521             f"padding_side='{self.padding_side}', truncation_side='{self.truncation_side}', special_tokens={self.special_tokens_map_extended})"

/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_fast.py in vocab_size(self)
    142         `int`: Size of the base vocabulary (without the added tokens).
    143         """
--> 144         return self._tokenizer.get_vocab_size(with_added_tokens=False)
    145 
    146     def get_vocab(self) -> Dict[str, int]:

AttributeError: 'DebertaV2Tokenizer' object has no attribute 'get_vocab_size'
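For reference, the failure can be reproduced in miniature without transformers installed. PreTrainedTokenizerFast assumes the object passed as tokenizer_object is a Rust-backed tokenizers.Tokenizer exposing get_vocab_size(), which a slow tokenizer like DebertaV2Tokenizer does not have; the stub classes below are illustrative stand-ins, not the real transformers classes:

```python
class SlowTokenizerStub:
    """Stand-in for a slow (pure-Python) tokenizer: it has a vocab,
    but no get_vocab_size() method."""
    def __init__(self):
        self.vocab = {"[CLS]": 0, "[SEP]": 1, "hello": 2}


class FastWrapperStub:
    """Stand-in for PreTrainedTokenizerFast(tokenizer_object=...).
    Its __repr__ calls get_vocab_size() on the wrapped object, just
    like tokenization_utils_fast.py does in the traceback above."""
    def __init__(self, tokenizer_object):
        self._tokenizer = tokenizer_object

    def __repr__(self):
        # Raises AttributeError when _tokenizer is a slow tokenizer
        size = self._tokenizer.get_vocab_size(with_added_tokens=False)
        return f"FastWrapperStub(vocab_size={size})"


wrapper = FastWrapperStub(SlowTokenizerStub())
try:
    repr(wrapper)  # what IPython's pretty-printer triggers
except AttributeError as e:
    print(e)  # 'SlowTokenizerStub' object has no attribute 'get_vocab_size'
```

This suggests the wrapping itself is the problem: tokenizer_object is meant to receive a tokenizers.Tokenizer instance, not another PreTrainedTokenizer.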

Can anyone suggest how to correct it?

Thanks.

pn12 · Apr 12 '22 14:04