KeyError when loading AutoModelForTokenClassification
### System Info

- `transformers` version: 4.29.2
- Platform: macOS-14.2.1-arm64-arm-64bit
- Python version: 3.10.14
- Huggingface_hub version: 0.23.0
- Safetensors version: 0.4.3
- PyTorch version (GPU?): 2.3.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
### Who can help?
No response
### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
### Reproduction

1. Set up a Python virtual environment with the command `python -m venv .venv`
2. Activate the virtual environment with `source .venv/bin/activate`
3. Install the following Python dependencies:

   ```
   transformers==4.29.2 accelerate==0.19.0 datasets pysbd wandb h5py nltk spacy ersatz iso-639 scikit-learn==1.2.2 numpy==1.23.5 pydantic torchinfo conllu pandarallel cohere replicate onnx onnxruntime torchinfo mosestokenizer cached_property tqdm skops pandas protobuf==3.20
   ```

4. Run the following lines of Python code:

   ```python
   from transformers import AutoModelForTokenClassification

   model = AutoModelForTokenClassification.from_pretrained("segment-any-text/sat-1l-sm")
   ```
### Expected behavior

Expected the model to load without any issue. However, I get the following error instead:

```
Traceback (most recent call last):
  File "/Users/pim.jv/Documents/Code/wtpsplit/test_hf.py", line 4, in <module>
    model = AutoModelForTokenClassification.from_pretrained("segment-any-text/sat-1l-sm", force_download=False)
  File "/opt/homebrew/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 444, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/opt/homebrew/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 940, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/opt/homebrew/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 655, in __getitem__
    raise KeyError(key)
KeyError: 'xlm-token'
```
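The `KeyError` comes from the `model_type` field in the checkpoint's `config.json`, which names a type that is not registered in `transformers`' config mapping. A minimal sketch to confirm this, using only the repo id from the report:

```python
import json

from huggingface_hub import hf_hub_download

# Fetch just the config file and inspect the declared model_type.
cfg_file = hf_hub_download("segment-any-text/sat-1l-sm", "config.json")
with open(cfg_file) as f:
    print(json.load(f)["model_type"])  # prints: xlm-token
```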
Hey, there is no config mapped for this model type; the related type and config need to be added to the mapping. @zucchini-nlp is that OK with you? I could try to fix it.
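For reference, the same idea can be tried on the user side with the public `register` API, without patching the library. A hedged sketch, assuming the checkpoint is architecturally XLM-R (as suggested below); `XlmTokenConfig` is a hypothetical subclass, needed only because `AutoConfig.register` requires the class's `model_type` to match the registered key:

```python
from transformers import (
    AutoConfig,
    AutoModelForTokenClassification,
    XLMRobertaConfig,
    XLMRobertaForTokenClassification,
)

# Hypothetical config subclass whose model_type matches the hub config.
class XlmTokenConfig(XLMRobertaConfig):
    model_type = "xlm-token"

# Register the unknown model_type with the Auto* mappings.
AutoConfig.register("xlm-token", XlmTokenConfig)
AutoModelForTokenClassification.register(XlmTokenConfig, XLMRobertaForTokenClassification)

# The original call from the report should now resolve.
model = AutoModelForTokenClassification.from_pretrained("segment-any-text/sat-1l-sm")
```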
Seems like you are trying to load `XLMForTokenClassification`. The model type should be corrected in the `config.json` file on the Hub: XLM models can be loaded via the `xlm-roberta` model type. If the model belongs to you, you can update it yourself; otherwise, open a PR on the model page.
https://github.com/huggingface/transformers/blob/ac5a0556f14dec503b064d5802da1092e0b558ea/src/transformers/models/auto/modeling_auto.py#L1126-L1127
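Until the Hub config is updated, the `model_type` can also be patched in a local copy. A minimal sketch, assuming the checkpoint otherwise loads cleanly as an XLM-R token classifier; the local directory name is illustrative:

```python
import json

from huggingface_hub import snapshot_download
from transformers import AutoModelForTokenClassification

# Download the repo into a local directory that is safe to edit.
local_dir = snapshot_download("segment-any-text/sat-1l-sm", local_dir="sat-1l-sm-local")

# Rewrite model_type from "xlm-token" to "xlm-roberta" in config.json.
config_path = f"{local_dir}/config.json"
with open(config_path) as f:
    cfg = json.load(f)
cfg["model_type"] = "xlm-roberta"
with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)

model = AutoModelForTokenClassification.from_pretrained(local_dir)
```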
A workaround is to load the model directly with the correct model class, as `XLMRobertaForTokenClassification.from_pretrained(model_id)`:
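A self-contained version of that workaround, with `model_id` set to the checkpoint from the report:

```python
from transformers import XLMRobertaForTokenClassification

# Bypass the Auto* model_type lookup by naming the concrete class directly.
model_id = "segment-any-text/sat-1l-sm"
model = XLMRobertaForTokenClassification.from_pretrained(model_id)
```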
@zucchini-nlp Just one thing seems weird to me, and I suppose it is why model loading fails: the `config.json` of this model has `"base_model": "xlm-roberta-base"`. I am not sure that this is correct; shouldn't the `base_model` be a different one?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.