`cache_dir` parameter not being passed down to the `transformers` module
Hi Team,
I have found that the `cache_dir` parameter from the `_from_pretrained` method of the `GLiNER` class is not being passed to the internal `transformers` calls: `AutoTokenizer` in `gliner/model.py` and `AutoConfig` in `gliner/modeling/encoder.py`. This causes the `gliner` module to fail in offline usage for models such as `numind/NuNerZero`, which rely on a different tokenizer.
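For context, the failing call in `gliner/model.py` (line 53 per the traceback below) has roughly the shape shown in this simplified sketch, with `microsoft/deberta-v3-large` standing in for what `config.model_name` resolves to for NuNerZero; the second call shows the forwarding I would expect:

```python
from transformers import AutoTokenizer

backbone = "microsoft/deberta-v3-large"  # what config.model_name resolves to for NuNerZero

# Current behaviour (per the traceback): cache_dir is dropped, so
# transformers falls back to the default ~/.cache/huggingface/hub.
tokenizer = AutoTokenizer.from_pretrained(backbone)

# Expected behaviour (sketch): thread the user-supplied cache_dir through.
tokenizer = AutoTokenizer.from_pretrained(backbone, cache_dir="/path/to/custom/cache")
```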
To reproduce the issue:
- Run the following with any `cache_dir` other than the default `~/.cache/huggingface/hub`:
```python
from gliner import GLiNER

cache_dir = "/path/to/custom/cache"  # placeholder: any directory other than the default

model = GLiNER.from_pretrained(
    "numind/NuNerZero",
    local_files_only=False,
    cache_dir=cache_dir,
)
```
- You'll notice that, despite the custom `cache_dir`, the tokenizer is still downloaded to `~/.cache/huggingface/hub/models--microsoft--deberta-v3-large/`. This is in fact the tokenizer used by NuNerZero.
- If you then load the model in offline mode without the `~/.cache/huggingface/hub/models--microsoft--deberta-v3-large/` directory, as below, it throws the following error:
```python
import os
os.environ['HF_HUB_OFFLINE'] = '1'

from gliner import GLiNER

cache_dir = "/path/to/custom/cache"  # placeholder: the same custom cache_dir as above

model = GLiNER.from_pretrained(
    "numind/NuNerZero",
    local_files_only=True,
    cache_dir=cache_dir,
)
```
```
$ python main.py
Traceback (most recent call last):
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/utils/hub.py", line 424, in cached_files
hf_hub_download(
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 961, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1068, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1587, in _raise_on_head_call_error
raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/bit2244/nuner/main.py", line 46, in <module>
print(func('Hi was based off of USA in 24/12/1990', 'organisation,date,country','/home/bit2244/nuner'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/main.py", line 13, in func
model = GLiNER.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/hub_mixin.py", line 566, in from_pretrained
instance = cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/gliner/model.py", line 804, in _from_pretrained
gliner = cls(config, tokenizer=tokenizer, encoder_from_pretrained=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/gliner/model.py", line 53, in __init__
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 966, in from_pretrained
config = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1114, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/configuration_utils.py", line 590, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/configuration_utils.py", line 649, in _get_config_dict
resolved_config_file = cached_file(
^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/utils/hub.py", line 266, in cached_file
file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/utils/hub.py", line 491, in cached_files
raise OSError(
OSError: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
```
Note: I've tried manually passing the `cache_dir` down from the top-level call all the way to the `transformers` package calls, following the call stack in the traceback above, and it works. So it would be helpful if this change were made in the actual library code as well.
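In the meantime, here is a minimal workaround sketch of the same idea that doesn't require editing the installed package: it wraps `AutoTokenizer.from_pretrained` and `AutoConfig.from_pretrained` so a default `cache_dir` is injected whenever gliner's internal calls omit it. The path is a hypothetical placeholder, and this patches library internals, so treat it as a stopgap rather than a fix:

```python
import functools

from transformers import AutoConfig, AutoTokenizer

cache_dir = "/path/to/custom/cache"  # hypothetical placeholder: your actual cache_dir


def _inject_cache_dir(auto_cls, default_cache_dir):
    """Wrap auto_cls.from_pretrained so cache_dir is supplied when absent."""
    orig = auto_cls.from_pretrained.__func__  # underlying function of the classmethod

    @functools.wraps(orig)
    def patched(cls, *args, **kwargs):
        kwargs.setdefault("cache_dir", default_cache_dir)  # inject only if missing
        return orig(cls, *args, **kwargs)

    auto_cls.from_pretrained = classmethod(patched)


# gliner/model.py calls AutoTokenizer.from_pretrained and
# gliner/modeling/encoder.py calls AutoConfig.from_pretrained without
# forwarding cache_dir, so patch both before loading the model.
for auto_cls in (AutoTokenizer, AutoConfig):
    _inject_cache_dir(auto_cls, cache_dir)

from gliner import GLiNER

model = GLiNER.from_pretrained(
    "numind/NuNerZero",
    local_files_only=True,
    cache_dir=cache_dir,
)
```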
Thanks
I can also take this up as a PR, if needed 😄
Sorry for the delay in reviewing your PR. This is a good job, and these are important fixes. Thank you!