
Cache_dir parameter not being passed down to transformers module

Open bit2244 opened this issue 9 months ago • 2 comments

Hi Team, I have found that the cache_dir parameter from the _from_pretrained method of the GLiNER class is not being passed down to the internal transformers calls: AutoTokenizer in gliner/model.py and AutoConfig in gliner/modeling/encoder.py. This causes the gliner module to fail in offline usage for models such as numind/NuNerZero, which rely on a different tokenizer. To recreate this issue:

  1. Run the following with any cache_dir other than the default ~/.cache/huggingface/hub:

from gliner import GLiNER

cache_dir = "/home/bit2244/nuner"  # any non-default cache location
model = GLiNER.from_pretrained(
    "numind/NuNerZero",
    local_files_only=False,
    cache_dir=cache_dir
)
  2. You'll notice that in spite of the custom cache_dir, it still downloads to ~/.cache/huggingface/hub/models--microsoft--deberta-v3-large/. This is in fact the tokenizer used by NuNerZero.
  3. If you then try to run the model in offline mode without the ~/.cache/huggingface/hub/models--microsoft--deberta-v3-large/ directory, it throws the following error:
import os
os.environ['HF_HUB_OFFLINE'] = '1'

from gliner import GLiNER

cache_dir = "/home/bit2244/nuner"  # same custom cache location as above
model = GLiNER.from_pretrained(
    "numind/NuNerZero",
    local_files_only=True,
    cache_dir=cache_dir
)
$ python main.py
Traceback (most recent call last):
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/utils/hub.py", line 424, in cached_files
    hf_hub_download(
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 961, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1068, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1587, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/bit2244/nuner/main.py", line 46, in <module>
    print(func('Hi was based off of USA in 24/12/1990', 'organisation,date,country','/home/bit2244/nuner'))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/main.py", line 13, in func
    model = GLiNER.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/huggingface_hub/hub_mixin.py", line 566, in from_pretrained
    instance = cls._from_pretrained(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/gliner/model.py", line 804, in _from_pretrained
    gliner = cls(config, tokenizer=tokenizer, encoder_from_pretrained=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/gliner/model.py", line 53, in __init__
    tokenizer = AutoTokenizer.from_pretrained(config.model_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 966, in from_pretrained
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1114, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/configuration_utils.py", line 590, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/configuration_utils.py", line 649, in _get_config_dict
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/utils/hub.py", line 266, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bit2244/nuner/env/lib/python3.12/site-packages/transformers/utils/hub.py", line 491, in cached_files
    raise OSError(
OSError: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
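
As an interim workaround, pointing the Hugging Face cache environment variable at the custom location before importing gliner avoids the failure, since the nested calls that drop cache_dir still fall back to the environment-based default. A minimal sketch (this assumes the standard HF_HUB_CACHE resolution, which transformers reads at import time):

import os
# Interim workaround: set the cache location via the environment so that even
# the nested transformers calls that ignore cache_dir resolve to it.
os.environ['HF_HUB_OFFLINE'] = '1'
os.environ['HF_HUB_CACHE'] = '/home/bit2244/nuner'  # custom cache location

from gliner import GLiNER

model = GLiNER.from_pretrained("numind/NuNerZero", local_files_only=True)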

Note: I've tried manually passing cache_dir down from the top-level call all the way to the transformers package calls, following the call stack from the traceback above, and it works. So it would be helpful if this change were made in the actual library code as well.
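
For reference, here is a minimal sketch of the kind of change I mean. It is illustrative only, not GLiNER's actual code; the helper name load_backbone is hypothetical, but both nested transformers loaders do accept cache_dir and local_files_only, so forwarding the kwargs instead of dropping them is enough:

from transformers import AutoConfig, AutoTokenizer

def load_backbone(model_name, cache_dir=None, local_files_only=False):
    # Hypothetical helper: forward the caching kwargs that
    # GLiNER.from_pretrained receives down to the nested transformers
    # calls, so offline usage with a custom cache_dir works.
    config = AutoConfig.from_pretrained(
        model_name, cache_dir=cache_dir, local_files_only=local_files_only
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_name, cache_dir=cache_dir, local_files_only=local_files_only
    )
    return config, tokenizer

The same forwarding would apply at the call sites named above (AutoTokenizer in gliner/model.py and AutoConfig in gliner/modeling/encoder.py).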

Thanks

bit2244 avatar Apr 22 '25 15:04 bit2244

I can also take this up as a PR, if needed 😄

bit2244 avatar Apr 22 '25 15:04 bit2244

Sorry for the delay in reviewing your PR. This is good work and an important fix. Thank you!

Ingvarstep avatar May 20 '25 17:05 Ingvarstep