
Cannot use local files for AutoModelForVision2Seq when using BLIP3

Open dibbla opened this issue 1 year ago • 3 comments

System Info

  • transformers version: 4.43.3
  • Platform: Linux-5.4.0-26-generic-x86_64-with-glibc2.27
  • Python version: 3.10.4
  • Huggingface_hub version: 0.24.3
  • Safetensors version: 0.4.3
  • Accelerate version: 0.33.0
  • Accelerate config:
      - compute_environment: LOCAL_MACHINE
      - distributed_type: NO
      - mixed_precision: fp16
      - use_cpu: False
      - debug: False
      - num_processes: 1
      - machine_rank: 0
      - num_machines: 1
      - gpu_ids: 0,2,3,4,5,6
      - rdzv_backend: static
      - same_network: True
      - main_training_function: main
      - enable_cpu_affinity: False
      - downcast_bf16: no
      - tpu_use_cluster: False
      - tpu_use_sudo: False
      - tpu_env: []
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA GeForce RTX 3090

Who can help?

@zucchini-nlp @ArthurZucker @amyeroberts

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

I am using the latest Salesforce/xgen-mm-phi3-mini-base-r-v1 for image captioning. To reproduce:

  1. Download all the files (the entire directory of Salesforce/xgen-mm-phi3-mini-base-r-v1) from the Hugging Face link.
  2. Run the following code to load the LOCAL model (I just put it under /root):
from transformers import AutoModelForVision2Seq, AutoTokenizer, AutoImageProcessor
import json
import PIL
import IPython.display as display
import torch

# point at the local checkout instead of the Hub repo id
model_name_or_path = "/root/xgen-mm-phi3-mini-base-r-v1.5/"
model = AutoModelForVision2Seq.from_pretrained(model_name_or_path, trust_remote_code=True)
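
A quick sanity check, in case it helps others hitting the same thing: asserting that the path is a directory and passing local_files_only=True should make the top-level resolution fail fast instead of quietly downloading (a sketch; note the flag may not propagate into nested from_pretrained calls made by the remote code):

import os
from transformers import AutoModelForVision2Seq

model_name_or_path = "/root/xgen-mm-phi3-mini-base-r-v1.5/"
assert os.path.isdir(model_name_or_path), "not a local directory"

# local_files_only=True forbids Hub downloads for this call; if the
# loader still reaches out to the Hub, it raises an error instead of
# silently starting a download.
model = AutoModelForVision2Seq.from_pretrained(
    model_name_or_path,
    trust_remote_code=True,
    local_files_only=True,
)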

Expected behavior

It should load from the local files. Instead, it launches a long download process, and I don't see why the local files are not used.

dibbla avatar Aug 22 '24 03:08 dibbla

Hey @dibbla !

I just tried your script and it didn't trigger any downloading process; the only thing that ran was loading the checkpoint shards. Can you make sure that the given directory exists and has the model checkpoints in it?

zucchini-nlp avatar Aug 22 '24 04:08 zucchini-nlp

> Hey @dibbla !
>
> I just tried your script and it didn't trigger any downloading process; the only thing that ran was loading the checkpoint shards. Can you make sure that the given directory exists and has the model checkpoints in it?

Thanks for the fast reply @zucchini-nlp

I just confirmed that the directory does contain the model files.

Code:

from transformers import AutoModelForVision2Seq, AutoTokenizer, AutoImageProcessor
import json
import PIL
import IPython.display as display
import torch
import os

model_name_or_path = "/root/xgen-mm-phi3-mini-base-r-v1.5/"
print(os.listdir(model_name_or_path))
model = AutoModelForVision2Seq.from_pretrained(model_name_or_path, trust_remote_code=True)

What I observe (it starts downloading):

['model-00002-of-00004.safetensors', 'added_tokens.json', 'image_processing_blip_3.py', 'README.md', 'preprocessor_config.json', 'icl_examples', 'special_tokens_map.json', 'model-00001-of-00004.safetensors', 'model-00003-of-00004.safetensors', 'generation_config.json', 'demo.ipynb', 'tokenizer.json', '.huggingface', '.gitattributes', 'config.json', 'model.safetensors.index.json', 'modeling_xgenmm.py', 'test_samples', 'tokenizer_config.json', 'model-00004-of-00004.safetensors']
model.safetensors:  14%|██████████████▌                                                                                           | 482M/3.51G [00:09<48:00, 1.05MB/s]
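
One thing that stands out in the log: the file being fetched is model.safetensors, but my local directory only contains the sharded model-0000X-of-00004.safetensors files, so whatever triggers the download seems to resolve a different checkpoint. To turn the hidden download into a traceback, forcing offline mode before importing transformers should work (a sketch using the offline env vars; untested on this exact setup):

import os

# Set the offline switches before the first Hub call so any hidden
# download raises an error whose traceback shows where it originates.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "/root/xgen-mm-phi3-mini-base-r-v1.5/",
    trust_remote_code=True,
)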

Are there any differences between my directory and yours?

dibbla avatar Aug 22 '24 05:08 dibbla

Hmm, yes, I have the same files in my directory. It is weird; can you verify that the loading goes through this path and that the passed model_id is recognized as a path?

https://github.com/huggingface/transformers/blob/3bb7b05229466ce820f76dadb250a848e7eb22e7/src/transformers/modeling_utils.py#L3430-L3436
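
If it's easier than adding prints, a rough way to confirm which branch is taken is to break right before the call and step into from_pretrained (a generic pdb sketch, nothing transformers-specific):

from transformers import AutoModelForVision2Seq

model_name_or_path = "/root/xgen-mm-phi3-mini-base-r-v1.5/"
# Drop into pdb here, then "s" to step into from_pretrained and check
# whether the library treats model_name_or_path as a local directory.
breakpoint()
model = AutoModelForVision2Seq.from_pretrained(model_name_or_path, trust_remote_code=True)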

zucchini-nlp avatar Aug 22 '24 07:08 zucchini-nlp

> Hmm, yes, I have the same files in my directory. It is weird; can you verify that the loading goes through this path and that the passed model_id is recognized as a path?
>
> https://github.com/huggingface/transformers/blob/3bb7b05229466ce820f76dadb250a848e7eb22e7/src/transformers/modeling_utils.py#L3430-L3436

Hi @zucchini-nlp

Problem solved, though I don't know why 🤔

I downloaded the main branch of transformers and installed it from source, and things work fine. I also noticed that this issue affects not only BLIP3 but other models loaded through transformers' Auto classes. It may have to do with my environment, given that I am running in a VM.
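
For reference, the source install was just pip install git+https://github.com/huggingface/transformers.git, and the version string confirms it took effect (a quick check; the exact dev version will differ):

import transformers

# A source install reports a ".dev0" version (e.g. "4.45.0.dev0")
# instead of a plain release like the "4.43.3" from the report above.
print(transformers.__version__)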

Closing this issue; I might comment again if I have time to investigate further.

dibbla avatar Aug 26 '24 07:08 dibbla