flan-t5-xl and flan-t5-xxl model deployment on Sagemaker fails on deploying from HuggingFace Hub
System Info
Code to replicate, generated from the model hub (https://huggingface.co/google/flan-t5-large/tree/main -> Deploy -> Amazon SageMaker endpoint -> AWS):
from sagemaker.huggingface import HuggingFaceModel
import sagemaker
role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'google/flan-t5-xl',
    'HF_TASK': 'text2text-generation'
}
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,     # number of instances
    instance_type='ml.m5.xlarge'  # ec2 instance type
)
predictor.predict({
    'inputs': "The answer to the universe is"
})
The endpoint invocation fails with the error below:
2023-03-10 19:15:14,508 [INFO ] W-google__flan-t5-xl-5-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index', 'flax_model.msgpack'] found in directory /.sagemaker/mms/models/google__flan-t5-xl or from_tf and from_flax set to False.
This is possibly because, if you look at the files under "Files and versions" (Link), the model checkpoint has been split into multiple shards (pytorch_model-00001-of-00002.bin, etc., because of its size), while the out-of-the-box solution looks for a single pytorch_model.bin file and therefore fails.
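To confirm the diagnosis, here is a minimal sketch (assuming the huggingface_hub package is installed) that lists the checkpoint files in the repo; for a sharded checkpoint you should see the numbered .bin shards plus a pytorch_model.bin.index.json instead of a single pytorch_model.bin:

# Sketch: confirm the checkpoint on the Hub is sharded.
# Assumes huggingface_hub is installed; list_repo_files returns the repo's file listing.
from huggingface_hub import list_repo_files

files = list_repo_files("google/flan-t5-xl")
print(sorted(f for f in files if f.startswith("pytorch_model")))
# For a sharded checkpoint this prints the pytorch_model-*-of-*.bin shards and
# pytorch_model.bin.index.json rather than a single pytorch_model.bin.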
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Code provided in the issue description above.
Expected behavior
Deployment should work out of the box on SageMaker.
@philschmid could you please help here? I've gone through your workaround here
You need to install a more recent version of Transformers; 4.17.0 doesn't support sharded checkpoints.
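For reference, a minimal sketch of the same deployment with a newer container. The exact transformers_version / pytorch_version / py_version triple below (4.26.0 / 1.13.1 / py39) is an assumption and should be checked against the Hugging Face DLC releases SageMaker currently offers:

# Sketch: same deployment as above, but with a container whose Transformers
# version can load sharded checkpoints. The version triple is an assumption;
# verify it against the available SageMaker Hugging Face DLC releases.
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

hub = {
    'HF_MODEL_ID': 'google/flan-t5-xl',
    'HF_TASK': 'text2text-generation'
}

huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',  # assumption: a release newer than 4.17 with sharded-checkpoint support
    pytorch_version='1.13.1',       # assumption: the PyTorch version paired with that DLC
    py_version='py39',
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge'  # instance type from the original report; a larger or GPU instance may be needed for XL
)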
@rags1357 you can check out this blog post: Deploy FLAN-T5 XXL on Amazon SageMaker
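I believe that post packages the model into a model.tar.gz on S3 rather than pulling it from the Hub at endpoint startup. In case it helps, a rough sketch of that general pattern; the S3 URI is a placeholder and the container versions are the same assumption as above:

# Sketch: deploy from a pre-packaged model.tar.gz in S3 instead of HF_MODEL_ID.
# The S3 path is a placeholder; the version triple is an assumption, as above.
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    model_data="s3://<your-bucket>/flan-t5-xl/model.tar.gz",  # placeholder path to your packaged model
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.2xlarge'  # assumption: a GPU instance sized for the XL/XXL checkpoints
)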
Thank you @sgugger and @philschmid, I will try it with the updated Transformers version.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.