[BUG] Multi-node inference initialization fails when replace_with_kernel_inject is disabled
Describe the bug
I was following HuggingFace's script for DeepSpeed inference and found that it doesn't work when kernel_inject is False.
To Reproduce
Script: https://github.com/huggingface/transformers-bloom-inference/blob/main/bloom-inference-scripts/bloom-ds-inference.py
Change kernel_inject=True (line 121) to kernel_inject=False (when kernel_inject=True it works)
Run: deepspeed --num_gpus=4 bloom-ds-inference.py --name="bigscience/bloom-7b1"
I am hoping to test a model with auto tensor parallelism, for which DeepSpeed does not support kernel injection yet, but it suffers from the same loading issue shown below. Could you please advise how to address this?
Expected behavior
Initialization should succeed with kernel_inject=False. Instead, it looks like there is a bug in _load_checkpoint during the initialization of the DeepSpeed inference engine:
File "/home/xx/venv/lib/python3.9/site-packages/deepspeed/__init__.py", line 324, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
File "/home/xx/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 155, in __init__
    self._load_checkpoint(config.checkpoint)
File "/home/xx/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 450, in _load_checkpoint
    load_path, checkpoint, quantize_config = sd_loader.load(self._config.tensor_parallel.tp_size,
AttributeError: 'dict' object has no attribute 'load'
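
The error message suggests that sd_loader is still a plain dict (for example, the parsed checkpoint description) at the point where .load(...) is called on it. Below is a minimal sketch of that failure mode and of one possible guard. Everything in it is a hypothetical stand-in written for illustration: HypotheticalLoader, load_shards, and the contents of spec are assumptions, not DeepSpeed code, and the real fix may well look different.

```python
import json


class HypotheticalLoader:
    """Illustrative stand-in for a checkpoint loader; NOT DeepSpeed's class."""

    def __init__(self, spec):
        self.spec = spec

    def load(self, tp_size, rank):
        # Hand every tp_size-th shard to this rank (illustrative policy only).
        return self.spec["checkpoints"][rank::tp_size]


def load_shards(sd_loader, tp_size, rank):
    # Guard: a dict parsed from JSON has no .load(), so wrap it first.
    if isinstance(sd_loader, dict):
        sd_loader = HypotheticalLoader(sd_loader)
    return sd_loader.load(tp_size, rank)


# Made-up checkpoint description, assumed to be parsed from a JSON file.
spec = json.loads('{"checkpoints": ["a.bin", "b.bin", "c.bin", "d.bin"]}')

# Calling .load() directly on the dict reproduces the reported error type.
try:
    spec.load(4, 0)
    message = None
except AttributeError as err:
    message = str(err)

# With the guard, the dict is wrapped before .load() is called.
shards = load_shards(spec, tp_size=4, rank=0)
print(message)
print(shards)
```

If this reading is right, the kernel_inject=False path may be skipping whatever step normally turns the checkpoint description into a loader object, which would explain why kernel_inject=True works.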