[BUG] Cannot initialize DeepSpeed-Inference engine with deepspeed.init_inference()
Hello, I am a new user of DeepSpeed (DS), and I have successfully trained checkpoints with it. However, I ran into an issue when trying to use a checkpoint for inference. I tried to follow this tutorial, passing either the folder containing the *.pt file or the path to the .pt file itself, but I always get this error:
Traceback (most recent call last):
File "deepspeed_infer2.py", line 28, in
This is my checkpoint.json:
{
  "type": "DeepSpeed",
  "version": 0.3,
  "checkpoint_path": "./ds_models/global_step1/mp_rank_00_model_states.pt"
}
This is the code I used to get the inference engine:
import torch
import deepspeed

# Initialize the DeepSpeed-Inference engine
ds_engine = deepspeed.init_inference(model,
                                     dtype=torch.half,
                                     checkpoint="checkpoint.json",
                                     replace_method='auto',
                                     replace_with_kernel_inject=True)
I can use another approach to load the checkpoint:
# Initialize a DeepSpeed engine via deepspeed.initialize()
# (ds_config is the same config dict used for training)
model_engine, _, _, _ = deepspeed.initialize(model=model,
                                             model_parameters=model.parameters(),
                                             config=ds_config)

# Load the checkpoint saved during training
load_dir = '../results/ds_models/global_step226'
_, client_sd = model_engine.load_checkpoint(load_dir)
and use this new model_engine for inference. I am not sure what the difference between the two methods is, or why the first approach is not working.
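For completeness, a minimal sketch of what that inference call looks like; the inputs tensor is a placeholder, not part of the original report:

model_engine.eval()  # disable dropout etc. for inference
with torch.no_grad():  # no gradients needed for a forward pass
    outputs = model_engine(inputs)  # the engine forwards the call to the wrapped model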
@Jirigesi Thanks for using DeepSpeed! I believe the problem when using init_inference is that your checkpoint.json is missing the checkpoints key, which produces the error:
KeyError: 'checkpoints'
Try replacing checkpoint_path with checkpoints:
{
  "type": "DeepSpeed",
  "version": 0.3,
  "checkpoints": ["./ds_models/global_step1/mp_rank_00_model_states.pt"]
}
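For reference, checkpoints is a list because model-parallel training writes one mp_rank_* file per partition. A hypothetical two-way model-parallel layout would look like this (the mp_rank_01 path is illustrative, not from your run):

{
  "type": "DeepSpeed",
  "version": 0.3,
  "checkpoints": ["./ds_models/global_step1/mp_rank_00_model_states.pt",
                  "./ds_models/global_step1/mp_rank_01_model_states.pt"]
}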
Hello @Jirigesi,
Apologies for the delayed follow-up to your issue. The inference tutorial is slightly out of date relative to the code. For checkpoint loading to work with a checkpoint.json as described in the tutorial, replace_with_kernel_inject must be False, due to this check in the InferenceEngine:
https://github.com/microsoft/DeepSpeed/blob/58a4a4d4c19bda86d489ac171fa10f3ddb27c9d6/deepspeed/inference/engine.py#L95
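In practice, a call like the following minimal sketch (reusing the model and checkpoint.json from your report, with the corrected checkpoints key) should take the checkpoint-loading path today:

import torch
import deepspeed

ds_engine = deepspeed.init_inference(model,
                                     dtype=torch.half,
                                     checkpoint="checkpoint.json",
                                     replace_method='auto',
                                     replace_with_kernel_inject=False)  # must be False for json-based loading
model = ds_engine.module  # run inference through the returned module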
This check was added in GH-2083 along with the Meta Tensors feature, which uses "meta tensors" to initialize the model, then loads the weights after module replacement.
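For reference, the meta-tensor flow looks roughly like this. This is a sketch that uses a Hugging Face model as an illustrative stand-in; the model name is an assumption, not from this issue:

import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model on the "meta" device: shapes only, no real weight storage.
config = AutoConfig.from_pretrained("bigscience/bloom-7b1")  # illustrative model
with deepspeed.OnDevice(dtype=torch.half, device="meta"):
    model = AutoModelForCausalLM.from_config(config)

# With a meta-tensor model, init_inference loads the real weights from the
# checkpoint json after the kernel-injection module replacement.
ds_engine = deepspeed.init_inference(model,
                                     dtype=torch.half,
                                     checkpoint="checkpoint.json",
                                     replace_with_kernel_inject=True)

Building the model on the meta device avoids materializing full weights on every rank before the real weights stream in from the checkpoint.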
The GH-2940 draft PR changes the InferenceEngine check in the code snippet above to check more explicitly for meta tensor usage, which will allow checkpoints to be loaded as described in the tutorial. We're also planning to update the tutorial to reflect the current state of checkpoint loading.
Please let us know if you have any additional questions!
Thanks, Lev