DeepSpeed
[BUG] Inference results have large errors compared to the original model
Describe the bug
The outputs produced by DeepSpeed inference differ noticeably from those of the original model, and after some tokens the DeepSpeed output degenerates into seemingly random text.
To Reproduce
Steps to reproduce the behavior:

import time

import torch
import deepspeed
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
)
model = deepspeed.init_inference(
    model=model,
    mp_size=1,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)

instruction = "xxxx"
inputs = "xxxx"
t1 = time.time()
prompt = generate_prompt(instruction, inputs)
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to("cuda")
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=1,
)
with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=1024,
    )
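Note that since `do_sample` is not set, generation here is greedy and deterministic, so the DeepSpeed run and the plain Hugging Face run can be compared token by token. One way to characterize the bug is to report the index of the first token where the two sequences diverge. A minimal sketch (the function name and the token ids below are made up for illustration):

```python
def first_divergence(ref_ids, ds_ids):
    """Return the index of the first token where the two sequences differ,
    or -1 if they agree over their common length."""
    for i, (a, b) in enumerate(zip(ref_ids, ds_ids)):
        if a != b:
            return i
    return -1

# Hypothetical token ids: the outputs agree for 3 tokens, then diverge.
ref = [101, 2009, 2003, 1037, 2742]
ds = [101, 2009, 2003, 9999, 1234]
print(first_divergence(ref, ds))  # -> 3
```

Reporting this index (and how it changes with `max_new_tokens`) would help distinguish an immediate kernel mismatch from gradual fp16 drift.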
Expected behavior

ds_report output

Screenshots If applicable, add screenshots to help explain your problem.
System info (please complete the following information):
- OS: [e.g. Ubuntu 18.04]
- GPU count and types [e.g. two machines with x2 V100s each]
- (if applicable) which DeepSpeed-MII version you are using
- Hugging Face Transformers version: 4.28
- Python version: 3.8
- Any other relevant info about your setup
Docker context Are you using a specific docker image that you can share?
Additional context Add any other context about the problem here.