
[BUG] The inference error is large

Open · pangr opened this issue on May 8, 2023 · 0 comments

Describe the bug: Text generated with DeepSpeed inference diverges noticeably from the output of the original model, and after a number of tokens the DeepSpeed output degenerates into random text.

To Reproduce:

import time

import torch
import deepspeed
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer

base_model = "xxxx"  # path or hub id of the base LLaMA checkpoint

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
)
# Wrap the model with the DeepSpeed inference engine: single GPU (mp_size=1),
# fp16, automatic module replacement with injected CUDA kernels.
model = deepspeed.init_inference(
    model=model,
    mp_size=1,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)

instruction = "xxxx"
input_text = "xxxx"

t1 = time.time()
prompt = generate_prompt(instruction, input_text)  # user-defined prompt template
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to("cuda")

# Note: temperature/top_p/top_k only take effect with do_sample=True;
# with num_beams=1 and sampling off, generation is greedy.
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=1,
)

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=1024,
    )
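
One way to quantify the divergence is to rerun the same generation with the unmodified fp16 model and compare the two outputs. A minimal sketch (reusing base_model, input_ids, generation_config, and generation_output from above; it needs enough GPU memory for a second copy of the model):

# Sketch: compare the DeepSpeed engine against the plain fp16 model.
baseline = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
).to("cuda")

with torch.no_grad():
    # num_beams=1 with sampling off means greedy decoding, so both runs are deterministic.
    ref_output = baseline.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        max_new_tokens=1024,
    )
    # Rough numerical-error measure: max absolute logit difference on the prompt tokens.
    logit_diff = (model(input_ids).logits - baseline(input_ids).logits).abs().max()

print("max |logit diff| on prompt:", logit_diff.item())
print("DeepSpeed:", tokenizer.decode(generation_output.sequences[0], skip_special_tokens=True))
print("baseline :", tokenizer.decode(ref_output[0], skip_special_tokens=True))

If the logit difference is already large on the prompt, the error comes from the injected kernels themselves rather than from decoding.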

Expected behavior: shown in a screenshot in the original issue (image not preserved).

ds_report output: shown in a screenshot in the original issue (image not preserved).


System info:

  • OS: not specified
  • GPU count and types: not specified
  • DeepSpeed-MII version: not specified
  • Hugging Face Transformers version: 4.28
  • Python version: 3.8



pangr · May 08 '23 12:05