
[BUG] The inference error is large

Open · pangr opened this issue on May 8, 2023 · 0 comments

Describe the bug: Text generated with DeepSpeed inference diverges noticeably from the output of the original model, and after a number of tokens the DeepSpeed output degenerates into random text.

To Reproduce:

import time

import torch
import deepspeed
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer

base_model = "xxxx"  # path or hub id of the base LLaMA checkpoint

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
)
# Wrap the model with the DeepSpeed inference engine: single GPU (mp_size=1),
# fp16, automatic module replacement with injected CUDA kernels.
model = deepspeed.init_inference(
    model=model,
    mp_size=1,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)

instruction = "xxxx"
input_text = "xxxx"

t1 = time.time()
prompt = generate_prompt(instruction, input_text)  # user-defined prompt template
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to("cuda")

# Note: temperature/top_p/top_k only take effect with do_sample=True;
# with num_beams=1 and sampling off, generation is greedy.
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=1,
)

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=1024,
    )
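
One way to quantify the divergence is to rerun the same generation with the unmodified fp16 model and compare the two outputs. A minimal sketch (reusing base_model, input_ids, generation_config, and generation_output from above; it needs enough GPU memory for a second copy of the model):

# Sketch: compare the DeepSpeed engine against the plain fp16 model.
baseline = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
).to("cuda")

with torch.no_grad():
    # num_beams=1 with sampling off means greedy decoding, so both runs are deterministic.
    ref_output = baseline.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        max_new_tokens=1024,
    )
    # Rough numerical-error measure: max absolute logit difference on the prompt tokens.
    logit_diff = (model(input_ids).logits - baseline(input_ids).logits).abs().max()

print("max |logit diff| on prompt:", logit_diff.item())
print("DeepSpeed:", tokenizer.decode(generation_output.sequences[0], skip_special_tokens=True))
print("baseline :", tokenizer.decode(ref_output[0], skip_special_tokens=True))

If the logit difference is already large on the prompt, the error comes from the injected kernels themselves rather than from decoding.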

Expected behavior: shown in a screenshot in the original issue (image not preserved).

ds_report output: shown in a screenshot in the original issue (image not preserved).


System info:

  • OS: not specified
  • GPU count and types: not specified
  • DeepSpeed-MII version: not specified
  • Hugging Face Transformers version: 4.28
  • Python version: 3.8



pangr · May 08 '23 12:05