ByteMLPerf 【Issue Help】 chatglm2-6b: some cases mismatch the golden values

https://github.com/bytedance/ByteMLPerf/blob/main/byte_infer_perf/llm_perf/workloads/chatglm2-torch-fp16-6b.json

We ran on an A100-40G to get the output logits with the configuration below:

{
    "model": "chatglm2-torch-fp16-6b",
    "test_accuracy": true,
    "test_perf": true,
    "min_new_tokens": 128,
    "max_new_tokens": 256,
    "tp_sizes": [1, 2],
    "batch_sizes":[1, 2, 4, 8],
    "input_tokens": [1024, 2048],
    "dataset": "llm_perf/datasets/merged_52_test.csv",
    "perf_time": 180
}

It seems that some dimensions do not match the golden values. Here is one of the 52 cases:

id,question,A,B,C,D
0,"对于以下结构定义，++p->str中的++加在____
struct{
int len;
char*str;
}*P;",指针 p 上,指针 str 上,str 指的内容上,语法错误

Jun 03 '24 09:06 DeepTecher

To be confirmed.

Jun 06 '24 08:06 suisiyuan

The previous golden values did not contain eos_token_id, and generation might stop once the number of generated tokens exceeded 512. The current golden values contain eos_token_id, and generation still stops once the number of generated tokens exceeds 512.

Jun 24 '24 12:06 suisiyuan
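
For anyone comparing their own outputs against the golden values, the stopping rule described in the reply above can be sketched roughly as below. This is only an illustration, not ByteMLPerf code: the helper names `truncate_generation` and `first_mismatch` are hypothetical, the token ids are made up, and it assumes that the EOS token counts toward the 512-token cap.

```python
from typing import List, Optional, Sequence

MAX_GENERATED_TOKENS = 512  # cap mentioned in the reply above


def truncate_generation(
    token_ids: Sequence[int],
    eos_token_id: int,
    include_eos: bool = True,
    max_tokens: int = MAX_GENERATED_TOKENS,
) -> List[int]:
    """Cut a generated sequence at the first EOS or at max_tokens, whichever comes first.

    include_eos=True mimics the current golden values (EOS is kept);
    include_eos=False mimics the previous golden values (EOS is dropped).
    Assumption: EOS counts toward the max_tokens cap.
    """
    out: List[int] = []
    for tok in token_ids:
        if len(out) >= max_tokens:
            break
        if tok == eos_token_id:
            if include_eos:
                out.append(tok)
            break
        out.append(tok)
    return out


def first_mismatch(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where the two sequences differ, or None if they are equal.

    A length difference is reported at the index where the shorter sequence ends.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


if __name__ == "__main__":
    eos = 2  # illustrative value only
    raw = [5, 6, eos, 7]
    old_style = truncate_generation(raw, eos, include_eos=False)
    new_style = truncate_generation(raw, eos, include_eos=True)
    print(old_style, new_style, first_mismatch(old_style, new_style))
```

Under these assumptions, an output produced the old way and a golden sequence produced the new way differ only by the trailing eos_token_id, so a mismatch at the very end of a case may come from this change rather than from a numerical difference.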