duguodong
Then I see that the reference answers are only prepared for 100~130, so we do not need to run gen_api_answer.py since the 30 reference answers are given. However, with the provided gpt-4-0125-preview.jsonl and...
I just tested the FuseChat-2.0 result provided in your link; the result is (1st turn: 7.6125, 2nd turn: 6.425, mean: 7.01875) instead of (7.70, 7.05, 7.38) what you...
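For anyone comparing the numbers, here is a quick sanity check of the arithmetic. It is only a sketch: it assumes both turns cover the same number of questions, so the overall mean is the plain average of the two turn scores. The helper name `mean_score` is made up for illustration and is not part of the evaluation scripts.

```python
def mean_score(first_turn: float, second_turn: float) -> float:
    """Average of the two turn scores (assumes equal question counts per turn)."""
    return (first_turn + second_turn) / 2


# Scores quoted in the comment above.
reproduced = mean_score(7.6125, 6.425)
reported = mean_score(7.70, 7.05)

# Per-turn gaps show where the difference is concentrated.
first_turn_gap = 7.70 - 7.6125
second_turn_gap = 7.05 - 6.425

print(f"reproduced mean: {reproduced:.5f}")
print(f"reported mean:   {reported:.3f}")
print(f"1st-turn gap: {first_turn_gap:.4f}, 2nd-turn gap: {second_turn_gap:.4f}")
```

Both means are consistent with their turn scores, and most of the gap is in the second turn.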
Thank you so much! I did make an error when setting the chat template, and the performance improved after correcting it.